The Electronic Data Gathering, Analysis, and Retrieval system (EDGAR) is an online repository, first constructed in 1993, which is maintained by the US Securities and Exchange Commission (SEC), and is designed to automate the collection, validation and acceptance of submissions and announcements by companies that are required to do so by law. The edgar schema is a collection of information scraped from this repository. The tables contained in this database schema consist of a number of main tables, containing the most fundamental information contained in EDGAR, as well as a number of dependent tables which are constructed most generally by scraping and cleaning the information in the filings and documents linked from the main tables, such as the set of tables listing the information contained in the Form 3, Form 4, and Form 5 filings. This readme file discusses the main tables. By far the most important tables are filings
and filing_docs
, which contain the basic information on each filing and their linking documents respectively, which one can use to deduce the url links to get to them, or to find the location of the documents in memory if they have been downloaded.
As mentioned above, these are main tables indexing all the filings in the SEC EDGAR database since 1993, and their associated documents respectively. Here, for each table we give a list of the associated fields.
-
filings
: This is an index of all filings in the SEC EDGAR database (since 1993). The program which constructs this table is contained inget_filings.R
. The fields arecompany_name
: the name of the company/firm which made the filing.form_type
: the form type of the filing. A description of the existing form types can be found here.cik
: the Central Index Key (CIK) of the company/firm which made the filing. This index is very important, as each company/firm is assigned a unique CIK number, and it can thus be used to efficiently search for the filings of a particular company/firm on EDGAR.date_filed
: the date the filing was made with the SEC.file_name
: This is the file name of the filing. This is an important field, as the file name uniquely assigned for each filing, and so can therefore identify each filing uniquely. This makes it important for joining with dependant tables, as they are often indexed, either partially or fully, by this file name.
-
filing_docs
: this is a table listing the documents associated with each filing that appears infilings
. The code which maintains this table is contained inget_filing_docs.R
. See here for more information. The fields areseq
: The sequence number, or ordinal number, of the document as it appears on the html page of the filing.description
: this is a description of what the document is.document
: this is a file name for the particular document. This, along withfile_name
, is also important for joining with dependant tables. The tuple of (file_name
,document
) uniquely identifies a particular document for a particular filing, and records from dependant tables are thus usually indexed by these fields (eg. the tables containing the data from Forms 3, 4 and 5).type
: a word describing what type of document the given document is.size
: the size of the file in bytes, as stated in the page of the filingfile_name
: This is the same as infilings
-
accession_numbers
: Each filing has an accession number, contained in thefile_name
of the filing infilings
. Each accession number contains three numbers: the first number is the CIK, the second is the year in which the filing was made, and the third is a sequential count of the number of filings made under the CIK (usually for the year, see here for more detail). This table contains these. The code to make this table is inget_accession_nos.R
. The fields arefile_name
: Same as infilings
accessionnumber
: the accession number, represented as a string.
-
cusip_cik
: CUSIP numbers are unique identifying numbers for North American securities, assigned by the Committee on Uniform Security Identification Procedures (CUSIP), hence the name. This table provides a mapping from CUSIPs to CIKs. The mapping is formed by scraping the CIK and CUSIP numbers for each company listed in formsSC 13D
andSC 13G
, and associating them. The fields arefile_name
: Same as infilings
. In this case, this is the file name of the filing from which the data in the other fields was scraped.cusip
: the CUSIP number of the company.cik
: The CIK number of the company. This is not necessary the same CIK as that contained in thefile_name
.company_name
: The name of the company.format
: A letter (usually 'A', 'B' or 'D') which keeps track of how the CUSIP number is displayed in the original filing document.
-
filer_ciks
: This table contains data on filer CIKs fromSC 13D
andSC 13G
forms. The code which makes this table is contained inget_filer_ciks.R
. The fields arefile_name
: Same as infilings
, this case the file name of the particularSC 13D
orSC 13G
form.filer_cik
: the CIK number of the filer
-
item_no
: Table listing item numbers associated with each 8-K filing. The code to make and update this table is contained inget_item_nos.R
. The fields arefile_name
: Same as infilings
, this case the file name of the particular8-K
or8-K/A
form.item_no
: the item number associated with the filing. If a filings has more than one item number, there will be multiple rows for the filings associated with each item number. A description of these item numbers are given in the following table,item_no_desc
.
-
item_no_desc
: Table providing explanations for each item number used in 8-K filings. The data is extracted from here. The code to make the table is contained inget_item_no_desc.R
, which scrapes the descriptions for each of the unique item numbers stored initem_no
. The fields areitem_no
: the item number. Same as the numbers that appear in the corresponding field in the tableitem_no
.item_desc
: A description of the situation/context that the item number corresponds to.
server_log
: Details to come.
index_last_modified
: Table used in updatingfilings
.filing_docs_processed
: Table used in keeping track of documents infiling_docs
that have been downloaded using download_filing_docs.R.server_log_processed
: Table used in updatingserver_log
.
A script to update the Edgar database is found here.
get_server_logs.R
: Script to get server logs from EDGAR to populateserver_log
andserver_log_processed
.get_13D_filings.R
: Script to get the 13D filings. Always run this before runningextract_cusips.R
.get_13D_filing_details
: Extracts the details from the files contained inedgar.filings
extract_cusips.pl
: Contains Perl code which extracts the cusip_cik numbers. This file is used and sourced by extract_cusips.R, so one should run the latter to put the cusip_ciks in the database.extract_cusips.R
: Puts the cusip_cik numbers in the database.