Sciweavers

IPPS
2008
IEEE

Multi-threaded data mining of EDGAR CIKs (Central Index Keys) from ticker symbols

This paper describes how to use the Java Swing HTMLEditorKit to perform multi-threaded web data mining on the EDGAR system (Electronic Data Gathering, Analysis, and Retrieval system). EDGAR is the means by which the SEC (U.S. Securities and Exchange Commission) automates the collection, validation, indexing, acceptance, and forwarding of submissions. Some entities regulated by the SEC (e.g. publicly traded firms) are required by law to file with the SEC. Our focus is on using EDGAR to get information about company filings. These filings are associated with companies through their Central Index Key (CIK). The CIK is used on the SEC’s computer system to identify entities that have filed a disclosure with the SEC. We show how to map a stock ticker symbol into a CIK. The methodology for converting the web data source into internal data structures is based on using HTML as the input into a context-sensitive parser-callback facility. Screen scraping is a popular means of data mining, but the...
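As a rough illustration of the parser-callback approach the abstract describes, the following Java sketch feeds HTML into the Swing `HTMLEditorKit.ParserCallback` facility and resolves several ticker symbols concurrently. It is not the paper's implementation. The class `CikScraper`, the `PageSource` interface, the assumption that a lookup page contains a link whose `href` carries `CIK=<digits>`, and the sample tickers and CIK values are all hypothetical, and canned HTML strings stand in for real EDGAR responses.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class CikScraper {
    // Assumption: the CIK appears in a link as "CIK=" followed by digits.
    private static final Pattern CIK_IN_HREF = Pattern.compile("CIK=(\\d+)");

    /** Hypothetical page source: returns the HTML of the lookup page for a ticker. */
    interface PageSource {
        String fetch(String ticker) throws IOException;
    }

    /** Returns the first CIK found in an anchor href, or null if none is present. */
    static String extractCik(String html) throws IOException {
        final String[] found = new String[1];
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Only anchor tags are of interest; stop looking after the first match.
                if (!HTML.Tag.A.equals(tag) || found[0] != null) {
                    return;
                }
                Object href = attrs.getAttribute(HTML.Attribute.HREF);
                if (href == null) {
                    return;
                }
                Matcher m = CIK_IN_HREF.matcher(href.toString());
                if (m.find()) {
                    found[0] = m.group(1);
                }
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return found[0];
    }

    /** Looks up every ticker on a fixed-size thread pool, one task per ticker. */
    static Map<String, String> lookupAll(List<String> tickers, PageSource source, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (String ticker : tickers) {
                Callable<String> task = () -> extractCik(source.fetch(ticker));
                pending.put(ticker, pool.submit(task));
            }
            Map<String, String> result = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> e : pending.entrySet()) {
                result.put(e.getKey(), e.getValue().get()); // blocks until that task finishes
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up tickers and CIKs; canned pages replace network access.
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("AAA", "<html><body><a href=\"/cgi-bin/browse-edgar?CIK=0000000123\">filings</a></body></html>");
        pages.put("BBB", "<html><body><a href=\"/cgi-bin/browse-edgar?CIK=0000000456\">filings</a></body></html>");
        System.out.println(lookupAll(new ArrayList<>(pages.keySet()), pages::get, 2));
    }
}
```

Keeping the page source behind an interface lets the parsing and threading logic be exercised without network access. A real `PageSource` would fetch the lookup page over HTTP, and the link pattern would need to be checked against the pages actually returned.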
Dougal A. Lyon
Added 31 May 2010
Updated 31 May 2010
Type Conference
Year 2008
Where IPPS
Authors Dougal A. Lyon