Web-Scale Information Extraction in KnowItAll (Preliminary Results)

Manually querying search engines in order to accumulate a large body of factual information is a tedious, error-prone process of piecemeal search. Search engines retrieve and rank potentially relevant documents for human perusal, but do not extract facts, assess confidence, or fuse information from multiple documents. This paper introduces KNOWITALL, a system that aims to automate the tedious process of extracting large collections of facts from the web in an autonomous, domain-independent, and scalable manner. The paper describes preliminary experiments in which an instance of KNOWITALL, running for four days on a single machine, was able to automatically extract 54,753 facts. KNOWITALL associates a probability with each fact, enabling it to trade off precision and recall. The paper analyzes KNOWITALL's architecture and reports on lessons learned for the design of large-scale information extraction systems.

Categories and Subject Descriptors I.2.7 [Artificial Intelligence]: Natur...
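The abstract notes that attaching a probability to each extracted fact lets KnowItAll trade off precision against recall. The following is a minimal sketch of that general idea, assuming every fact carries a probability and that a ground-truth label is available for evaluation: raising the acceptance threshold keeps fewer but more reliable facts. The fact strings, probabilities, labels, and the `precision_recall_at` helper are all hypothetical and illustrative; this is not KnowItAll's own probability model, and the numbers are not results from the paper.

```python
from typing import List, Tuple

# A fact is (fact string, probability assigned by the extractor, true/false label).
# All values below are invented for illustration only.
Fact = Tuple[str, float, bool]

FACTS: List[Fact] = [
    ("City(Paris)", 0.98, True),
    ("City(Seattle)", 0.95, True),
    ("City(Boston)", 0.80, True),
    ("City(Java)", 0.60, False),
    ("City(Springfield)", 0.40, True),
    ("City(Tuesday)", 0.20, False),
]


def precision_recall_at(facts: List[Fact], threshold: float) -> Tuple[float, float]:
    """Accept facts whose probability is >= threshold; return (precision, recall)."""
    accepted = [f for f in facts if f[1] >= threshold]
    correct_accepted = sum(1 for f in accepted if f[2])
    total_correct = sum(1 for f in facts if f[2])
    # Precision of an empty accepted set is treated as 1.0 by convention.
    precision = correct_accepted / len(accepted) if accepted else 1.0
    recall = correct_accepted / total_correct if total_correct else 0.0
    return precision, recall


# Sweep a few thresholds: higher thresholds trade recall for precision.
for t in (0.9, 0.5, 0.1):
    p, r = precision_recall_at(FACTS, t)
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
```

With this toy data, a threshold of 0.9 yields precision 1.00 and recall 0.50, while a threshold of 0.1 accepts every fact and yields recall 1.00 at precision 0.67.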

Internet Technology | Keywords Information Extraction | Large-scale Information Extraction | Search Engines | WWW 2004 |

» Web Scale Competitor Discovery Using Mutual Information

» Extracting Motor Unit Firing Information by Independent Component Analysis of Surface Elec...

» WizIE: A Best Practices Guided Development Environment for Information Extraction

» Coupling information retrieval and information extraction: A new text technology for gather...

» Network Snakes-Supported Extraction of Field Boundaries from Imagery

» Metadata Propagation in the Web Using Co-Citations

» Coupled Hierarchical IR and Stochastic Models for Surface Information Extraction

» Semantic Case Role Detection for Information Extraction

Post Info

Added	22 Nov 2009
Updated	22 Nov 2009
Type	Conference
Year	2004
Where	WWW
Authors	Oren Etzioni, Michael J. Cafarella, Doug Downey, Stanley Kok, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates

Sciweavers
