Sciweavers

WWW
2008
ACM

Mining, indexing, and searching for textual chemical molecule information on the web

Current search engines do not support user searches for chemical entities (chemical names and formulae) beyond simple keyword searches. Usually a chemical molecule can be represented in multiple textual ways. A simple keyword search would retrieve only the exact match and not the others. We show how to build a search engine that enables searches for chemical entities and demonstrate empirically that it improves the relevance of returned documents. Our search engine first extracts chemical entities from text, performs novel indexing suitable for chemical names and formulae, and supports different query models that a scientist may require. We propose a model of hierarchical conditional random fields for chemical formula tagging that considers long-term dependencies at the sentence level. To support substring searches of chemical names, a search engine must index substrings of chemical names. Indexing all possible sub-sequences is not feasible in practice. We propose an algorithm for independent...
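The indexing problem the abstract raises can be illustrated with a minimal sketch. This is not the algorithm the authors propose (the abstract is cut off before describing it); it is only the naive baseline of indexing every contiguous substring, which shows why exhaustive indexing grows quickly. The function names and the sample chemical names are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def all_substrings(name):
    """Return the set of every non-empty contiguous substring of `name`."""
    return {
        name[i:j]
        for i in range(len(name))
        for j in range(i + 1, len(name) + 1)
    }


def build_substring_index(names):
    """Naive exhaustive index: map each substring to the names containing it.

    Illustrative assumption only; a name of length n contributes up to
    n * (n + 1) / 2 index keys, which is why this does not scale.
    """
    index = defaultdict(set)
    for name in names:
        for sub in all_substrings(name):
            index[sub].add(name)
    return dict(index)


def substring_search(index, query):
    """Return the names containing `query`, in sorted order."""
    return sorted(index.get(query, set()))


# Hypothetical sample names, not taken from the paper.
names = ["methanol", "ethanol", "methylethyl"]
index = build_substring_index(names)

print(substring_search(index, "ethanol"))
print(len(all_substrings("methanol")))
```

A plain keyword index would match "ethanol" only against the exact name, while the substring index also finds it inside "methanol". The cost is the quadratic number of keys per name, which is the feasibility problem the abstract points to.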
Bingjun Sun, Prasenjit Mitra, C. Lee Giles
Added 21 Nov 2009
Updated 21 Nov 2009
Type Conference
Year 2008
Where WWW