Sciweavers

ACL
2009
Mining Bilingual Data from the Web with Adaptively Learnt Patterns
Mining bilingual data (including bilingual sentences and terms) from the Web can benefit many NLP applications, such as machine translation and cross language information retrie...
Long Jiang, Shiquan Yang, Ming Zhou, Xiaohua Liu, ...
BNCOD
2006
The Lixto Project: Exploring New Frontiers of Web Data Extraction
The Lixto project is an ongoing research effort in the area of Web data extraction. Whereas the project originally started out with the idea to develop a logic-based extraction lan...
Julien Carme, Michal Ceresna, Oliver Frölich,...
IPM
2007
Web page title extraction and its application
This paper is concerned with automatic extraction of titles from the bodies of HTML documents (web pages). Titles of HTML documents should be correctly defined in the title fields...
Yewei Xue, Yunhua Hu, Guomao Xin, Ruihua Song, Shu...
DL
2000
Springer
Acrophile: an automated acronym extractor and server
We implemented a web server for acronym and abbreviation lookup, containing a collection of acronyms and their expansions gathered from a large number of web pages by a heuristic ...
Leah S. Larkey, Paul Ogilvie, M. Andrew Price, Bre...
SIGIR
2005
ACM
Title extraction from bodies of HTML documents and its application to web page retrieval
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...
Yunhua Hu, Guomao Xin, Ruihua Song, Guoping Hu, Sh...