Sciweavers

489 search results - page 64 / 98
» Effective techniques for automatic extraction of Web publica...
DEXAW
2008
IEEE
15 years 3 months ago
Segmentation of Legislative Documents Using a Domain-Specific Lexicon
The amount of legal information is continuously growing. New legislative documents appear every day on the Web. Legal documents are produced on a daily basis in briefing format, cont...
Ismael Hasan, Javier Parapar, Roi Blanco
WWW
2003
ACM
16 years 2 months ago
Efficient URL caching for world wide web crawling
Crawling the web is deceptively simple: the basic algorithm is (a) Fetch a page (b) Parse it to extract all linked URLs (c) For all the URLs not seen before, repeat (a)–(c). Howev...
Andrei Z. Broder, Marc Najork, Janet L. Wiener
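
The basic loop described in the abstract above, steps (a) to (c), can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `crawl`, `LinkParser`, `fake_fetch`, and `PAGES` are hypothetical, the "web" is a small in-memory dictionary instead of real HTTP requests, and the "seen before" test is a plain in-memory set, which is assumed here only for simplicity.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from `seed`.

    `fetch` is an assumed callable mapping a URL to its HTML text.
    Returns the URLs in the order they were fetched.
    """
    seen = {seed}            # URLs already discovered
    queue = deque([seed])    # URLs waiting to be fetched
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        html = fetch(url)                  # (a) fetch a page
        order.append(url)
        parser = LinkParser()
        parser.feed(html)                  # (b) parse it to extract linked URLs
        for href in parser.links:
            link = urljoin(url, href)      # resolve relative links
            if link not in seen:           # (c) keep only URLs not seen before
                seen.add(link)
                queue.append(link)
    return order


# A tiny in-memory "web" standing in for real HTTP fetches (made-up data).
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/a">A</a>',
}


def fake_fetch(url):
    """Return the stored HTML for `url`, or an empty page if unknown."""
    return PAGES.get(url, "")
```

Calling `crawl("http://example.test/", fake_fetch)` visits each of the three pages once, even though the pages link to each other in a cycle; the `seen` set is what prevents refetching.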
WWW
2009
ACM
15 years 8 months ago
Near real time information mining in multilingual news
This paper presents a near real-time multilingual news monitoring and analysis system that forms the backbone of our research work. The system integrates technologies to address t...
Martin Atkinson, Erik Van der Goot
EMNLP
2004
15 years 3 months ago
Monolingual Machine Translation for Paraphrase Generation
We apply statistical machine translation (SMT) tools to generate novel paraphrases of input sentences in the same language. The system is trained on large volumes of sentence pair...
Chris Quirk, Chris Brockett, William B. Dolan
NLDB
2004
Springer
15 years 7 months ago
Acquiring Selectional Preferences from Untagged Text for Prepositional Phrase Attachment Disambiguation
Abstract. Extracting information automatically from texts for database representation requires previously well-grouped phrases so that entities can be separated adequately. This pr...
Hiram Calvo, Alexander F. Gelbukh