Sciweavers

WWW
2010
ACM
14 years 10 hours ago
A pattern tree-based approach to learning URL normalization rules
Duplicate URLs have brought serious troubles to the whole pipeline of a search engine, from crawling, indexing, to result serving. URL normalization is to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...
WSDM
2010
ACM
204 views, Data Mining
13 years 12 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
KDD
2008
ACM
183 views, Data Mining
14 years 5 months ago
De-duping URLs via rewrite rules
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Anirban Dasgupta, Ravi Kumar, Amit Sasturkar
KDD
1997
ACM
154 views, Data Mining
13 years 8 months ago
Autonomous Discovery of Reliable Exception Rules
This paper presents an autonomous algorithm for discovering exception rules from data sets. An exception rule, which is defined as a deviational pattern to a well-known fact, exhi...
Einoshin Suzuki
ICTAI
2008
IEEE
13 years 11 months ago
Information Extraction as an Ontology Population Task and Its Application to Genic Interactions
Ontologies are a well-motivated formal representation to model knowledge needed to extract and encode data from text. Yet, their tight integration with Information Extraction (IE)...
Alain-Pierre Manine, Érick Alphonse, Philip...