Sciweavers

» Web Contents Tracking by Learning of Page Grammars
SGAI
2004
Springer
15 years 2 months ago
Neighbourhood Exploitation in Hypertext Categorization
As the web expands exponentially, the need to put some order to its content becomes apparent. Hypertext categorization, that is, the automatic classification of web documents into ...
Houda Benbrahim, Max Bramer
CORR
2008
Springer
14 years 9 months ago
Analysis of Social Voting Patterns on Digg
The social Web is transforming the way information is created and distributed. Authoring tools, e.g., blog publishing services, enable users to quickly and easily publish content,...
Kristina Lerman, Aram Galstyan
AGENTS
1997
Springer
15 years 1 months ago
A Scalable Comparison-Shopping Agent for the World-Wide Web
The World-Wide-Web is less agent-friendly than we might hope. Most information on the Web is presented in loosely structured natural language text with no agent-readable semantics...
Robert B. Doorenbos, Oren Etzioni, Daniel S. Weld
WSDM
2010
ACM
15 years 4 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
EMNLP
2008
14 years 11 months ago
HTM: A Topic Model for Hypertexts
Previously, topic models such as PLSI (Probabilistic Latent Semantic Indexing) and LDA (Latent Dirichlet Allocation) were developed for modeling the contents of plain texts. Recent...
Congkai Sun, Bin Gao, Zhenfu Cao, Hang Li