Sciweavers

» Cleaning Web Pages for Effective Web Content Mining
WSDM
2009
ACM
The web changes everything: understanding the dynamics of web content
The Web is a dynamic, ever changing collection of information. This paper explores changes in Web content by analyzing a crawl of 55,000 Web pages, selected to represent different...
Eytan Adar, Jaime Teevan, Susan T. Dumais, Jonatha...
COMAD
2008
CUM: An Efficient Framework for Mining Concept Units
The Web is the most important repository of different kinds of media, such as text, sound, video, and images. Web mining is the process of applying data mining techniques to automatica...
Santhi Thilagam
ICADL
2005
Springer
A Method for Creating a High Quality Collection of Researchers' Homepages from the Web
This paper proposes a method for creating a high quality collection of researchers’ homepages. The proposed method consists of three phases: rough filtering of the possible web p...
Yuxin Wang, Keizo Oyama
WSDM
2010
ACM
Boilerplate Detection using Shallow Text Features
In addition to the actual content, Web pages consist of navigational elements, templates, and advertisements. This boilerplate text typically is not related to the main content, ma...
Christian Kohlschütter, Peter Fankhauser, Wol...
AI
2005
Springer
Integrating Web Content Clustering into Web Log Association Rule Mining
One of the effects of the general Internet growth is an immense number of user accesses to WWW resources. These accesses are recorded in the web server log files, which...
Jiayun Guo, Vlado Keselj, Qigang Gao