DSS
2008

A machine learning approach to web page filtering using content and structure analysis

As the Web continues to grow, it has become increasingly difficult to search for relevant information using traditional search engines. Topic-specific search engines provide an alternative way to support efficient information retrieval on the Web by providing more precise and customized searching in various domains. However, developers of topic-specific search engines need to address two issues: how to locate relevant documents (URLs) on the Web and how to filter out irrelevant documents from a set of documents collected from the Web. This paper reports our research in addressing the second issue. We propose a machine-learning-based approach that combines Web content analysis and Web structure analysis. We represent each Web page by a set of content-based and link-based features, which can be used as the input for various machine learning algorithms. The proposed approach was implemented using both a feedforward/backpropagation neural network and a support vector machine. Two experime...
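The idea of representing a page by content-based and link-based features can be illustrated with a minimal sketch. The abstract does not list the actual features, so everything below is an assumption made for illustration: the `LEXICON` set, the `Page` fields, and the four ratios are hypothetical and are not the authors' feature set. The sketch only shows how the two kinds of signal might be combined into one numeric vector that a classifier such as a neural network or a support vector machine could take as input.

```python
import re
from dataclasses import dataclass, field

# Hypothetical domain lexicon; the abstract does not specify one.
LEXICON = {"cancer", "tumor", "therapy", "clinical"}


@dataclass
class Page:
    """Assumed page record; field names are illustrative only."""
    title: str
    body: str
    # Anchor text of links pointing to this page (link-based signal).
    anchor_texts: list = field(default_factory=list)
    # Precomputed relevance scores of neighboring pages (link-based signal).
    neighbor_scores: list = field(default_factory=list)


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexicon_ratio(text, lexicon=LEXICON):
    """Fraction of tokens in `text` that appear in the lexicon."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in lexicon for t in tokens) / len(tokens)


def mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def feature_vector(page):
    """Combine content-based and link-based features into one vector."""
    return [
        lexicon_ratio(page.title),                   # content: title
        lexicon_ratio(page.body),                    # content: body text
        lexicon_ratio(" ".join(page.anchor_texts)),  # link: anchor text
        mean(page.neighbor_scores),                  # link: neighbor scores
    ]
```

Under these assumptions, calling `feature_vector(Page(title="Cancer therapy", body="A clinical trial of a new drug", anchor_texts=["tumor research"], neighbor_scores=[0.5, 0.25]))` returns a list of four numbers between 0 and 1, one per feature, in the order shown in the function body.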
Michael Chau, Hsinchun Chen
Added 10 Dec 2010
Updated 10 Dec 2010
Type Journal
Year 2008
Where DSS
Authors Michael Chau, Hsinchun Chen