Sciweavers

SOCIALCOM
2010

Using Text Analysis to Understand the Structure and Dynamics of the World Wide Web as a Multi-Relational Graph

A representation of the World Wide Web as a directed graph, with vertices representing web pages and edges representing hypertext links, underpins the algorithms used by web search engines today. However, this representation involves a key oversimplification of the true complexity of the Web: an edge in the traditional Web graph represents only the existence of a hyperlink; information on the context (e.g., informational, adversarial, commercial, spam) behind the hyperlink is absent. In this work-in-progress paper, we describe an ongoing collaborative project between two teams, one specializing in network science and analysis and the other specializing in text analysis and machine learning, to address this oversimplification. Using techniques in natural language processing, text mining and machine learning to extract relevant features of hyperlinks and classify them into one of several types, this undertaking builds and analyzes a multi-relational web graph. A key aspect of this work i...
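To make the contrast between the traditional web graph and a multi-relational one concrete, here is a minimal sketch of a data structure that keeps one edge set per link type. It is an illustration only, not the authors' implementation: the class name `MultiRelationalGraph`, its methods, and the page names are hypothetical, and the label set simply reuses the example contexts listed in the abstract (the paper's actual categories are not given here). How a hyperlink gets its label (the text-analysis and classification step) is left out entirely.

```python
from collections import defaultdict

# Assumed label set: the example contexts named in the abstract.
# The project's real link categories are not specified in this listing.
LINK_TYPES = {"informational", "adversarial", "commercial", "spam"}


class MultiRelationalGraph:
    """Directed graph whose edges carry a link type (hypothetical sketch)."""

    def __init__(self):
        # edges[link_type][source] -> set of target pages
        self.edges = defaultdict(lambda: defaultdict(set))

    def add_link(self, source, target, link_type):
        """Record a hyperlink from source to target under one link type."""
        if link_type not in LINK_TYPES:
            raise ValueError(f"unknown link type: {link_type}")
        self.edges[link_type][source].add(target)

    def relation(self, link_type):
        """Return the edges of a single type as a sorted list of pairs."""
        by_source = self.edges.get(link_type, {})
        return sorted((s, t) for s, targets in by_source.items() for t in targets)

    def collapsed(self):
        """Return the traditional web graph: the same edges with types dropped."""
        pairs = set()
        for by_source in self.edges.values():
            for s, targets in by_source.items():
                pairs.update((s, t) for t in targets)
        return sorted(pairs)


# Hypothetical pages, labeled by hand for illustration.
g = MultiRelationalGraph()
g.add_link("blog.example", "shop.example", "commercial")
g.add_link("blog.example", "wiki.example", "informational")
g.add_link("farm.example", "shop.example", "spam")
```

Collapsing the structure with `g.collapsed()` recovers the ordinary directed graph, in which the spam link and the commercial link into `shop.example` are indistinguishable; `g.relation("spam")` isolates one relation for separate analysis.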
Harish Sethu, Alexander Yates
Added 15 Feb 2011
Updated 15 Feb 2011
Type Journal
Year 2010
Where SOCIALCOM