Automatic User Comment Detection in Flat Internet Fora

Millions of people use the World Wide Web and publish content online. This user-generated content contains much information that is relevant not only to marketing but also to companies in general (customer-oriented products), governments (direct democracy), and many others. Analysis of such data is becoming increasingly important. This paper deals with a prerequisite: we propose an algorithm that automatically detects posting structures in flat internet fora in order to extract user comments. The algorithm can handle a wide range of forum systems, even nested structures. The approach first detects the main content section by applying a modified version of the SST algorithm and then detects the posting structure by using several posting properties found in internet fora. It creates XPath expressions for faster data extraction in later steps.
Keywords: social media, internet community, forum, web 2.0, Information Retrieval, crawler, extraction
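The last step of the abstract, deriving an XPath expression for the posting nodes, can be illustrated with a minimal sketch. This is not the authors' method and does not implement the SST algorithm: as a stand-in, it assumes that postings are the most frequently repeated sibling elements sharing a tag and `class` attribute. The sample page, the `signature` helper, and the `find_posting_xpath` function are all hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, well-formed sample page (an assumption for illustration).
PAGE = """
<html><body>
  <div class="nav"><a href="/">Home</a></div>
  <div class="thread">
    <div class="post"><span class="author">alice</span><p>First comment</p></div>
    <div class="post"><span class="author">bob</span><p>Second comment</p></div>
    <div class="post"><span class="author">carol</span><p>Third comment</p></div>
  </div>
</body></html>
"""


def signature(el):
    """Describe an element by its tag and class attribute."""
    return (el.tag, el.get("class"))


def find_posting_xpath(root, min_repeats=2):
    """Return an XPath for the most frequently repeated sibling signature.

    Simplifying assumption: postings are siblings with the same tag and
    class. This heuristic replaces the paper's actual detection steps.
    """
    best = None  # (repeat count, signature)
    for parent in root.iter():
        counts = Counter(signature(child) for child in parent)
        if not counts:
            continue  # leaf element, no children to compare
        sig, n = counts.most_common(1)[0]
        if n >= min_repeats and (best is None or n > best[0]):
            best = (n, sig)
    if best is None:
        return None
    tag, cls = best[1]
    return f".//{tag}[@class='{cls}']" if cls else f".//{tag}"


root = ET.fromstring(PAGE)
xpath = find_posting_xpath(root)

# Reuse the derived expression to extract the comment text of each posting.
comments = [post.findtext("p") for post in root.findall(xpath)]
print(xpath)
print(comments)
```

Once such an expression has been derived for one page of a forum, it could in principle be reused on other pages with the same template, which is the kind of speed-up the abstract attributes to XPath expressions. Real forum HTML is rarely well-formed XML, so a tolerant HTML parser would be needed in practice.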
Mathias Bank, Michael Mattes
Added 20 May 2010
Updated 20 May 2010
Type Conference
Year 2009
Authors Mathias Bank, Michael Mattes