Web Mining Systems exploit the redundancy of data published on the Web to automatically extract information from existing web documents. The first step in the Information Extract...
Kostyantyn M. Shchekotykhin, Dietmar Jannach, Gerh...
Abstract. Crawling the deep web often requires the selection of an appropriate set of queries so that they can cover most of the documents in the data source with low cost. This ca...
Anchor text has been considered as a useful resource to complement the representation of target pages and is broadly used in web search. However, previous research only uses anchor...
Research shows that comment spamming (comments which are unsolicited, unrelated, abusive, hateful, commercial advertisements etc) in online discussion forums has become a common p...
We propose a general method to watermark and probabilistically identify the structured outputs of machine learning algorithms. Our method is robust to local editing operations and...
Ashish Venugopal, Jakob Uszkoreit, David Talbot, F...