This paper presents BlogBuster, a tool for extracting a corpus from the blogosphere. The topic of cleaning arbitrary web pages with the goal of extracting a corpus from web data, ...
The World Wide Web provides an increasingly powerful and popular publication mechanism. Web documents often contain a large number of images serving a variety of purposes. Th...
The rapid growth of the World Wide Web and the Internet has fueled interest in Web services and the Semantic Web, which are quickly becoming important parts of modern electronic c...
The rapid development of network technologies has made the web a huge information source with its own characteristics. In most cases, traditional database-based technologies are no...
Launched in 1998, the Google search engine estimates page importance using several parameters, one of which is PageRank. More precisely, PageRank is a probability distribution on the ...
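
To make the idea of PageRank as a probability distribution concrete, the following is a minimal sketch of the standard textbook power-iteration formulation, not necessarily the variant studied in the paper. The function name `pagerank`, the damping factor of 0.85, the uniform handling of pages without out-links, and the toy link graph are all assumptions introduced here for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Approximate PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to
    (hypothetical input format chosen for this sketch).
    Returns a dict page -> rank; the ranks sum to 1.
    """
    # Collect every page that appears as a source or as a target.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution

    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes its damped rank equally along its out-links.
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new_rank[q] += damping * rank[p] / len(outs)

        # Stop once the L1 change between iterations is below the tolerance.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy graph (an assumption for illustration): page "d" has no in-links.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Because every iteration redistributes the whole unit of rank, the values stay non-negative and sum to 1, which is what makes the result a probability distribution rather than an arbitrary score.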