We now have incrementally-grown databases of text documents ranging back for over a decade in areas ranging from personal email, to news-articles and conference proceedings. While...
Soon, much of the data exchanged over the Internet will be encoded in XML, allowing for sophisticated filtering and content-based routing. We have built a filtering engine called ...
Yanlei Diao, Peter M. Fischer, Michael J. Franklin...
In dynamic environments with frequent content updates, we require online full-text search that scales to large data collections and achieves low search latency. Several recent met...
The blogosphere--the totality of blog-related Web sites-has become a great source of trend analysis in areas such as product survey, customer relationship, and marketing. Existing...
This paper presents a novel approach to the unsupervised learning of syntactic analyses of natural language text. Most previous work has focused on maximizing likelihood according...