In order to artificially boost the rank of commercial pages in search engine results, search engine optimizers pay for links to these pages on other websites. Identifying paid lin...
We consider the problem of dust: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
This research is about automatic identification and extraction of person names in Chinese text documents. Solutions to this problem have immediate and extensive applications in ma...
Extracting geographical information from various web sources is likely to be important for a variety of applications. One such use for this information is to enable the study of v...
The Web has been rapidly "deepened" by myriad searchable databases online, where data are hidden behind query interfaces. As an essential task toward integrating these m...