Traditionally, research in identifying structured entities in documents has proceeded independently of document categorization research. In this paper, we observe that these two t...
Fully automatic methods that extract lists of objects from the Web have been studied extensively. Record extraction, the first step of this object extraction process, identifies...
GPS-equipped taxis can be viewed as pervasive sensors and the large-scale digital traces produced allow us to reveal many hidden “facts” about the city dynamics and human beha...
Daqing Zhang, Nan Li, Zhi-Hua Zhou, Chao Chen, Lin...
Image annotation has been an active research topic in recent years due to its potentially large impact on both image understanding and Web image search. In this paper, we target a...
Xirong Li, Le Chen, Lei Zhang, Fuzong Lin, Wei-Yin...
Essentially all data mining algorithms assume that the datagenerating process is independent of the data miner's activities. However, in many domains, including spam detectio...
Nilesh N. Dalvi, Pedro Domingos, Mausam, Sumit K. ...