In contrast to traditional document retrieval, a web page as a whole is not a good information unit to search because it often contains multiple topics and a lot of irrelevant inf...
In this demo, we present a system called iRIN designed for performing image retrieval in image-rich information networks. We first introduce MoK-SimRank to significantly improve...
Xin Jin, Jiebo Luo, Jie Yu, Gang Wang, Dhiraj Josh...
As the Internet grows and network bandwidth continues to increase, administrators are faced with the task of keeping confidential information from leaving their networks. Today’...
Many text documents on the Web are not originally created but forwarded or copied from other source documents. The phenomenon of document forwarding or transmission between variou...
This work aims to provide a page segmentation algorithm which uses both visual and content information to extract the semantic structure of a web page. The visual information is u...