Large-volume public comment campaigns and web portals that encourage the public to customize form letters produce many near-duplicate documents, which increases processing and sto...
The increasing centralization of networked services places user data at considerable risk. For example, many users store email on remote servers rather than on their local disk. D...
Adam J. Aviv, Michael E. Locasto, Shaya Potter, An...
Link spam deliberately manipulates hyperlinks between web pages in order to unduly boost the search engine ranking of one or more target pages. Link-based ranking algorithms such ...
Most human activities occur around where the user is physically located. Knowing the geographical serving area of web resources, therefore, is very important for many web applicat...
Qi Zhang, Xing Xie, Lee Wang, Lihua Yue, Wei-Ying ...
We address the problem of identifying the domain of online databases. More precisely, given a set F of Web forms automatically gathered by a focused crawler and an online database...