Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of <table> tags. A mul...
Given a large volume of Web documents, we consider problem of finding the shortest keyword sequences for each of the documents such that a keyword sequence can be rendered to a g...
Personal media collections are often viewed and managed along the social dimension, the places we spend time at and the people we see, thus tools for extracting and using this inf...
Recently, web mining that tries to find useful knowledge from the vast amount of web pages has attracted a lot of research interests. Besides, it is becoming an essential task to...
Term-based representations of documents have found widespread use in information retrieval. However, one of the main shortcomings of such methods is that they largely disregard le...