Abstract. As XML adoption keeps growing, it is now common practice for most developers to deal with XML parsing and transformation. XML is used as a format to, e.g., render data,...
Existing template-independent web data extraction approaches adopt highly ineffective decoupled strategies--attempting to do data record detection and attribute labeling in two se...
Many documents on the Web are weakly structured. Because of their weak semantics and the heterogeneity of their formats, the information conveyed by...
In this paper, we consider the problem of extracting structured data from web pages, taking into account both the content of individual attributes and the structure of pages...
Demographic information plays an important role in gaining valuable insights about a web-site's user-base and is used extensively to target online advertisements and promotion...