When we describe a Web page informally, we often use phrases like "it looks like a newspaper site", "there are several unordered lists" or "it's just a collection of li...
Isabel F. Cruz, Slava Borisov, Michael A. Marks, T...
The semi-structured information available in HTML and similar documents provides valuable information that can be used for information extraction applications. This information tog...
We describe a compression model for semistructured documents, called the Structural Contexts Model (SCM), which takes advantage of the context information usually implicit in the stru...
We describe a compression technique for semistructured documents, called SCMPPM, which combines the Prediction by Partial Matching technique with the Structural Contexts Model (SCM) t...
This paper proposes a clustering approach that explores both the content and the structure of XML documents for determining similarity among them. Assuming that the content and th...