EDBT
2009
ACM

Type-based categorization of relational attributes

In this work we concentrate on the categorization of relational attributes based on their data type. Assuming that attribute types and characteristics are unknown or unidentifiable, we analyze and compare a variety of type-based signatures for classifying attributes according to the semantic type of the data they contain (e.g., router identifiers, social security numbers, email addresses). The signatures can subsequently be used for other applications as well, such as clustering and index optimization/compression. This application is useful when very large data collections, generated in a distributed, ungoverned fashion, end up with unknown, incomplete, inconsistent, or very complex schemata and schema-level metadata. We concentrate on heuristically generating type-based attribute signatures using both local and global computation approaches. We show experimentally that by decomposing data into q-grams and then considering signatures based on q-gram distributions, we ...
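To make the q-gram idea concrete, here is a minimal Python sketch of one way a q-gram distribution signature could be built for an attribute and compared against another. It is not the authors' method: the abstract is truncated before the results, and the function names (`qgrams`, `qgram_signature`, `cosine_similarity`), the choice of q = 3, the cosine comparison, and the sample values are all assumptions made for illustration.

```python
import math
from collections import Counter


def qgrams(value, q=3):
    """Return the overlapping q-grams of a string.

    Strings shorter than q are kept whole; empty strings yield nothing.
    (Both conventions are assumptions, not taken from the paper.)
    """
    if not value:
        return []
    if len(value) <= q:
        return [value]
    return [value[i:i + q] for i in range(len(value) - q + 1)]


def qgram_signature(values, q=3):
    """Normalized q-gram frequency distribution over all values of one attribute."""
    counts = Counter(g for v in values for g in qgrams(v, q))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {g: c / total for g, c in counts.items()}


def cosine_similarity(sig_a, sig_b):
    """Cosine similarity between two sparse signatures (an assumed comparison measure)."""
    dot = sum(w * sig_b.get(g, 0.0) for g, w in sig_a.items())
    norm_a = math.sqrt(sum(w * w for w in sig_a.values()))
    norm_b = math.sqrt(sum(w * w for w in sig_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical attribute columns whose semantic type is "unknown" to the system.
emails_a = ["alice@example.com", "bob@example.org"]
emails_b = ["carol@example.net", "dave@example.com"]
ip_addrs = ["192.168.0.1", "10.0.0.254"]

sim_same_type = cosine_similarity(qgram_signature(emails_a), qgram_signature(emails_b))
sim_diff_type = cosine_similarity(qgram_signature(emails_a), qgram_signature(ip_addrs))
```

In this toy setup the two email columns share many 3-grams (such as `@ex` and `ple`), so `sim_same_type` comes out higher than `sim_diff_type`, which is the kind of separation a type-based signature is meant to provide.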
Added 19 May 2010
Updated 19 May 2010
Type: Conference
Year: 2009
Where: EDBT
Authors: Babak Ahmadi, Marios Hadjieleftheriou, Thomas Seidl, Divesh Srivastava, Suresh Venkatasubramanian