A major problem in today's information-driven world is that sharing heterogeneous, semantically rich data is incredibly difficult. Piazza is a peer data management system tha...
Igor Tatarinov, Zachary G. Ives, Jayant Madhavan, ...
Unstructured text represents a large fraction of the world's data. It often contains snippets of structured information (e.g., people's names and zip ...
Daisy Zhe Wang, Eirinaios Michelakis, Joseph M. He...
Wireless Sensor Networks (WSNs) are increasingly being employed as a key building block of pervasive computing infrastructures, owing to their ability to be embedded within the re...
Animesh Pathak, Luca Mottola, Amol Bakshi, Viktor ...
Today it is possible to deploy sensor networks in the real world and collect large amounts of raw sensory data. However, it remains a major challenge to make sense of sensor data, ...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...