Sciweavers

WWW
2008
ACM

Representing a web page as sets of named entities of multiple types: a model and some preliminary applications

In contrast to the "bag of words" document representation used in most information retrieval applications, we propose a model that represents a web page as sets of named entities of multiple types. Specifically, four types of named entities are extracted: person, geographic location, organization, and time. The relations among these entities are also extracted, weighted, classified, and marked with labels. On top of this model, some interesting applications are demonstrated. In particular, we introduce a notion of person-activity, which contains four elements: person, location, time, and activity. With this notion, and based on a reasonably large set of web pages, we are able to show how one person's activities can be attributed by time and location, which gives a good idea of the mobility of the person in question.
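The abstract does not spell out a concrete data structure, so the following is only a minimal sketch of how such a representation might look, not the authors' implementation. The names `PageEntities`, `PersonActivity`, and `mobility_trace`, the choice of a `(label, weight)` pair per relation, and the use of ISO date strings for time are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# The four entity types named in the abstract.
ENTITY_TYPES = ("person", "location", "organization", "time")


@dataclass(frozen=True)
class PersonActivity:
    """Hypothetical four-element person-activity record."""
    person: str
    location: str
    time: str      # assumed ISO date string, so string order is date order
    activity: str


@dataclass
class PageEntities:
    """Hypothetical page model: one set of names per entity type."""
    url: str
    entities: dict = field(
        default_factory=lambda: {t: set() for t in ENTITY_TYPES}
    )
    # Assumed layout: (entity_a, entity_b) -> (label, weight)
    relations: dict = field(default_factory=dict)

    def add_entity(self, etype: str, name: str) -> None:
        if etype not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {etype}")
        self.entities[etype].add(name)

    def add_relation(self, a: str, b: str, label: str, weight: float) -> None:
        self.relations[(a, b)] = (label, weight)


def mobility_trace(activities, person):
    """Return (time, location) pairs for one person, ordered by time."""
    return sorted(
        (a.time, a.location) for a in activities if a.person == person
    )
```

Under these assumptions, collecting `PersonActivity` records across many pages and passing them to `mobility_trace` would give a time-ordered list of locations for one person, which is one simple reading of the "mobility" idea in the abstract.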
Added 21 Nov 2009
Updated 21 Nov 2009
Type Conference
Year 2008
Where WWW
Authors Nan Di, Conglei Yao, Mengcheng Duan, Jonathan J. H. Zhu, Xiaoming Li