Sciweavers

NIPS
2007
Supervised Topic Models
We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive a maximum-like...
David M. Blei, Jon D. McAuliffe
ACL
2006
A DOM Tree Alignment Model for Mining Parallel Data from the Web
This paper presents a new web mining scheme for parallel data acquisition. Based on the Document Object Model (DOM), a web page is represented as a DOM tree. Then a DOM tree align...
Lei Shi, Cheng Niu, Ming Zhou, Jianfeng Gao
ACL
2003
Parametric Models of Linguistic Count Data
It is well known that occurrence counts of words in documents are often modeled poorly by standard distributions like the binomial or Poisson. Observed counts vary more than simpl...
Martin Jansche
ACL
2003
Automatic Acquisition of Named Entity Tagged Corpus from World Wide Web
In this paper, we present a method that automatically constructs a Named Entity (NE) tagged corpus from the web to be used for learning of Named Entity Recognition systems. We use...
Joohui An, Seungwoo Lee, Gary Geunbae Lee
ISTA
2003
Using neighborhood information for automated categorization of Web pages
In this paper we discuss several issues related to the influence of expansion of a Web document representation on the quality of topical categorization of Web pages. We consider a W...
Nadejda Panteleeva