Computer Science | Sciweavers

CIKM
2011
Springer

Probabilistic near-duplicate detection using simhash

14 years 7 months ago

This paper offers a novel look at using a dimensionality-reduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...

Sadhan Sood, Dmitri Loguinov

CIKM
2011
Springer

Integrating and querying web databases and documents

14 years 7 months ago

There exist many interrelated information sources on the Internet that can be categorized into structured (database) and semistructured (documents). A key challenge is to integrat...

Carlos Garcia-Alvarado, Carlos Ordonez

CIKM
2011
Springer

Towards a framework for attribute retrieval

14 years 7 months ago

In this paper, we propose an attribute retrieval approach which extracts and ranks attributes from HTML tables. We distinguish between class attribute retrieval and instance attri...

Arlind Kopliku, Mohand Boughanem, Karen Pinel-Sauv...

CIKM
2011
Springer

Semi-supervised multi-task learning of structured prediction models for web information extraction

14 years 7 months ago

Extracting information from web pages is an important problem; it has several applications such as providing improved search results and construction of databases to serve user qu...

Paramveer S. Dhillon, Sundararajan Sellamanickam, ...

CIKM
2011
Springer

Toward interactive training and evaluation

14 years 7 months ago

Machine learning often relies on costly labeled data, and this impedes its application to new classification and information extraction problems. This has motivated the developme...

Gregory Druck, Andrew McCallum

CIKM
2011
Springer

PDFMeat: managing publications on the semantic desktop

14 years 7 months ago

Researchers maintain bibliographies and extensive sets of PDF files of scholarly publications on their desktop. The lack of proper metadata of downloaded PDFs makes this task a t...

David Aumüller, Erhard Rahm

CIKM
2011
Springer

Supervised language modeling for temporal resolution of texts

14 years 7 months ago

We investigate temporal resolution of documents, such as determining the date of publication of a story based on its text. We describe and evaluate a model that builds histograms e...

Abhimanu Kumar, Matthew Lease, Jason Baldridge

CIKM
2011
Springer

Focusing on novelty: a crawling strategy to build diverse language models

14 years 7 months ago

Word prediction performed by language models has an important role in many tasks, e.g. word sense disambiguation, speech recognition, hand-writing recognition, query spelling an...

Luciano Barbosa, Srinivas Bangalore

CIKM
2011
Springer

Personalizing web search results by reading level

14 years 7 months ago

Traditionally, search engines have ignored the reading difficulty of documents and the reading proficiency of users in computing a document ranking. This is one reason why Web se...

Kevyn Collins-Thompson, Paul N. Bennett, Ryen W. W...

CIKM
2011
Springer

Do all birds tweet the same?: characterizing twitter around the world

14 years 7 months ago

Social media services have spread throughout the world in just a few years. They have become not only a new source of information, but also new mechanisms for societies world-wide...

Barbara Poblete, Ruth Garcia, Marcelo Mendoza, Ale...
