Sciweavers

336 search results - page 61 / 68
» Content-based language models for spoken document retrieval
CICLING
2010
Springer
15 years 3 months ago
Word Length n-Grams for Text Re-use Detection
Abstract. The automatic detection of shared content in written documents – which includes text reuse and its unacknowledged commitment, plagiarism – has become an important probl...
Alberto Barrón-Cedeño, Chiara Basile...
IJCNLP
2005
Springer
15 years 5 months ago
Inversion Transduction Grammar Constraints for Mining Parallel Sentences from Quasi-Comparable Corpora
Abstract. We present a new implication of Wu’s (1997) Inversion Transduction Grammar (ITG) Hypothesis, on the problem of retrieving truly parallel sentence translations from larg...
Dekai Wu, Pascale Fung
SIGIR
2009
ACM
15 years 6 months ago
Estimating query performance using class predictions
We investigate using topic prediction data, as a summary of document content, to compute measures of search result quality. Unlike existing quality measures such as query clarity ...
Kevyn Collins-Thompson, Paul N. Bennett
SIGIR
2009
ACM
15 years 6 months ago
Incorporating prior knowledge into a transductive ranking algorithm for multi-document summarization
This paper presents a transductive approach to learn ranking functions for extractive multi-document summarization. At the first stage, the proposed approach identifies topic th...
Massih-Reza Amini, Nicolas Usunier
WWW
2010
ACM
15 years 6 months ago
CETR: content extraction via tag ratios
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Tim Weninger, William H. Hsu, Jiawei Han