Sciweavers

TCS
2010
12 years 11 months ago
Definable transductions and weighted logics for texts
A text is a word together with an additional linear order on it. We study quantitative models for texts, i.e. text series which assign to texts elements of a semiring. We introduc...
Christian Mathissen
EMNLP
2009
13 years 2 months ago
Learning Term-weighting Functions for Similarity Measures
Measuring the similarity between two texts is a fundamental problem in many NLP and IR applications. Among the existing approaches, the cosine measure of the term vectors represen...
Wen-tau Yih
EMNLP
2010
13 years 2 months ago
Enhancing Domain Portability of Chinese Segmentation Model Using Chi-Square Statistics and Bootstrapping
Almost all Chinese language processing tasks involve word segmentation of the language input as their first steps, thus robust and reliable segmentation techniques are always requ...
Baobao Chang, Dongxu Han
MT
2007
13 years 4 months ago
Automatic extraction of translations from web-based bilingual materials
This paper describes the framework of the StatCan Daily Translation Extraction System (SDTES), a computer system that maps and compares web-based translation texts of Statistics Can...
Qibo Zhu, Diana Zaiu Inkpen, Ash Asudeh
JQL
2007
13 years 4 months ago
Experiments on authorship attribution by intertextual distance in English
How can it be said that texts are "near" or "distant" from one another? Are different texts by a single author more similar than texts by different authors? To...
Dominique Labbé
IPM
2008
13 years 4 months ago
Author identification: Using text sampling to handle the class imbalance problem
Authorship analysis of electronic texts assists digital forensics and anti-terror investigation. Author identification can be seen as a single-label multi-class text categorizatio...
Efstathios Stamatatos
NAACL
1994
13 years 5 months ago
Principles of Template Design
The functionality of systems that extract information from texts can be specified quite simply: the input is a stream of texts and the output is some representation of the informa...
Jerry R. Hobbs, David J. Israel
COLING
1994
13 years 5 months ago
A Part-of-Speech-Based Alignment Algorithm
Aligning bilingual texts has recently become a crucial issue. Rather than using a length-based or translation-based criterion, a part-of-speech-based criterion is proposed. We postulat...
Kuang-Hua Chen, Hsin-Hsi Chen
ECIR
2003
Springer
13 years 5 months ago
Corpus-Based Thesaurus Construction for Image Retrieval in Specialist Domains
This paper explores the use of texts that are related to an image collection, also known as collateral texts, for building thesauri in specialist domains to aid in image retrieval....
Khurshid Ahmad, Mariam Tariq, Bogdan Vrusias, Chri...
AAAI
2008
13 years 6 months ago
An Effective and Robust Method for Short Text Classification
Classification of texts potentially containing a complex and specific terminology requires the use of learning methods that do not rely on extensive feature engineering. In this w...
Victoria Bobicev, Marina Sokolova