Document search is generally based on individual terms in the document. However, for collections within limited domains it is possible to provide more powerful access tools. This ...
This paper presents experiments on classifying web pages by genre. Firstly, a corpus of 1539 manually labeled web pages was prepared. Secondly, 502 genre features were selected ba...
The rise of email and instant messaging as important tools in the professional workplace has created changes in how we communicate. One such change is that these media tend to red...
Growing amounts of information are currently being generated and stored on the World Wide Web (WWW). In particular, researchers in any field can find a lot of publicati...
In this paper, we propose a novel Chinese word segmentation method which leverages the huge deposit of Web documents and search technology. It simultaneously solves ambiguous phra...