Classification of documents by genre is typically done using either linguistic analysis or term-frequency-based techniques. The former provides better classification accuracy than...
While the discipline of computing has evolved significantly in the past 30 years, Computer Science curricula have not as readily adapted to these changes. In response, we have rec...
This paper presents a cluster-validation-based document clustering algorithm that is capable of identifying both the important feature words and the true model order (cluster number). I...
The interpretation of natural scenes, generally so obvious and effortless for humans, remains a challenge in computer vision. To allow the search of image-based documents i...
It has been shown that using phrases properly in document retrieval leads to higher retrieval effectiveness. In this paper, we define four types of noun phrases and present an...
Wei Zhang, Shuang Liu, Clement T. Yu, Chaojing Sun...