This study borrowed sequence analysis techniques from the genetic sciences and applied them to a similar problem in email filtering and web searching. Genre identification is the ...
Large and complex graphs representing relationships among sets of entities are an increasingly common focus of interest in data analysis--examples include social networks, Web gra...
On script-generated web sites, many documents share common HTML tree structure, allowing wrappers to effectively extract information of interest. Of course, the scripts and thus ...
As the World Wide Web in China grows rapidly, mining knowledge in Chinese Web pages becomes more and more important. Mining Web information usually relies on the machine learning ...
This paper explores topic aspect (i.e., subtopic or facet) classification for English and Chinese collections. The evaluation model assumes a bilingual user who has found document...