Public-use sensor datasets are a useful scientific resource with the unfortunate feature that their provenance is easily disconnected from their content. To address this we intro...
Stephen Chong, Christian Skalka, Jeffrey A. Vaugha...
Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...
Constructing a Chinese digital library, especially one for archiving historical articles, is often hampered by the small character sets supported by current computer systems. Thi...
While information retrieval (IR) and databases (DB) have been developed independently, there have been emerging requirements that both data management and efficient text retrieva...
Jinsuk Kim, Du-Seok Jin, Yunsoo Choi, Chang-Hoo Je...
This paper describes a new finite-state shallow parser. It merges constructive and reductionist approaches within a highly modular architecture. Syntactic information is added at ...