Near-duplicate web documents are abundant. Two such documents differ from each other only in a very small portion, for example one that displays advertisements. Such differences are irrele...
Information extraction approaches are heavily used to gather product information on the Web, especially focusing on technical product specifications. If requesting different sour...
cessary to abstract it and eliminate the redundant data. In this context, a method for data reduction based on formal concept analysis is proposed in [16,17]. At the same time...
We performed a set of experiments to investigate the utility of morphological analysis for improving retrieval of documents written in languages with relatively large morph...
In order to utilize geographic web information for digital city applications, we have been developing a geographic web search system, KyotoSEARCH. When users retrieve geographic in...
Ryong Lee, H. Shiina, Taro Tezuka, Yusuke Yokota, ...