Sciweavers

» From HTML documents to web tables and rules
JCIT
2008
14 years 11 months ago
Multimodal Web Content Conversion for Mobile Services in a U-City
A ubiquitous city is where everything is interconnected with everything else, where information is instantaneously shared. In a U-city, people can access a variety of web data in ...
Soosun Cho, HeeSook Shin
IJDAR
2006
14 years 11 months ago
Table form document analysis based on the document structure grammar
Structure analysis of table form documents is an important issue because a printed document and even an electronic document do not provide logical structural information but merely...
Akira Amano, Naoki Asada, Masayuki Mukunoki, Masah...
WWW
2005
ACM
16 years 13 days ago
Extracting context to improve accuracy for HTML content extraction
Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...
Suhit Gupta, Gail E. Kaiser, Salvatore J. Stolfo
FLAIRS
2007
15 years 2 months ago
Lexicon Development and POS Tagging Using a Tagged Bengali News Corpus
Lexicon development and Part of Speech (POS) tagging are very important for almost all Natural Language Processing (NLP) application areas. The rapid development of these resources...
Asif Ekbal, Sivaji Bandyopadhyay
WWW
2005
ACM
16 years 13 days ago
Thresher: automating the unwrapping of semantic content from the World Wide Web
We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...
Andrew Hogue, David R. Karger