This paper reports on an inquiry into the use of metadata, publishing formats, and markup in editormanaged open access journals. It builds on findings from a study of the document...
Due to the growing importance of the World Wide Web, archiving it has become crucial for preserving useful source of information. To maintain a web archive up-to-date, crawlers ha...
On script-generated web sites, many documents share common HTML tree structure, allowing wrappers to effectively extract information of interest. Of course, the scripts and thus ...
Lexicon development and Part of Speech (POS) tagging are very important for almost all Natural Language Processing(NLP) application areas. The rapid development of these resources...
Due to the enormous size of the web and low precision of user queries, finding the right information from the web can be difficult if not impossible. One approach that tries to ...