– We describe a method to extract content text from diverse Web pages by using the HTML document’s Text-to-Tag Ratio rather than specific HTML cues that may not be constant acr...
The major limitation in bilingual latent semantic analysis (bLSA) is the requirement of parallel training corpora. Motivated by semi-supervised learning, we propose a clusterbased...
Set expansion refers to expanding a given partial set of objects into a more complete set. A well-known example system that does set expansion using the web is Google Sets. In thi...
Scientists typically need to take a large volume of information into account in order to deal with re-occurring tasks such as inspecting proceedings, finding related work, or revi...
Algirdas Avizienis, Gintare Grigonyte, Johann Hall...
The eXtensible Markup Language (XML) was originally defined to represent Web content, but it is increasingly used to define languages, such as XPL, that are used for coding execut...