Abstract—Recent progress in research fields such as Information Extraction and Information Retrieval enables the creation of systems providing better search experiences to web u...
Gianluca Demartini, Claudiu S. Firan, Mihai George...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
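The abstract above is cut off before it explains how tag ratios are computed. The sketch below assumes the usual per-line definition: the number of non-tag characters on a line divided by the number of HTML tags on that line, with a tag-free line scoring its raw character count. The names `tag_ratios` and `extract_content` are hypothetical. Keeping every line at or above the mean ratio is a deliberate simplification and is not the method's own selection step. Stripping tags with a regular expression is also approximate; for example, it mishandles a `>` that appears inside an attribute value.

```python
import re
from statistics import mean

# Assumption: script/style blocks and HTML comments are dropped first,
# because their text is not visible page content.
_NOISE_RE = re.compile(r"<script.*?</script>|<style.*?</style>|<!--.*?-->",
                       re.IGNORECASE | re.DOTALL)
_TAG_RE = re.compile(r"<[^>]*>")


def tag_ratios(html: str) -> list[tuple[str, float]]:
    """Return (visible text, tag ratio) for each non-blank line of `html`."""
    cleaned = _NOISE_RE.sub("", html)
    rows = []
    for line in cleaned.splitlines():
        if not line.strip():
            continue  # blank lines carry no signal
        n_tags = len(_TAG_RE.findall(line))
        text = _TAG_RE.sub("", line).strip()
        # Text characters per tag; a line without tags scores its length.
        ratio = len(text) / n_tags if n_tags else float(len(text))
        rows.append((text, ratio))
    return rows


def extract_content(html: str) -> list[str]:
    """Naive stand-in for a real selection step: keep the text of lines
    whose tag ratio is at least the mean ratio over the page."""
    rows = tag_ratios(html)
    if not rows:
        return []
    threshold = mean(r for _, r in rows)
    return [text for text, r in rows if r >= threshold and text]


# Hypothetical example page: navigation lines are tag-heavy and
# the paragraph line is text-heavy.
page = """<html><head><style>p{color:red}</style></head>
<body><div><a href="/">Home</a></div>
<p>CETR scores each line by how much text it carries relative to its markup.</p>
<div><ul><li><a href="/a">A</a></li></ul></div>
</body></html>"""

if __name__ == "__main__":
    for text, ratio in tag_ratios(page):
        print(f"{ratio:7.3f}  {text!r}")
    print(extract_content(page))
```

In this example the per-line ratios are 0.0, 0.8, 36.5, 0.125 and 0.0. Only the paragraph line clears the mean of about 7.5. This shows the intuition behind the approach: content lines carry much more text per tag than navigation or layout lines.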
We develop new algorithms for learning monadic node selection queries in unranked trees from annotated examples, and apply them to visually interactive Web information extraction. ...
Ever-growing amounts of data that must be distributed from data providers to consumers across the world necessitate a greater understanding of the software architectural implicati...
Abstract. In this paper we introduce BioPubMiner, a machine learning component-based platform for biomedical information analysis. BioPubMiner employs natural language processing t...