This paper reports on the annotation of a corpus of 1 million words with four semantic annotation layers, including named entities, coreference relations, semantic roles and spati...
Spam deobfuscation is a processing to detect obfuscated words appeared in spam emails and to convert them back to the original words for correct recognition. Lexicon tree hidden M...
Semantic taxonomies such as WordNet provide a rich source of knowledge for natural language processing applications, but are expensive to build, maintain, and extend. Motivated by...
We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. Th...
Zach Solan, David Horn, Eytan Ruppin, Shimon Edelm...
Bit arrays, or bitmaps, are used to significantly speed up set operations in several areas, such as data warehousing, information retrieval, and data mining, to cite a few. Howeve...