In many scenarios, data can be represented as vectors, mathematically abstracted as points in a Euclidean space. Because a great number of machine learning and data mining app...
Web spam is a widely recognized threat to the quality and security of the Web. Web spam pages pollute search engine indexes, burden Web crawlers and Web mining services, and expos...
Most prior work on information extraction has focused on extracting information from text in digital documents. Often, however, the most important information being reported in an...
Background: Accurate estimation of statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, a comparative study of pairwise stati...
Inverted indexes using sequences of characters (n-grams) as terms provide an error-resilient and language-independent way to query for arbitrary substrings and perform approximate...