We present a highly accurate method for classifying web pages based on link percentage, which is the percentage of text characters that are parts of links normalized by the number...
Keyword indices, topic directories, and link-based rankings are used to search and structure the rapidly growing Web today. Surprisingly little use is made of years of browsing ex...
Often, remote investigations use autonomous agents to observe an environment on behalf of absent scientists. Predictive exploration improves these systems’ efficiency with onboa...
Episodic knowledge is often stored in the form of textual narratives written in natural language. However, a large repository of such narratives will contain both repetitive and n...
Disk-based deduplication storage has emerged as the new-generation storage system for enterprise data protection to replace tape libraries. Deduplication removes redundant data se...