Data about everything is readily available on the web—but often only accessible through elaborate user interactions. For automated decision support, extracting that data is esse...
Andrew Jon Sellers, Tim Furche, Georg Gottlob, Gio...
Current-day crawlers retrieve content only from the publicly indexable Web, i.e., the set of Web pages reachable purely by following hypertext links, ignoring search forms and pag...
Background: Understanding how proteins fold is essential to our quest to discover how life works at the molecular level. Current computational power enables researchers to produc...
Hong Sun, Hakan Ferhatosmanoglu, Motonori Ota, Yus...
The World-Wide Web provides remote access to pages using its own naming scheme (URLs), transfer protocol (HTTP), and cache algorithms. Not only does using these special-purpose me...
Web text has been successfully used as training data for many NLP applications. While most previous work accesses web text through search engine hit counts, we created a Web Corpu...