Search Sciweavers | Sciweavers

62 search results - page 4 / 13

» Creating Permanent Test Collections of Web Pages for Informa...

SIGIR
2008
ACM

SpotSigs: robust and efficient near duplicate detection in large web collections

Download ilpubs.stanford.edu

Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...

Martin Theobald, Jonathan Siddharth, Andreas Paepc...

WWW
2009
ACM

Incorporating site-level knowledge to extract structured data from web forums

Download www2009.eprints.org

Web forums have become an important data resource for many web applications, but extracting structured data from unstructured web forum pages is still a challenging task due to bo...

Jiang-Ming Yang, Rui Cai, Yida Wang, Jun Zhu, Lei ...

ICWSM
2009

MakeMyPage: Social Media Meets Automatic Content Generation

Download infolab.northwestern.edu

Finding out about a topic online can be time consuming. It involves visiting multiple news sites, encyclopedia entries, video repositories and other resources while discarding irr...

Francisco Iacobelli, Kristian J. Hammond, Larry Bi...

JCDL
2004
ACM

Panorama: extending digital libraries with topical crawlers

Download clgiles.ist.psu.edu

A large amount of research, technical and professional documents are available today in digital formats. Digital libraries are created to facilitate search and retrieval of inform...

Gautam Pant, Kostas Tsioutsiouliklis, Judy Johnson...

CHI
1996
ACM

Silk from a Sow's Ear: Extracting Usable Structures from the Web

Download www2.parc.com

In its current implementation, the World-Wide Web lacks much of the explicit structure and strong typing found in many closed hypertext systems. While this property has directly f...

Peter Pirolli, James E. Pitkow, Ramana Rao
