ICDE
2006
IEEE

Finding Thai Web Pages in Foreign Web Spaces

As the Web is increasingly recognized as a culturally valuable social artifact, many nations are endeavoring to create national Web archives for long-term preservation. However, because the Web is borderless, gathering the information that belongs to a specific nation is challenging. This paper proposes language specific web crawling (LSWC) as a method of creating Web archives for countries with distinct linguistic identities, such as Thailand. The LSWC strategy for selectively gathering Thai web pages from virtually anywhere on the Web is derived from static analyses of the Thai Web graph. The strategy is then evaluated on a crawling simulator with a large dataset.
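The idea of crawling selectively by language can be illustrated with a minimal sketch. It is not the paper's LSWC strategy, which the abstract does not spell out. It assumes two things that are not in the source: a page counts as Thai when most of its letters fall in the Thai Unicode block (U+0E00 to U+0E7F), and the crawler gives up on a path after a fixed number of consecutive non-Thai pages. The names `thai_ratio`, `looks_thai`, `simulate_crawl`, and `max_non_thai_hops` are hypothetical, and the crawl runs over a small in-memory graph rather than the live Web.

```python
from collections import deque


def thai_ratio(text: str) -> float:
    """Fraction of letters in `text` that belong to the Thai Unicode block."""
    # str.isalpha() skips digits, punctuation, spaces, and combining marks.
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    thai = sum(1 for ch in letters if "\u0e00" <= ch <= "\u0e7f")
    return thai / len(letters)


def looks_thai(text: str, threshold: float = 0.5) -> bool:
    """Assumed rule: a page is Thai if at least `threshold` of its letters are Thai."""
    return thai_ratio(text) >= threshold


def simulate_crawl(graph, pages, seeds, max_non_thai_hops=1):
    """Breadth-first crawl over an in-memory link graph.

    graph: dict mapping a URL to the list of URLs it links to.
    pages: dict mapping a URL to its page text.
    A path is abandoned once it passes through more than
    `max_non_thai_hops` consecutive non-Thai pages (assumed heuristic).
    Returns the Thai pages in the order they were visited.
    """
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    thai_found = []
    while queue:
        url, hops = queue.popleft()
        if looks_thai(pages.get(url, "")):
            thai_found.append(url)
            hops = 0  # a Thai page resets the tolerance counter
        else:
            hops += 1
            if hops > max_non_thai_hops:
                continue  # too far from Thai content; do not expand links
        for nxt in graph.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops))
    return thai_found


# Toy web: a Thai page reachable only through foreign-language pages.
graph = {"a": ["b"], "b": ["c", "d"], "d": ["e"]}
pages = {
    "a": "สวัสดีครับ",
    "b": "hello world",
    "c": "ภาษาไทย",
    "d": "foreign page",
    "e": "ประเทศไทย",
}
strict = simulate_crawl(graph, pages, ["a"], max_non_thai_hops=1)
lenient = simulate_crawl(graph, pages, ["a"], max_non_thai_hops=2)
```

In this toy graph, page `e` sits behind two consecutive non-Thai pages, so the stricter setting misses it while the more lenient one finds it. The tolerance trades coverage against the number of irrelevant pages fetched, which is the kind of trade-off a crawling simulator can measure.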
Kulwadee Somboonviwat, Takayuki Tamura, Masaru Kitsuregawa
Added 11 Jun 2010
Updated 11 Jun 2010
Type Conference
Year 2006
Where ICDE
Authors Kulwadee Somboonviwat, Takayuki Tamura, Masaru Kitsuregawa