
IALP
2009

Challenges in Developing Persian Corpora from Online Resources

Persian is an Indo-European language that borrowed its script from Arabic, a member of the Semitic language family. Because the Persian and Arabic scripts are so similar, problems arise when electronic text is processed. This paper discusses some of the common problems encountered in practice while developing a Persian corpus from online materials. The problems stem from the Persian script itself; its mixture with the Arabic script; Persian orthography; typists' typing styles; and the mixing of Persian and Arabic code pages in operating systems.
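To illustrate the kind of script mixing the abstract describes, here is a minimal sketch of one common cleanup step: mapping Arabic code points that look like Persian letters onto their Persian counterparts. This is not taken from the paper. The function name `normalize_persian` and the two-entry mapping are assumptions chosen for illustration, and a real corpus pipeline would likely need a larger table.

```python
# Illustrative sketch only; the mapping and function name are assumptions,
# not the authors' method.

# Arabic code points that often appear in Persian text typed with an
# Arabic keyboard layout or code page, mapped to their Persian counterparts.
ARABIC_TO_PERSIAN = {
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
}

# Build the translation table once so repeated calls are cheap.
_TABLE = str.maketrans(ARABIC_TO_PERSIAN)


def normalize_persian(text: str) -> str:
    """Replace Arabic Yeh and Kaf with the Persian forms; leave the rest as-is."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    mixed = "\u0643\u062A\u0627\u0628"  # "book" typed with Arabic Kaf
    cleaned = normalize_persian(mixed)
    # Print code points so the change is visible regardless of font rendering.
    print([f"U+{ord(ch):04X}" for ch in cleaned])
```

Without a step like this, two visually identical spellings of the same word would be counted as different tokens in the corpus.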
Masood Ghayoomi, Saeedeh Momtazi
Added 18 Feb 2011
Updated 18 Feb 2011
Type Journal
Year 2009
Where IALP
Authors Masood Ghayoomi, Saeedeh Momtazi