This paper addresses the issue of text normalization, an important yet often overlooked problem in natural language processing. By text normalization, we mean converting ‘inform...
Conghui Zhu, Jie Tang, Hang Li, Hwee Tou Ng, Tieju...
Email has become one of the fastest and most economical forms of communication. However, the increase of email users has resulted in the dramatic increase of spam emails during th...
Fundamental to data cleaning is the need to account for multiple data representations. We propose a formal framework that can be used to reason about and manipulate data represent...
Unlike traditional database queries, keyword queries do not adhere to predefined syntax and are often dirty with irrelevant words from natural languages. This makes accurate and e...