Disfluencies include editing terms such as uh and um as well as repeats and revisions. Little is known about how disfluencies are processed, and there has been next to no research...
Fernanda Ferreira, Ellen F. Lau, Karl G. D. Bailey
As more data (especially scientific data) is digitized and put on the Web, the importance of tracking and sharing its provenance metadata grows. Besides capturing the annotation pr...
Li Ding, Jie Bao, James Michaelis, Jun Zhao, Debor...
We derive PAC-Bayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as co-clustering, matrix tri-factorization, graphical models...
Much of the information on the Web is found in articles from online news outlets, magazines, encyclopedias, review collections, and other sources. However, extracting this content...
Popular entities often have thousands of instances on the Web. In this paper, we focus on the case where they are presented in a table-like format, namely appearing with their attri...
Conglei Yao, Yongjian Yu, Sicong Shou, Xiaoming Li