Entity resolution is the problem of determining which records in a database refer to the same entities, and is a crucial and expensive step in the data mining process. Interest in...
This paper presents a new algorithm for feature generation, derived approximately from a geometrical interpretation of Fisher linear discriminant analysis. In a fiel...
In the web context, it is difficult to disentangle presentation from process logic, and sometimes even data is not separate from the presentation. Consequently, it becomes to de...
This paper presents experiments on classifying web pages by genre. Firstly, a corpus of 1539 manually labeled web pages was prepared. Secondly, 502 genre features were selected ba...
The creation of a complex web site is a thorny problem in user interface design. First, different visitors have distinct goals. Second, even a single visitor may have different need...