Big data is the tar sands of the data world: vast reserves of raw gritty data whose valuable information content can only be extracted at great cost. MapReduce is a popular parall...
Recent efforts in the Conceptual Modelling community have been devoted to properly capturing time-varying information, and several proposals of temporally enhanced Entity-Relation...
The concept of Cumulated Anomaly (CA), which describes a new type of database anomaly, is addressed. A typical CA intrusion is that when a user who is authorized to modify da...
Central to a data cleaning system are record matching and data repairing. Matching aims to identify tuples that refer to the same real-world object, and repairing is to make a dat...
Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, Weny...
As with many large organizations, the Government's data is split in many different ways and is collected at different times by different people. The resulting massive data he...