BayesWipe Database Cleaner

Real data with synthetic noise

[Download]. The download contains two files: ground_truth.txt holds the original values, and fulldb.txt holds the same data with 3% noise added. Run BayesWipe on fulldb.txt, then compare its output against ground_truth.txt.

Schema: model, make, type, year, condition, doors, engine
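
One way to compare cleaned output against the ground truth is sketched below. It is not part of BayesWipe. It assumes that the files are comma-delimited, that each has one tuple per line in the same order, and that columns follow the schema above; none of this is stated on this page, so adjust the delimiter and alignment to the actual files. The names read_rows and score_cleaning, and the sample values, are hypothetical.

```python
import csv
import io


def read_rows(handle, delimiter=","):
    """Read delimited tuples from an open file, skipping blank lines.

    The delimiter is an assumption; the real files may use another one.
    """
    return [row for row in csv.reader(handle, delimiter=delimiter) if row]


def score_cleaning(truth, noisy, cleaned):
    """Cell-level precision and recall of a cleaning run.

    truth, noisy, cleaned: lists of rows (lists of strings), aligned
    row by row and column by column.
    """
    if not (len(truth) == len(noisy) == len(cleaned)):
        raise ValueError("the three tables must have the same number of rows")

    noisy_cells = 0  # cells where the noisy value differs from the truth
    changed = 0      # cells the cleaner modified
    repaired = 0     # modified cells that now match the truth
    for t_row, n_row, c_row in zip(truth, noisy, cleaned):
        for t, n, c in zip(t_row, n_row, c_row):
            if n != t:
                noisy_cells += 1
            if c != n:
                changed += 1
                if c == t:
                    repaired += 1

    return {
        "noisy_cells": noisy_cells,
        "changed": changed,
        "repaired": repaired,
        # Of the cells the cleaner touched, how many ended up correct.
        "precision": repaired / changed if changed else 0.0,
        # Of the cells that were noisy, how many were restored.
        "recall": repaired / noisy_cells if noisy_cells else 0.0,
    }


if __name__ == "__main__":
    # Tiny made-up tables standing in for the real files.
    truth = read_rows(io.StringIO("civic,honda,sedan\nf150,ford,truck\n"))
    noisy = read_rows(io.StringIO("civic,hnda,sedan\nf150,ford,truk\n"))
    cleaned = read_rows(io.StringIO("civic,honda,coupe\nf150,ford,truk\n"))
    print(score_cleaning(truth, noisy, cleaned))
```

With the real data, the same function could be fed rows read from ground_truth.txt, fulldb.txt, and whatever file the cleaned output is saved to. Reporting precision alongside recall matters here because a cleaner can also damage cells that were already correct.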

Real data with naturally occurring noise

[Download].

A 2.5 MB sanitized sample of data from cars.com, with personal information such as VINs removed, is provided here. Data like this was used in the Amazon Mechanical Turk experiments. You can run BayesWipe on it to see how it handles real-world noise. Note that no ground-truth file is available, because creating one would have required the author to go through the whole dataset tuple by tuple.

Schema: make, model, type, fuel_type, engine, transmission, drivetrain, doors, wheelbase