2014-10-21 : DataCleaner 3.7 - Connect, Check, Consolidate

This morning, version 3.7 of DataCleaner hit the streets, and it's ready to hook up, eager to spread tender loving care to your databases and data files. The keywords of this release are "Connect, Check, Consolidate" because these have been the focus of our development: connecting to a wide range of data sources, checking data for inconsistencies, and consolidating data through migrations and deduplication.


We've added DataCleaner connectivity for Apache HBase and JSON files. Apache HBase is a popular Hadoop database: a distributed, scalable big data store. JSON is a data representation format that is becoming increasingly popular for web technologies, web services and NoSQL databases.


The analytical capabilities of DataCleaner have also been improved. We've added an efficient Unique Key check feature, which lets you quickly check for duplicate keys (or any other values that are expected to be unique) in your datasets.
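To illustrate what a unique key check does, here is a minimal sketch in Python. It is not DataCleaner's implementation or API: the function name unique_key_check, its parameters and the result fields are hypothetical, chosen only to show the idea of counting key occurrences and reporting the values that appear more than once.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def unique_key_check(rows: Iterable[Mapping[str, Any]], key_column: str) -> dict:
    """Hypothetical illustration of a unique key check (not DataCleaner code).

    Counts how often each key value occurs and reports the values that
    occur more than once. Missing or None keys are counted separately.
    """
    counts = Counter()
    null_count = 0
    for row in rows:
        value = row.get(key_column)
        if value is None:
            # A missing key cannot be checked for uniqueness; count it apart.
            null_count += 1
        else:
            counts[value] += 1
    # Keep only the key values that occur more than once, with their counts.
    duplicates = {value: n for value, n in counts.items() if n > 1}
    return {
        "unique_count": sum(1 for n in counts.values() if n == 1),
        "duplicate_keys": duplicates,
        "null_count": null_count,
    }


if __name__ == "__main__":
    customers = [
        {"id": 1, "name": "Alice"},
        {"id": 2, "name": "Bob"},
        {"id": 2, "name": "Robert"},
        {"id": None, "name": "Unknown"},
        {"id": 3, "name": "Carol"},
    ]
    print(unique_key_check(customers, "id"))
    # {'unique_count': 2, 'duplicate_keys': {2: 2}, 'null_count': 1}
```

This sketch keeps every key in an in-memory hash table, which is fine for moderate datasets. A check on very large datasets would likely need a sort-based or disk-backed approach instead; how DataCleaner's Unique Key check achieves its efficiency is not described here.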


Speaking of duplicates, the Duplicate Detection feature of DataCleaner Professional edition has been improved in many ways. We've made several improvements to the user interface, making more options available to advanced users. We've also published an online video tutorial to help people get started. On the technical side, the deduplication model is now represented in a more readable XML format, and the algorithm for detecting initial duplicates for training has been improved.

Beyond these user-facing features, we've worked on several behind-the-scenes improvements. In fact, what we normally refer to as the "engine" of DataCleaner, AnalyzerBeans, was finally given the big "1.0" version tag to reflect the completeness of this core component.

All in all, this is a release we are very happy about, and we hope you enjoy it. Go get your free download and take it for a spin!