2015-07-22: DataCleaner 4.0.9 released

DataCleaner 4.0.9 was released today, bringing several improvements and bug fixes that we hope everybody will enjoy! Let's dive right into the news:

Improvements, new features and bug fixes:
  • You can now create and drop tables via the DataCleaner desktop UI. Note that the term "table" here covers more than just relational database tables: it also includes sheets in MS Excel datastores, collections in MongoDB, document types in CouchDB and ElasticSearch, and so on. Basically, every datastore type that supports write operations supports this functionality, except single-table datastores such as CSV datastores. The functionality is exposed via:
    • The "Create table" option in the right-click menu of schemas in the tree on the left side of the application.
    • The "Create table" option in the table-selection inputs of components such as Insert into table, Table lookup and Update table.
    • The "Drop table" option in the right-click menu of tables in the tree on the left side of the application.
  • You can now (optionally) specify your Salesforce.com web service endpoint URL. This lets you use DataCleaner to connect to Salesforce.com sandbox environments as well as to your own custom endpoints.
  • ElasticSearch support has been improved: custom mappings are now supported, and ElasticSearch datastore definitions can now also be reused for searching and indexing.
  • Record sampling and the selection of potential duplicates in the Duplicate detection function have been improved. Because the decisions made during the training session are now more representative, configuration is faster.
  • The Duplicate detection model file format has been updated, removing the need for a separate 'reference' file to store past training decisions. The old format is still supported, but the new format brings many user experience benefits.
  • A thread starvation issue has been fixed in DataCleaner monitor. Its impact was severe, but it occurred only in rare, heavily customized setups. If a custom listener object on DataCleaner monitor threw an error, a resource was never freed and kept occupying a thread from the Quartz scheduling pool on the server. If this happened many times, the server could eventually run out of threads in that pool.
  • The vertical menu on the result screen now properly displays the labels of the components that have results. This makes it easier to tell which menu item points to which result.

We hope you enjoy the new release. Head over to the Download page to try it out now.