One of the most common goals of data quality and data governance
programmes is to get rid of duplicate records. With DataCleaner you
can now discover your duplicates and effectively eliminate them
(see which editions include this feature).
How does it work?
There are many distinguishing features in DataCleaner's duplicate
detection that make it a strong choice for your deduplication needs:
High Quality: Quality is the hallmark of matching, and our duplicate
detection feature delivers on this promise.
Easy to use: We will not ask you to dive into endless configuration
parameters. Instead, you simply spend a few minutes instructing it on
a sample/training set, and then you apply the logic to the full set.
Agile: When using DataCleaner in in-memory mode you get results very
quickly. An iterative approach to duplicate detection makes you more
agile and gets configuration issues out of the way sooner.
Seamlessly integrated with DataCleaner: this allows you to leverage
the existing data profiling and transformation toolkit. The more you
standardize and separate meaningful parts of your data using
transformations, the more accurately duplicates can be determined
(the first sketch after this list illustrates the idea).
Scalable: For the Enterprise Edition of DataCleaner we provide a
dedicated server, built on top of the Big Data framework Hadoop.
There are practically no limits to the amount of data that can be
processed.
Machine learning based: The duplicate detection engine is configured
by examples. During a series of training sessions you can refine the
deduplication model simply by having a conversation with the tool
about what is and what isn't a good example of a duplicate (the
second sketch after this list illustrates the idea).
International: International data is supported and no regional
knowledge has been encoded into the deduplication engine; you provide
the business rules externally.
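To illustrate the point about standardization, here is a minimal
sketch in plain Java. This is not DataCleaner's API; the class and
method names are made up for illustration. Once a raw name field is
split into normalized parts, records that look different on the
surface become directly comparable.

// Minimal sketch (plain Java, not DataCleaner's API): standardizing
// and separating a name field before comparison. "Jensen, Kasper" and
// "KASPER JENSEN" only compare as equal once both are normalized.
import java.util.Arrays;

public class NameStandardizer {

    // Split a raw name into normalized [givenName, familyName] parts.
    static String[] standardize(String rawName) {
        String name = rawName.trim().toLowerCase();
        String given;
        String family;
        if (name.contains(",")) {
            // "family, given" order
            String[] parts = name.split(",", 2);
            family = parts[0].trim();
            given = parts[1].trim();
        } else {
            // "given family" order
            String[] parts = name.split("\\s+", 2);
            given = parts[0].trim();
            family = parts.length > 1 ? parts[1].trim() : "";
        }
        return new String[] { given, family };
    }

    public static void main(String[] args) {
        String a = "Jensen, Kasper";
        String b = "KASPER JENSEN";
        // The raw values do not match...
        System.out.println(a.equals(b)); // false
        // ...but the standardized, separated parts do.
        System.out.println(Arrays.equals(standardize(a), standardize(b))); // true
    }
}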
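The "configured by examples" idea can be sketched in code as well.
The following plain-Java example is a conceptual illustration, not
DataCleaner's engine: a deliberately crude token-overlap similarity
and a brute-force threshold search stand in for the real model. It
shows how a handful of pairs labelled as duplicate or non-duplicate
during a training session can drive the matching rule that is later
applied to the full data set.

// Conceptual sketch only (not DataCleaner's engine): "configuration
// by examples" as threshold tuning on labelled record pairs.
import java.util.List;
import java.util.Map;

public class TrainingByExample {

    // Very crude similarity: share of tokens two strings have in common.
    static double similarity(String a, String b) {
        var tokensA = new java.util.HashSet<>(List.of(a.toLowerCase().split("\\s+")));
        var tokensB = new java.util.HashSet<>(List.of(b.toLowerCase().split("\\s+")));
        var union = new java.util.HashSet<>(tokensA);
        union.addAll(tokensB);
        tokensA.retainAll(tokensB); // tokensA now holds the intersection
        return union.isEmpty() ? 0.0 : (double) tokensA.size() / union.size();
    }

    public static void main(String[] args) {
        // Training examples: the user marks pairs as duplicate (true) or not (false).
        Map<String[], Boolean> examples = Map.of(
                new String[] { "Kasper Jensen", "Jensen Kasper" }, true,
                new String[] { "Anna Smith", "Anna Smith Ltd" }, true,
                new String[] { "Anna Smith", "John Smith" }, false);

        // Pick the similarity threshold that classifies most examples correctly.
        double bestThreshold = 0.5;
        int bestCorrect = -1;
        for (double t = 0.0; t <= 1.0; t += 0.05) {
            int correct = 0;
            for (var e : examples.entrySet()) {
                boolean predicted = similarity(e.getKey()[0], e.getKey()[1]) >= t;
                if (predicted == e.getValue()) {
                    correct++;
                }
            }
            if (correct > bestCorrect) {
                bestCorrect = correct;
                bestThreshold = t;
            }
        }

        // The tuned rule is then applied to the full data set.
        System.out.printf("threshold=%.2f, match=%b%n", bestThreshold,
                similarity("Kasper  Jensen", "kasper jensen") >= bestThreshold);
    }
}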