Topic: Is there a general need to generate deltas between two files?

Topic by
wvholland

2011-10-21
03:30

As a data quality company, we are normally asked to find the duplicates in one or more files. In some situations, customers ask for the opposite: can you generate the differences between today's file and, for example, yesterday's file? For simplicity, I call that generating the deltas between two files.
Do others also see the need for such an analysis module, and how would we do that in DataCleaner?

Thanks in advance.

Reply by
emil

2011-10-21
03:30
I felt the need for this sort of functionality when I was cleansing a data set and wanted to know whether the results would be better if I did some things a bit differently.

I had a short discussion with Kasper about it, but we didn't talk about how to implement it. But it can be done if we think it's important enough.

Reply by
kasper

2011-10-25
03:21
Hi all,

This would indeed be an interesting feature. Actually the "old" DataCleaner (version 1.x) had a dedicated compare function for this, but I don't think it was very good... We should definitely make a new feature for this that does some nifty tricks to keep track of updates, insertions and deletions when comparing with previous versions of the same file.
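To make "updates, insertions and deletions" concrete, here is a minimal sketch in Python. It is not DataCleaner code: the function name `compute_delta`, the `id` key column and the sample rows are all hypothetical, and it assumes every record has a unique key and that both files fit in memory.

```python
def compute_delta(old_rows, new_rows, key):
    """Classify records as inserted, deleted or updated, matching on `key`.

    Assumption: `key` names a column that uniquely identifies a record.
    """
    old_by_key = {row[key]: row for row in old_rows}
    new_by_key = {row[key]: row for row in new_rows}

    # Present today but not yesterday.
    inserted = [row for k, row in new_by_key.items() if k not in old_by_key]
    # Present yesterday but not today.
    deleted = [row for k, row in old_by_key.items() if k not in new_by_key]
    # Present in both, but at least one value differs.
    updated = [
        (old_by_key[k], row)
        for k, row in new_by_key.items()
        if k in old_by_key and old_by_key[k] != row
    ]
    return inserted, deleted, updated


# Made-up sample data standing in for yesterday's and today's file.
yesterday = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Cy"},
]
today = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Robert"},
    {"id": 4, "name": "Dee"},
]

inserted, deleted, updated = compute_delta(yesterday, today, key="id")
```

Without a reliable key column, an "update" cannot be told apart from a deletion plus an insertion, so the choice of key is the main design decision here.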

Reply by
ctian

2011-10-27
03:23
Hi all,

From my experience, there is a real need for such a function.
Especially when we have to test and verify new versions, comparing, for example, a new DWH table load with the table load from the old version is very important.

On the other hand, I would like to point out the complexity of this, because in our case we are talking about really big datasets. When we do data comparisons, especially within a DBMS, the sort orders of the old and new datasets often do not match, so a presort has to be performed before any comparison can start.
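The presort-then-compare idea can be sketched roughly as follows in Python. This is only an illustration, not an existing DataCleaner feature: `merge_delta`, `by_id` and the sample rows are hypothetical. It assumes both inputs are already sorted on the same unique key (for example via ORDER BY in the database or an external sort), so each dataset is read once and never held in memory as a whole.

```python
def merge_delta(old_sorted, new_sorted, key):
    """Walk two key-sorted row streams in step and yield the differences.

    Assumptions: both inputs are sorted ascending by `key(row)`, keys are
    unique within each input, and no row is None.
    """
    old_iter, new_iter = iter(old_sorted), iter(new_sorted)
    old_row, new_row = next(old_iter, None), next(new_iter, None)

    while old_row is not None or new_row is not None:
        if new_row is None or (old_row is not None and key(old_row) < key(new_row)):
            # Key exists only in the old data.
            yield ("deleted", old_row)
            old_row = next(old_iter, None)
        elif old_row is None or key(new_row) < key(old_row):
            # Key exists only in the new data.
            yield ("inserted", new_row)
            new_row = next(new_iter, None)
        else:
            # Same key on both sides: report only if the content changed.
            if old_row != new_row:
                yield ("updated", new_row)
            old_row = next(old_iter, None)
            new_row = next(new_iter, None)


def by_id(row):
    # Hypothetical key: the first column identifies the record.
    return row[0]


# Made-up rows in mismatched order, as they might come back from a DBMS.
old_load = [(3, "Cy"), (1, "Ann"), (2, "Bob")]
new_load = [(4, "Dee"), (2, "Robert"), (1, "Ann")]

# sorted() stands in for the presort; for really big tables the sort would
# have to happen in the database or on disk instead.
changes = list(
    merge_delta(sorted(old_load, key=by_id), sorted(new_load, key=by_id), by_id)
)
```

The comparison itself is a single linear pass, so for big datasets the cost is dominated by the presort.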
