
Topic: DQ monitoring server?

Topic by
kasper

2011-03-11
08:54

Dear DC users and developers,

At Human Inference we've been thinking of ways to build upon the foundation of DataCleaner to make it a more complete data quality solution. An idea has been taking shape, and we'd like to hear your opinions on it and maybe use them to decide whether or not we're going to build it!

The idea is to have a server-side counterpart for the DataCleaner application. The purpose of the server app would be to schedule jobs, gather and persist results, and show trends over time. I would call this functionality "DQ monitoring". The current DC app would be extended with a way to upload jobs to the server, so that you can still work with your jobs in the regular DataCleaner application, but for enterprise deployment you would probably run them in batches on the server.

In terms of reporting we have in mind that you should of course be able to see the results for a single run, but you should ALSO be able to see the evolution of your profiling metrics. For example you might be interested in seeing trends in the patterns found or in the metrics available in the various analyzers.

Another possible feature would be to have email bursting built in, so that if you have a threshold value for some particular metric, you could receive email alerts when your metrics no longer live up to your goals.
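
To make the threshold idea a bit more concrete, here is a minimal sketch of what such a check could look like. It is not DataCleaner code: `MetricRun`, `check_threshold` and the sample numbers are all hypothetical names and values made up for illustration, and the sketch only builds the alert text rather than sending an email.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MetricRun:
    """Hypothetical record: one stored metric value from a scheduled job run."""
    run_date: str  # date of the run, e.g. "2011-03-01"
    value: float   # metric value, e.g. number of null values found


def check_threshold(history: List[MetricRun], max_allowed: float) -> Optional[str]:
    """Return an alert message if the latest run exceeds the threshold, else None."""
    if not history:
        return None
    latest = history[-1]
    if latest.value <= max_allowed:
        return None
    # Include the previous run, if any, so the alert hints at the trend.
    previous = history[-2].value if len(history) > 1 else None
    trend = "" if previous is None else f" (previous run: {previous:g})"
    return (f"ALERT {latest.run_date}: metric is {latest.value:g}, "
            f"threshold is {max_allowed:g}{trend}")


# Made-up sample history, oldest run first.
history = [
    MetricRun("2011-03-01", 12.0),
    MetricRun("2011-03-08", 15.0),
    MetricRun("2011-03-15", 40.0),
]

message = check_threshold(history, max_allowed=25.0)
if message:
    print(message)  # a server would email this instead of printing it
```

The point of keeping the whole history rather than only the latest value is that the same stored runs could feed both the alerts and the trend views mentioned above.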

What is your opinion on such a DQ monitoring application? Do you think it would fit in nicely with DataCleaner? Or would it not add a lot of value?

Reply by
tech4

2011-04-12
18:07
kasper,

not sure if you guys have made a decision on this. but i would definitely love to have something like this in my tool list.

-tach4

Reply by
dba_alex

2011-04-18
19:00
I also think this would be an excellent addition to DataCleaner.

Reply by
ctian

2011-05-06
11:48
Sounds reasonable, but I hope the client will still be available alongside these server enhancements.

The evolution and/or trend reporting should also be available in the client version. This would be very useful.

Christian

Reply by
amus83

2011-08-07
05:34
Automated monitoring of data quality is necessary, because in some cases the information can change at any moment and, depending on their needs, the customer has to check the quality of the data.


Reply by
kasper

2011-10-14
03:29
On a related note, there's a discussion on the DataCleaner-dev mailing list about this ... Take a look here, and feel free to join the conversation:

http://groups.google.com/group/datacleaner-dev/browse_thread/thread/fcb16c4f86f482d2

Reply by
kasper

2012-05-15
09:53
Happy to say that this work is now going on :) You can find it in the 3.0 branch of DataCleaner's source:

http://eobjects.org/svn/DataCleaner/branches/3.0-monitor/

Current situation is that we support a timeline view, manual building of repository (but there is an example) and drill-to-details from the timeline view, which results in a single (historic) profiling result. Pretty neat! Will try and blog about it soon.

Reply by
tzimmerman

2012-07-10
20:32
I think this is an awesome idea. I am very interested in learning more about this.
I have checked out the 3.0-monitor branch but am unable to build. It seems to be unable to find some classes which had been in AnalyzerBeans-core previously (e.g. org.eobjects.analyzer.result.PatternFinderResult). Am I being too impatient?

Reply by
kasper

2012-07-10
20:38
Great! To get the build working you also need to check out and build AnalyzerBeans. Located at http://eobjects.org/svn/AnalyzerBeans/trunk

This is because we are currently developing on both AB and DC so the dependency is snapshot based.

Would love to hear more from you. Share your impressions and thoughts.

Reply by
kasper

2012-07-10
21:43
Oh and by the way - we've merged the branch into trunk already, so don't check out from the branch, but get trunk directly. From here:

http://eobjects.org/svn/DataCleaner/trunk

Notice both some changes in the desktop app and the whole new "monitor" webapp.

Reply by
tzimmerman

2012-07-11
00:18
Kasper thank you for such quick responses!
I got the DataCleaner trunk and after a few hiccups with Maven not resolving some of the dependencies, I have the war file deployed.
I have just logged in and will poke around more tonight/tomorrow.

Thanks again!

Reply by
kasper

2012-08-17
14:35
Thanks everyone who has so far participated in this thread.

A heads-up: An alpha version of DC 3 (with the major new monitoring feature) has just been uploaded to SourceForge. Please also see the installation instructions there.

https://sourceforge.net/projects/datacleaner/files/datacleaner%20%28unstable%29/3.0-alpha/

Any feedback, ideas, comments are greatly appreciated!
