jvdongen started the topic:
2011-05-06 08:52

DataCleaner 2.1 3D charts?

Hey Kasper,

Just downloaded 2.1: it's a major step forward, my compliments! One thing, though, made me shiver: the 3D charts. 3D is generally bad for charting, but if you insist on keeping it, please add a switch to turn it off. Then there are the pie charts... Please read about pie charts. I avoid them altogether, and it would be nice to have the option of selecting a simple 2D bar chart for data distributions as well. Of course this is nitpicking: an already great profiling tool just got (a lot!) better ;-)
deleted user replied:
2011-05-09 06:15
Hi Jos,

Glad that you like the latest version! And I'm glad that you share your ideas and knowledge about charts. To be honest, charts have never been my strong suit (but I already feel more enlightened by that link of yours)...

We could actually use someone with a better feel for how charts would fit nicely into the application, so if you have specific ideas for DataCleaner's result data, your contributions (e.g. ideas and maybe some sketches/examples?) would be greatly appreciated!
jvdongen replied:
2011-05-09 07:07
Hi Kasper,

I'd be honored! It looks like you're using jFreeChart, is that correct? I'll send you some mock-ups in the coming weeks.
deleted user replied:
2011-05-09 09:03
Hi Jos,

You are correct, so if you "speak jfreechart" that's perfect. However we are not tied to that library so in case you have something else up your sleeve feel free to challenge us ;-)
ctian replied:
2011-05-17 18:55

Just another idea that came to mind regarding the 3D charts: what about changing the pie chart so that it carries more valuable summary information? Have you ever thought about a graphical summary showing the absolute figure and/or percentage, as a legend only, for the following key value counters:


This is a very practical approach, and it is the way people look at data first. Then comes the drill-down.

Best regards,
deleted user replied:
2011-05-17 19:26
Hi Christian,

This is an interesting approach, and it does indeed solve a lot of problems with the current chart! What do you think, Jos? Will it be too simple?

A technical question, just to be sure... What is the difference between DUPLICATES and NON-UNIQUE in this sense? Are DUPLICATES simply values where count = 2? In a way it's weird to give count = 2 such special treatment, but on the other hand I guess it is relevant in a lot of situations where you would expect unique values.
ctian replied:
2011-05-18 06:32
Hi Kasper,

You are right. DUPLICATE is a kind of synonym for counter = 2.

Knowing that there are records with non-unique values is one thing, but for an inexperienced, non-technical user a special grouping of type DUPLICATE would be more alarming.

Maybe there could also be a rule saying: "if there are non-unique values, but only with counter = 2 per value, let's name them DUPLICATE; if there is any counter > 2, let's name and group all of them as NON_UNIQUE".
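A minimal sketch of the grouping rule described above, assuming a column is a plain Python list in which `None` stands for NULL. The function name `group_row_counts` and the group labels are hypothetical illustrations, not DataCleaner's API. Each group holds a number of rows, so the groups add up to the total row count, which is what a chart of shares would need.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional


def group_row_counts(values: List[Optional[Hashable]]) -> Dict[str, int]:
    """Split a column's rows into NULL / UNIQUE / DUPLICATE-or-NON_UNIQUE.

    Hypothetical helper: None stands for NULL. A value seen exactly once is
    UNIQUE. All values seen more than once go into one group, labelled
    DUPLICATE when each of them occurs exactly twice and NON_UNIQUE as soon
    as any of them occurs more than twice.
    """
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    repeated = [c for c in counts.values() if c > 1]  # counts of repeated values

    groups = {
        "NULL": len(values) - len(non_null),
        "UNIQUE": sum(1 for c in counts.values() if c == 1),
    }
    if repeated:
        label = "DUPLICATE" if max(repeated) == 2 else "NON_UNIQUE"
        groups[label] = sum(repeated)  # rows belonging to repeated values
    return groups


print(group_row_counts(["a", "b", "b", None, "c", "c"]))
# {'NULL': 1, 'UNIQUE': 1, 'DUPLICATE': 4}
print(group_row_counts(["x", "x", "x", "y", "y"]))
# {'NULL': 0, 'UNIQUE': 0, 'NON_UNIQUE': 5}
```

With this rule a single value occurring three times is enough to relabel the whole repeated group as NON_UNIQUE, which matches the "if there is any counter > 2" wording.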

However, the main point is to show these special value groups. For detailed information (if we have a full detail list on the right-hand side) it may eventually be useful to add a drill-down function controlled by the graphics alone. So if I click on NON-UNIQUE, I get the record details for the non-unique values only. And so on.

What do you say?
chiaochi replied:
2011-05-18 18:29
Glad to see another fan of Stephen Few in the house! I agree with Jos and Christian. I was just using DC to check whether a field with many unique values has NULLs, and it wasn't pretty...
What do you think about this design:
We could add one of those green arrows next to unique to allow drill-down.
ctian replied:
2011-05-18 22:04
Good approach. Just one remark: I'm quite a visual person. So a circle representation (2D or 3D) of the shares of unique/duplicate/non-unique/null gives a better visual impression at a glance than bars would.
jvdongen replied:
2011-05-18 22:13
@ ctian: please have a look at We're trying to get rid of the pies (which should be saved for dessert, really!)

@ chiaochi: Glad we agree, and I love your design: really good and clear, though the duplicate bar inside the non-unique one will require some advanced jFreeChart magic. Not impossible, but not very straightforward either...
ctian replied:
2011-05-18 22:39
Hi Jos,

Yes and no. I fully agree with you when many values are presented, but in the current scenario there would be 4 at most.

However, if you want to go for the bars, that's fine by me. For me the details (and the ability to export them) are more important; for example, I then want to know in full detail which records are causing the NULL count.

Best regards,
chiaochi replied:
2011-05-19 15:58
@Jos: Sorry, I don't have any experience with jFreeChart. How about using a stacked bar chart similar to this:

@Christian: I see. My only concern is putting duplicate and non-unique side-by-side in a graph, because duplicate is really a subset. Any idea how we can represent that in a pie chart?
ctian replied:
2011-05-23 14:26
Hello again,

Just an addition to the list proposed earlier: what about a counter summary like this:

- row count
- null count
- distinct count
- unique count
- duplicate count (ok, depends on the definition of "what is a duplicate", maybe a "non-unique" is better anyway)
- blank (=NULL) count
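A minimal sketch of how such a counter summary could be computed for one column. `ColumnSummary` and `summarize_column` are hypothetical names. Several assumptions are baked in and should be checked against the intended definitions: `None` stands for NULL; "blank" is taken to mean an empty or whitespace-only string that is not NULL (the list above equates blank with NULL, so this split is an assumption); distinct/unique/non-unique counts ignore NULLs; and the duplicate line is implemented as the "non-unique" variant, i.e. the number of distinct values occurring more than once.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColumnSummary:
    row_count: int
    null_count: int
    blank_count: int
    distinct_count: int
    unique_count: int
    non_unique_count: int


def summarize_column(values: List[Optional[str]]) -> ColumnSummary:
    """Compute the proposed counter summary (hypothetical helper).

    Assumptions: None is NULL; blank = empty or whitespace-only non-NULL
    string; distinct/unique/non-unique counts ignore NULLs.
    """
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return ColumnSummary(
        row_count=len(values),
        null_count=len(values) - len(non_null),
        blank_count=sum(1 for v in non_null if v.strip() == ""),
        distinct_count=len(counts),
        unique_count=sum(1 for c in counts.values() if c == 1),
        non_unique_count=sum(1 for c in counts.values() if c > 1),
    )


summary = summarize_column(["a", "b", "b", None, "", "c", "c", "c"])
print(summary)
# ColumnSummary(row_count=8, null_count=1, blank_count=1, distinct_count=4, unique_count=2, non_unique_count=2)
```

Note that in this sketch the empty string still counts as a distinct (and here unique) value; whether blanks should be excluded from the uniqueness counters is a design choice left open in the thread.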

This should cover all important variations of data occurrence for a good first overview. It's a really good starting point for analysis.
Of course it would be nice to then drill down into more detail (but that is another story).

What do you think?
deleted user replied:
2011-07-06 10:41
Hi guys,

I've made quite a lot of improvements to DataCleaner's charts over the last week, including the Value Distribution chart.

Check out this Picasa Web Album with a few examples:
Value distribution chart proposals

Here's a summary of the changes so far:
  • Removed 3D effect
  • Removed "glossy" surface
  • Improved choice of colors (more contrast)
  • Automatically "exploding" groups if there are not too many.
  • <null> and <unique> groups have special colors: dark and light gray.
  • Added two metrics to chart subtitle: Total count and distinct count.
I know this does NOT include the proposed bar chart. But before doing that, I would like to hear your reactions... Actually, I think this looks quite nice and has some perks compared with the "fixed 4 bars" approach!
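The post does not say how "not too many" is decided or which groups get exploded. A minimal sketch of one possible rule, assuming "exploding" means replacing a group slice with one slice per member value (as in the "explode all groups" table button mentioned later in the thread), and assuming a hypothetical threshold parameter `max_group_size`. This is an illustration only, not DataCleaner's actual implementation.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional


def chart_slices(values: List[Optional[Hashable]], max_group_size: int = 3) -> Dict[str, int]:
    """Build pie slices, exploding the <unique> group when it is small.

    Hypothetical sketch: NULLs go into a "<null>" slice, every repeated
    value gets its own slice, and values seen exactly once are shown
    individually only if there are at most max_group_size of them;
    otherwise they are collapsed into a single "<unique>" slice.
    """
    counts = Counter(v for v in values if v is not None)
    slices: Dict[str, int] = {}

    null_count = sum(1 for v in values if v is None)
    if null_count:
        slices["<null>"] = null_count

    for value, count in counts.items():
        if count > 1:
            slices[str(value)] = count

    unique_values = [v for v, c in counts.items() if c == 1]
    if len(unique_values) <= max_group_size:
        for value in unique_values:  # few enough: one slice per value
            slices[str(value)] = 1
    else:
        slices["<unique>"] = len(unique_values)  # too many: keep them grouped
    return slices


print(chart_slices(["a", "a", "b", "c", None]))
# {'<null>': 1, 'a': 2, 'b': 1, 'c': 1}
print(chart_slices(list("abcdxx")))
# {'x': 2, '<unique>': 4}
```

A fixed threshold keeps the pie readable for high-cardinality columns while still showing individual values when there are only a handful.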
deleted user replied:
2011-07-07 05:46
Yes, that is also a good point. I will make sure there is a button to "explode all groups" in the table, so that you can simply copy/paste the full table if you want to build e.g. a report in Excel.

I guess an "export to Excel" feature will also help in this area, and that is coming at some point too.
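A minimal sketch of the "full table" output idea: every value with its count, ungrouped, rendered as tab-separated text, which Excel splits into cells when pasted. The function `value_table_tsv`, the `<null>` label and the sort order (descending count, then value) are assumptions made for illustration; this is not how DataCleaner's copy/paste or export actually works.

```python
import csv
import io
from collections import Counter
from typing import Hashable, List, Optional


def value_table_tsv(values: List[Optional[Hashable]]) -> str:
    """Render every value with its count as tab-separated text.

    Hypothetical helper: NULLs are shown as "<null>", rows are sorted by
    descending count and then by value, and no grouping is applied.
    """
    counts = Counter("<null>" if v is None else str(v) for v in values)
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["value", "count"])
    for value, count in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
        writer.writerow([value, count])
    return buffer.getvalue()


print(value_table_tsv(["b", "a", "b", None]), end="")
```

Using the `csv` module rather than joining strings by hand means values that themselves contain tabs or quotes are quoted correctly.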
ctian replied:
2011-07-07 09:32
Hi Kasper,

I like it. Looks quite good.

Regarding the statement "... exploding groups if there are not too many ...": my main interest here is in the tabular representation.

When running analysis sessions, I very seldom need to click for more details.
What is important is to see the column values and their counts in one big list.

But I am afraid this request belongs in another section.

Best regards,