Determining Data Quality Rules from the Data (Data Profiling)
Is there a way to determine the data quality rules by profiling the data? Conceptually, I want to be able to determine Completeness, Accuracy and Consistency rules by profiling the data. In most cases the business rules for the data quality are not documented well, hence, one needs to be able to reverse engineer them from the data set. Has anyone used pattern analyzer to determine these rules?
I'm not entirely sure what you mean by "determine DQ rules", but most of the analyzers are for measuring current DQ metrics: the completeness analyzer finds how complete the data is, value distribution finds outliers in a field that should hold a limited set of values, the pattern analyzer checks e.g. postal codes or phone numbers for consistency, the unique key check finds duplicate keys, and the referential integrity analyzer verifies e.g. that disparate databases contain the same data.
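As a rough, tool-agnostic illustration of what some of these analyzers compute, here is a minimal Python sketch over an invented toy dataset (the column names and sample values are made up for demonstration): completeness as the non-null ratio, pattern analysis by masking digits/letters, and a duplicate-key check.

```python
from collections import Counter
import re

# Hypothetical toy dataset for illustration only.
rows = [
    {"id": 1, "postal": "90210", "phone": "555-1234"},
    {"id": 2, "postal": "10001", "phone": "555-9876"},
    {"id": 3, "postal": None,    "phone": "5551234"},
    {"id": 3, "postal": "ABC12", "phone": "555-0000"},
]

def completeness(rows, col):
    """Fraction of rows with a non-null value in the column."""
    return sum(1 for r in rows if r[col] is not None) / len(rows)

def pattern(value):
    """Mask digits as '9' and letters as 'A', keeping other characters,
    so '555-1234' becomes '999-9999' (similar in spirit to a pattern analyzer)."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", str(value)))

def pattern_distribution(rows, col):
    """Count how often each masked pattern occurs; rare patterns hint at inconsistencies."""
    return Counter(pattern(r[col]) for r in rows if r[col] is not None)

def duplicate_keys(rows, col):
    """Return key values that occur more than once (unique key check)."""
    counts = Counter(r[col] for r in rows)
    return [k for k, n in counts.items() if n > 1]

print(completeness(rows, "postal"))          # 3 of 4 postal codes present
print(pattern_distribution(rows, "phone"))   # dominant '999-9999', one outlier '9999999'
print(duplicate_keys(rows, "id"))            # id 3 appears twice
```

A dominant pattern (here '999-9999' for phone numbers) can then be promoted to a candidate consistency rule, which is one way of reverse engineering rules from the data itself.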
If you're using the professional edition, some of these analyzers can be applied automatically via the "What can you tell me about my data" option, which generates a job that you can use as a starting point.