Data Quality

Hi TDA User,

  We are working on a set of tools that will help you with data quality, whether that is data validation or data cleansing. This can be a large area and I am in need of your input. If you have a moment, please provide details (and more importantly examples) of your specific data validation/cleansing needs.  

   Here is a list of questions to give you an idea of the type of data I am looking for. You don’t have to answer every question. I am most interested in details of specific validation/scrubbing problems that take you a lot of time and if automated would lighten your work load. 
  1.   Do you write and execute data validation statements on your source data?

a. What types of data validation checks do you perform?

                                                           i.      Single column checks
  1.   Looking for null Values
  2.   Checking specific values exist
  3.   Checking specific  values do not exist
  4.   Checking values are within a range
  5.   Checking values exist in a reference table
  6.   Finding values not in an a reference table
  7.   Data type validation
  8.   Format/Mask validation (Address, emails, phone numbers, Addressing Names, SSN etc)
  9.   Checking for fuzzy or exact duplicates
                                                          ii.      Multiple column/row checks
  10.   Checking a set of columns add up to another column (total or concatenation with a mask).
  11.   Checking aggregation of a column grouped by a surrogate key reconciles to another set.
  12.   Checking values of a 2<sup>nd</sup> column are consistent for same surrogate key

b. What do you do with validation errors?

  1.       Do you write & execute data cleansing/scrubbing statements?

a. What type of data scrubbing do you do?

b. What happens to your dirty data?

   Please include any SQL or sample data.    You can reply to this post or email we directly.

   Hope to hear from you soon.  

Thanks, Debbie