What is data cleaning and preparation?

Data Preparation in Alteryx

Deanna Sanchez

Alteryx ACE and Owner, Nova Geographica LLC

Why is clean data important?

GIGO: "Garbage In, Garbage Out"

  • Cleaning and preparing data ensures you:
    • Avoid mistakes and prevent errors
    • Standardize data and formats
    • Increase productivity
    • Improve speed to insight

A depiction of garbage going into and garbage coming out of a database

Cleaning your data is like...

Tuning up your car - Optimizing your car with clean components and fresh fluids boosts performance.

An image of a mechanic tuning up a car

Examples of Dirty Data

  • Missing or incomplete data
  • Unstandardized or inconsistent data
  • Data entry errors
  • Leading or trailing whitespace, and unneeded characters or punctuation

Data table highlighting missing data such as blank data cells

Data table highlighting inconsistent data such as upper case data mixed with title case data

Data table highlighting data entry errors such as too many digits

Data table highlighting unneeded punctuation such as dollar signs on numeric data

Clean data techniques

Manage Missing Data

  • Handle missing data by imputing values, such as:
    • Converting null values to blanks for string data types
    • Converting null values to 0 for numeric data types
  • Filter missing data from the data stream
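
The two approaches above can be sketched outside Alteryx in plain Python. This is only an illustration of the idea, not how Alteryx tools work internally; the field names (borough, sale_price) and the sample rows are hypothetical, and None stands in for a null value.

```python
# Hypothetical sample rows; None stands in for a null value.
rows = [
    {"borough": "Manhattan", "sale_price": 1250000},
    {"borough": None, "sale_price": 830000},
    {"borough": "Queens", "sale_price": None},
]

def impute_nulls(row):
    """Return a copy with a null string set to "" and a null number set to 0."""
    cleaned = dict(row)
    if cleaned["borough"] is None:      # string field: null becomes blank
        cleaned["borough"] = ""
    if cleaned["sale_price"] is None:   # numeric field: null becomes 0
        cleaned["sale_price"] = 0
    return cleaned

def drop_incomplete(records):
    """Filter out any row that contains at least one null value."""
    return [r for r in records if all(v is not None for v in r.values())]

imputed = [impute_nulls(r) for r in rows]   # option 1: impute
complete = drop_incomplete(rows)            # option 2: filter
```

Imputing keeps every row but changes values, while filtering keeps values intact but drops rows, so the right choice depends on the analysis.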

An image of analysts cleaning data files with a magnifying glass and broom

Clean data techniques

Standardize Data

  • Ensure consistent formatting
    • Add $ to currency
    • 000123456
  • Check naming conventions for field names and filenames
    • "filename_01012004.csv"
  • Modify case of data in a field
    • LOCATION
  • Modify data types
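
As a rough sketch, the standardization steps above might look like this in plain Python. The helper names, the nine-digit padding width, and the filename pattern are assumptions made for illustration; they are not Alteryx functions or rules from the course.

```python
import re

def format_currency(amount):
    """Format a number as currency with a leading $ and two decimals."""
    return f"${amount:,.2f}"

def pad_id(value, width=9):
    """Left-pad a numeric ID with zeros to a fixed width (assumed to be 9)."""
    return str(value).zfill(width)

def follows_naming_convention(filename):
    """Check a hypothetical convention: lowercase name, underscore, 8 digits, .csv."""
    return re.fullmatch(r"[a-z]+_\d{8}\.csv", filename) is not None

def to_upper(text):
    """Modify the case of a value so every row uses the same case."""
    return text.upper()

def to_number(text):
    """Modify the data type: convert a numeric string to a float."""
    return float(text)
```

For example, pad_id(123456) gives "000123456" and to_upper("Location") gives "LOCATION", matching the examples on the slide.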

An image of an analyst filing data into proper file folders

Clean data techniques

Remove unnecessary items

  • Leading/trailing whitespace
  • Tabs and line breaks
  • Rows or columns not needed
  • Unneeded punctuation, letters, and numbers
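
A minimal Python sketch of these removal steps follows. The function names and the default set of unwanted characters ("$" and ",") are assumptions chosen for illustration, not part of Alteryx.

```python
import re

def strip_whitespace(text):
    """Remove leading and trailing whitespace."""
    return text.strip()

def replace_tabs_and_breaks(text):
    """Replace each run of tabs or line breaks with a single space."""
    return re.sub(r"[\t\r\n]+", " ", text)

def remove_characters(text, unwanted="$,"):
    """Delete every character listed in `unwanted` from the text."""
    return text.translate(str.maketrans("", "", unwanted))

def keep_columns(row, wanted):
    """Keep only the columns that are needed for the analysis."""
    return {name: row[name] for name in wanted if name in row}
```

Stripping "$" and "," from a value such as "$1,250,000" leaves "1250000", which can then be converted to a numeric type.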

An image of construction workers cleaning data files

Profile with color-coding

Color coding in the Results window and the data profile guides the choice of data cleaning techniques.

  • Green = OK
  • White = Unique
  • Yellow = Null
  • Red = Not OK (e.g., trailing whitespace)
  • Grey = Empty
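
To make the categories concrete, here is a rough Python analogue that sorts values into four of the labels above. It is an assumption-laden sketch, not how Alteryx computes its profile: None stands in for null, "Not OK" is limited to leading or trailing whitespace, and the "Unique" category is left out.

```python
from collections import Counter

def classify(value):
    """Assign one quality label to a single value."""
    if value is None:
        return "Null"        # yellow on the slide
    if value == "":
        return "Empty"       # grey on the slide
    if isinstance(value, str) and value != value.strip():
        return "Not OK"      # red: leading or trailing whitespace
    return "OK"              # green

def profile(values):
    """Count how many values in a column fall under each label."""
    return Counter(classify(v) for v in values)

# Hypothetical column with one value of each kind.
counts = profile(["Queens", "Bronx ", None, ""])
```

A column with many "Not OK" values points to trimming whitespace, while many "Null" values point to imputing or filtering.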

All color profiles

Data types in Alteryx

Table with all five main data types

  • Important to know various data types
    • How and when to use
  • Five main data categories
    • Boolean - binary formats
    • Numeric - number data, includes Double
    • String - text data, ranging from String to V_WString types
    • DateTime - dates and time data
    • Spatial - spatial objects and points
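
For readers more familiar with code, the five categories can be loosely compared to Python types. The sketch below converts a hypothetical record read as text into typed values; the field names and sample values are invented, and the Python types are only rough analogues of Alteryx data types (a latitude/longitude tuple stands in for a spatial point).

```python
from datetime import datetime

# Hypothetical record in which every field arrives as text.
raw = {
    "is_condo": "True",
    "sale_price": "1250000.50",
    "borough": "Manhattan",
    "sale_date": "2004-01-01",
    "location": "40.7128,-74.0060",
}

typed = {
    "is_condo": raw["is_condo"] == "True",                            # Boolean
    "sale_price": float(raw["sale_price"]),                           # Numeric
    "borough": raw["borough"],                                        # String
    "sale_date": datetime.strptime(raw["sale_date"], "%Y-%m-%d"),     # DateTime
    "location": tuple(float(p) for p in raw["location"].split(",")),  # Spatial (point)
}
```

Choosing the right type matters because operations such as sorting by price or filtering by date only behave as expected once the values are no longer plain text.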

Dataset details

  • Alteryx hands-on exercises:
    • New York City Property Sales
    • One dataset for all exercises
    • Analyzing the highest sales

An image of the New York City skyline

Let's practice!
