Sampling and Summarizing

Data Preparation in Alteryx

Deanna Sanchez

Alteryx ACE and Owner, Nova Geographica LLC

Sampling data is like . . .

Taking a sample from a chemistry lab test tube.

  • Top layer of fluid
  • Top percent of fluid
  • Random sample (1 in N chance)

Image of chemistry test tube with fluid

Data Preparation in Alteryx

Sample tool options

Options for sampling data:

  • First or Last N rows
  • Skip 1st N rows
  • 1 of every N rows
  • 1 in N chance to include each row
  • First N% of rows
Data Preparation in Alteryx

Top N and Bottom N Records

  • Use a Sort tool before the Sample tool
    • Sort in descending order
  • Add a Sample tool
    • First N records where N = 10
  • Example: Top 10 Sales

Sample Top 10 image

Data Preparation in Alteryx

Skip 1st N rows

Provides all rows after the first number of records specified is skipped

  • Ensures only rows containing data remain
  • Example: Skip 1st 4 rows is used to skip header rows where data begins on line 5

Spreadsheet with header and logo example

Data Preparation in Alteryx

1 of every N rows

Returns the first row of every N rows in the dataset

  • Example: 1 of every 10 records
    • Useful for an overview of a running total count
    • Dataset with 100 records would return 10 records
  • This is not a random sample process

Running Total 1 of every 10 records example

Data Preparation in Alteryx

1 in N chance to include each row

Random sample of the dataset

  • Example: 1 in 25 chance out of 1000 records
    • Random sample of the entire dataset
  • Produces a new randomly sampled dataset with each run of the workflow
    • Different IDs will be output with each run

1 in N random chance example.jpg

Data Preparation in Alteryx

First N% of rows

Returns the first N percentage of rows

  • Example: First 25% of rows, such as Top 25% of sales
  • For 1,000 records, the first 25% returns rows 1 - 250

First N percent of rows example

Data Preparation in Alteryx

Grouping with samples

  • Allows the data to be sampled per group(s)
  • One or more fields can be selected
  • Example: Top 10 sales by region

Top 10 sales by region example

Data Preparation in Alteryx

Using the Summarize tool

The Summarize tool allows you to group and summarize data using many options.

  • Actions depend on the field data type
  • Only fields with actions are output
  • Same field can have multiple actions
    • Example: Min and Max Sales Revenue

Region min and max sales example

Data Preparation in Alteryx

Summarizing string data

Some of the string data actions available are:

  • Group By
  • Count
  • Count Non Null
  • Min/Max
  • Mode
  • First/Last
  • Concatenate

Group by region and count example

Data Preparation in Alteryx

Summarizing numeric data

Some of the numeric data actions available are:

  • Group By and earlier string data options, plus:
  • Sum
  • Average, Median, Mode
  • Percentile
  • Standard Deviation
  • Finance formulas such as IRR (Internal Rate of Return) and NPV (Net Present Value)

Summarize numeric data example

Data Preparation in Alteryx

Order does matter

Actions are applied hierarchically

  • Especially applies to the Group By function
  • Example:
    • Group by Region
    • Group by Sales team
    • Group by Salesperson

Group by region and sales team example

Data Preparation in Alteryx

Alteryx file types

  • Alteryx database .yxdb
    • Native file format optimized to run in Alteryx
    • Can output data as .yxdb and input .yxdb into workflows
  • Alteryx workflow .yxmd
    • Alteryx Designer workflow format
  • Alteryx packaged workflow .yxzp
    • Alteryx zipped workflow
    • Can include input and output datasets

File types.jpg

Data Preparation in Alteryx

Let's practice!

Data Preparation in Alteryx

Preparing Video For Download...