Summary statistics

Data Analysis in Google Sheets

James Chapman

Curriculum Manager, DataCamp

Exploring data

 

  • Characterize the data
  • Identify data quality issues

 

The exploring data step highlighted alongside the other stages of the data analysis process.

Data Analysis in Google Sheets

Summary statistics

 

  1. Measures of frequency:

    • How often does a value occur?
  2. Measures of center:

    • What does a typical value look like?
  3. Measures of spread:

    • How do values vary across the dataset?
Data Analysis in Google Sheets

Measures of frequency

 

  • Count:
    • COUNT()
    • COUNTA()
    • COUNTBLANK()

 

A column containing a variety of different data types.

Data Analysis in Google Sheets

COUNT()

 

  • Counts cells containing numerical data
    • Dates
    • Currencies

 

A column containing a variety of different data types.

Data Analysis in Google Sheets

COUNT()

 

  • Counts cells containing numerical data
    • Dates
    • Currencies

 

= COUNT(A1:A5)
3

 

A column containing a variety of different data types - the cells containing numerical data have been highlighted in yellow.

Data Analysis in Google Sheets

COUNTA()

 

  • Counts cells containing any data type
    • Empty strings ("")
    • Errors (#DIV/0!)

 

A column containing a variety of different data types.

Data Analysis in Google Sheets

COUNTA()

 

  • Counts cells containing any data type
    • Empty strings ("")
    • Errors (#DIV/0!)

 

= COUNTA(A1:A5)
4

 

A column containing a variety of different data types - the non-blank cells have been highlighted in pink.

Data Analysis in Google Sheets

COUNTBLANK()

  • Counts blank cells
    • Empty cells
    • Empty strings ("")

 

= COUNTBLANK(A1:A5)
1

 

A column containing a variety of different data types - the blank cells have been highlighted in blue.

Data Analysis in Google Sheets

Measures of center

  • Aim to describe a "typical" value

 

  • Mean
    • Often referred to as "average"
    • Sum of values / Count of values
  • Median
    • The middle number in a sorted list of values
    • Used when there are outliers

A column containing age data.

Data Analysis in Google Sheets

Mean

  • Sum of values / Count of values

 

= SUM(A1:A5)
219
= SUM(A1:A5) / COUNT(A1:A5)
43.8

A column containing age data.

Data Analysis in Google Sheets

AVERAGE()

  • Sum of values / Count of values

 

= AVERAGE(A1:A5)
43.8

A column containing age data.

Data Analysis in Google Sheets

Median

  • The middle number in a sorted list of values

A column containing age data.

Data Analysis in Google Sheets

Median

  • The middle number in a sorted list of values

A sorted column containing age data.

Data Analysis in Google Sheets

MEDIAN()

  • The middle number in a sorted list of values

 

= MEDIAN(A1:A5)
47

A sorted column containing age data, with the middle value highlighted.

Data Analysis in Google Sheets

Identifying data quality issues

 

Missing data:

 

  • COUNTBLANK()
  • COUNT()
  • COUNTA()

 

Erroneous data:

 

  • MAX() → Maximum value in a range
  • MIN() → Minimum value in a range

 

  • Example: Ages of -1 or 250 indicate errors
Data Analysis in Google Sheets

Let's practice!

Data Analysis in Google Sheets

Preparing Video For Download...