Measures of spread

Introduction to Statistics

George Boorman

Curriculum Manager, DataCamp

What is spread?

vehicle_crimes_histogram_with_wide_spread.png

burlgary_crimes_histogram_with_narrow_spread.png


Why is spread important?

  • Spread describes how much the values in our data vary

  • T-shirts typically cost $30

    • Can cost anywhere between $10 and $200
    • How likely is it that one will cost $30?
  • If t-shirts were priced between $20 and $50

    • Does this change the likelihood of finding one for $30?

t_shirt_hanging_against_a_wall.jpg

1 Image credit: https://unsplash.com/@uyk

Range

 

$\text{range} = \text{maximum} - \text{minimum}$

$\text{range}(\text{Burglaries}) = 5{,}183 - 1{,}432$

$\text{range}(\text{Burglaries}) = 3{,}751$

| Borough | Burglary |
| --- | --- |
| Tower Hamlets | 5,183 |
| Hackney | 5,079 |
| Barnet | 5,067 |
| ... | ... |
| Sutton | 1,815 |
| Bexley | 1,583 |
| Kingston upon Thames | 1,432 |
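
As a minimal sketch of the range calculation, the snippet below uses only the six boroughs visible in the table (the rows hidden behind "..." are not included). Because the table is sorted and shows both its top and bottom rows, the maximum and minimum are still the ones used on the slide. The variable names are illustrative.

```python
# Burglary counts for the six boroughs shown in the table
# (the elided "..." rows are deliberately left out).
burglaries = {
    "Tower Hamlets": 5183,
    "Hackney": 5079,
    "Barnet": 5067,
    "Sutton": 1815,
    "Bexley": 1583,
    "Kingston upon Thames": 1432,
}

# range = maximum - minimum
burglary_range = max(burglaries.values()) - min(burglaries.values())
print(burglary_range)  # 3751
```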

Variance

A dot plot with a red line in the middle representing the mean.png


variance_plot_showing_distance_between_Westminster_and_the_mean.png


| Borough | Total Crime | Mean | Distance |
| --- | --- | --- | --- |
| Barking and Dagenham | 37,939 | 47,672 | -9,733 |
| Barnet | 52,421 | 47,672 | 4,749 |
| Bexley | 29,285 | 47,672 | -18,387 |
| Brent | 55,465 | 47,672 | 7,793 |
| Bromley | 42,982 | 47,672 | -4,690 |
| Camden | 54,806 | 47,672 | 7,134 |
| ... | ... | ... | ... |
| Total | 1,525,492 | 1,525,492 | 0 |

| Borough | Total Crime | Mean | Distance | Squared Distance |
| --- | --- | --- | --- | --- |
| Barking and Dagenham | 37,939 | 47,672 | -9,733 | 94,731,289 |
| Barnet | 52,421 | 47,672 | 4,749 | 22,553,001 |
| Bexley | 29,285 | 47,672 | -18,387 | 338,081,769 |
| Brent | 55,465 | 47,672 | 7,793 | 60,730,849 |
| Bromley | 42,982 | 47,672 | -4,690 | 21,996,100 |
| Camden | 54,806 | 47,672 | 7,134 | 50,893,956 |
| ... | ... | ... | ... | ... |
| Total | 1,525,492 | 1,525,492 | 0 | 7,509,750,824 |
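
The distance and squared-distance columns can be reproduced with a short sketch. It covers only the six boroughs listed in the table and takes the rounded mean of 47,672 straight from the slide instead of recomputing it, since the full data set is not shown. Squaring matters because the raw distances are a mix of positive and negative values that cancel to a total of 0. The names `MEAN_TOTAL_CRIME`, `total_crime` and `rows` are illustrative.

```python
# Rounded mean of total crime across all boroughs, as given on the slide.
MEAN_TOTAL_CRIME = 47672

# Total crime for the six boroughs listed in the table.
total_crime = {
    "Barking and Dagenham": 37939,
    "Barnet": 52421,
    "Bexley": 29285,
    "Brent": 55465,
    "Bromley": 42982,
    "Camden": 54806,
}

# For each borough: (distance from the mean, squared distance).
rows = {}
for borough, total in total_crime.items():
    distance = total - MEAN_TOTAL_CRIME
    rows[borough] = (distance, distance ** 2)
    print(f"{borough}: distance={distance:,}  squared={distance ** 2:,}")
```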

 

$$\text{variance}(\text{total crime}) = \frac{7{,}509{,}750{,}824}{32}$$

$$\text{variance}(\text{total crime}) \approx 234{,}679{,}713$$
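
A minimal sketch of this division, using the two figures from the slide. The helper `population_variance` is a hypothetical function added for illustration; it divides by the number of values, as the slide does, whereas a sample variance would divide by one less than that. The short list passed to it is made-up example data, not crime figures.

```python
# Figures taken from the slide: the sum of squared distances
# and the number of boroughs it is divided by.
sum_squared_distances = 7_509_750_824
n_boroughs = 32

variance = sum_squared_distances / n_boroughs
print(f"{variance:,.2f}")  # 234,679,713.25


def population_variance(values):
    """Average squared distance from the mean (divides by n)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


# Hypothetical example data: mean is 5, squared distances sum to 32.
print(population_variance([2, 4, 4, 4, 5, 5, 7, 9]))  # 4.0
```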


Standard deviation

$\text{standard deviation}(\text{total crime}) = \sqrt{\text{variance}(\text{total crime})}$

$\text{standard deviation}(\text{total crime}) = \sqrt{234{,}679{,}713}$

$\text{standard deviation}(\text{total crime}) \approx 15{,}319.26$

  • Standard deviation close to zero = data clustered around the mean
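
The square-root step can be checked with a short sketch that starts from the rounded variance on the slide. The variable names are illustrative. Taking the square root brings the result back to the original units (number of crimes), whereas the variance is in squared units.

```python
import math

# Rounded variance of total crime, as given on the slide.
variance_total_crime = 234_679_713

# Standard deviation = square root of the variance.
std_total_crime = math.sqrt(variance_total_crime)
print(f"{std_total_crime:,.2f}")  # 15,319.26
```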

Standard deviation in a histogram

vehicle_crime_histogram_with_one_and_two_standard_deviations_from_the_mean.png


Quartiles

  • Quartiles:
    • values that split the sorted data into four equal parts

 

| Crime | 0% | 25% | 50% | 75% | 100% |
| --- | --- | --- | --- | --- | --- |
| Burglary | 1,432.00 | 2,681.75 | 3,416.50 | 4,392.00 | 5,183.00 |
| Robbery | 363.00 | 895.75 | 1,354.50 | 1,976.50 | 4,156.00 |
| Theft | 4,090.00 | 7,739.75 | 9,624.00 | 12,059.00 | 40,278.00 |
| Vehicle Offenses | 2,143.00 | 4,838.25 | 6,424.50 | 7,520.75 | 11,292.00 |

  • Second quartile (50%) = median
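
One possible way to compute quartiles is sketched below with Python's standard `statistics` module. The full borough data behind the quartile table is not shown in these slides, so the sketch uses only six burglary counts that appear earlier in the deck; its results therefore differ from the Burglary row of the table. The `method="inclusive"` setting, which interpolates linearly between data points, is an assumption about how the slide values were produced.

```python
from statistics import median, quantiles

# Six burglary counts from the boroughs shown earlier in the deck.
# This is only a subset, so the results will not match the quartile table.
burglaries = [5183, 5079, 5067, 1815, 1583, 1432]

# n=4 splits the sorted data into four parts and returns the three cut points.
q1, q2, q3 = quantiles(burglaries, n=4, method="inclusive")
print(q1, q2, q3)  # 1641.0 3441.0 5076.0

# The second quartile is the median.
print(q2 == median(burglaries))  # True
```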

Box plots

boxplot_of_robberies_in_London_with_median_plus_first_and_third_quartiles_highlighted.png


Interquartile range (IQR)

boxplot_robberies_in_London_with_interquartile_range_highlighted.png

  • IQR is less affected by extreme values than the range

$\text{IQR} = \text{3rd quartile} - \text{1st quartile}$

$\text{IQR} = 1{,}976.50 - 895.75$

$\text{IQR} = 1{,}080.75$
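
As a sketch, the same subtraction can be applied to every row of the quartile table shown earlier in the deck, alongside the range for comparison. The values are copied from that table; the names `quartile_table` and `spread` are illustrative. Theft shows the contrast most clearly: its maximum sits far above its third quartile, which inflates the range but leaves the IQR untouched.

```python
# (0%, 25%, 50%, 75%, 100%) values copied from the quartile table.
quartile_table = {
    "Burglary": (1432.00, 2681.75, 3416.50, 4392.00, 5183.00),
    "Robbery": (363.00, 895.75, 1354.50, 1976.50, 4156.00),
    "Theft": (4090.00, 7739.75, 9624.00, 12059.00, 40278.00),
    "Vehicle Offenses": (2143.00, 4838.25, 6424.50, 7520.75, 11292.00),
}

spread = {}
for crime, (minimum, q1, _median, q3, maximum) in quartile_table.items():
    spread[crime] = {
        "iqr": q3 - q1,              # 3rd quartile - 1st quartile
        "range": maximum - minimum,  # maximum - minimum
    }
    print(f"{crime}: IQR={spread[crime]['iqr']:,.2f}  "
          f"range={spread[crime]['range']:,.2f}")
```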


Let's practice!
