Percentiles

Analyzing Business Data in SQL

Michel Semaan

Data Scientist

Percentiles - overview

  • Percentile: $n$th percentile is the value for which $n\%$ of a dataset's values are beneath this value
    • Lowest value is the 0th percentile
    • Highest value is the 99th percentile
Analyzing Business Data in SQL

Quartiles

  • Example: 25th percentile of user orders is 17, then 25% have ordered 17 times or less
  • First quartile: 25th percentile
  • Third quartile: 75th percentile
  • Interquartile range (IQR): All data between the first and third quartiles
  • Second quartile: 50th percentile, median
    • Different from the mean!
Analyzing Business Data in SQL

Skewed data

Skewed distributions

Analyzing Business Data in SQL

Quartiles - query

WITH user_orders AS (
  SELECT
    user_id,
    COUNT(DISTINCT order_id) AS orders
  FROM orders
  GROUP BY user_id)

SELECT ROUND( PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY orders ASC) :: NUMERIC, 2) AS orders_p25, ROUND( PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY orders ASC) :: NUMERIC, 2) AS orders_p50, ROUND( PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY orders ASC) :: NUMERIC, 2) AS orders_p75, ROUND(AVG(orders) :: NUMERIC, 2) AS avg_orders FROM user_orders;
Analyzing Business Data in SQL

Quartiles - result

orders_p25  orders_p50  orders_p75  avg_orders
----------  ----------  ----------  ----------
6.00        8.00        11.00       8.70
Analyzing Business Data in SQL

Percentiles

Analyzing Business Data in SQL

Preparing Video For Download...