Bucketing

Analyzing Business Data in SQL

Michel Semaan

Data Scientist

Histograms - recap

WITH user_orders AS (
  SELECT
    user_id,
    COUNT(DISTINCT order_id) AS orders
  FROM meals
  JOIN orders ON meals.meal_id = orders.meal_id
  GROUP BY user_id)

SELECT
  orders,
  COUNT(DISTINCT user_id) AS users
FROM user_orders
GROUP BY orders
ORDER BY orders ASC
LIMIT 5;
Analyzing Business Data in SQL

CASE expression

Query

SELECT
  CASE
    WHEN meal_price < 4 THEN 'Low-price meal'
    WHEN meal_price < 6 THEN 'Mid-price meal'
    ELSE 'High-price meal'
  END AS price_category,
  COUNT(DISTINCT meal_id)
FROM meals
GROUP BY price_category;
Analyzing Business Data in SQL

Bucketing - query

WITH user_revenues AS (
  SELECT
    user_id,
    SUM(meal_price * order_quantity) AS revenue
  FROM meals
  JOIN orders ON meals.meal_id = orders.meal_id
  GROUP BY user_id)

SELECT CASE WHEN revenue < 150 THEN 'Low-revenue users' WHEN revenue < 300 THEN 'Mid-revenue users' ELSE 'High-revenue users' END AS revenue_group, COUNT(DISTINCT user_id) AS users FROM user_revenues GROUP BY revenue_group;
Analyzing Business Data in SQL

Bucketing - result

revenue_group       users
------------------  -----
Low-revenue users   473
Mid-revenue users   606
High-revenue users  225
Analyzing Business Data in SQL

Histograms versus bar graphs

Histogram of orders per user

Users per revenue group bar chart

Analyzing Business Data in SQL

Bucketing

Analyzing Business Data in SQL

Preparing Video For Download...