Window Functions in Snowflake
Jake Roach
Field Data engineer
Come possiamo classificare i membri della palestra in base ai loro allenamenti per promuovere le giuste classi?
member_id | gym_location | calories_burned | marketing_group
----------- | -------------- | ----------------- | -----------------
m_192 | Miami | 45 | 1
m_74 | Miami | 59 | 1
m_233 | Portland | 60 | 1
m_14 | Cleveland | 72 | 2
m_346 | Portland | 77 | 2
m_289 | Cleveland | 81 | 2
m_565 | Miami | 1085 | 50
...
SELECT <fields>, <2>, <1>,NTILE(<n>) OVER( PARTITION BY <2> ORDER BY <1> )...;
NTILE è usato per creare "N" bucket di dimensioni uguali
$$
<n>: numero di bucket
<1>: campo usato per creare i bucket
<2>: campo usato per distribuire uniformemente i record usando PARTITION BY
SELECT member_id, gym_location, calories_burned, -- Crea 50 bucket di dati di dimensioni ugualiNTILE(50) OVER( ORDER BY calories_burned -- Decide i record in ogni bucket ) AS marketing_groupFROM FITNESS.workouts ORDER BY marketing_group, calories_burned; -- ORDINA il set di risultati finale
member_id | gym_location | calories_burned | marketing_group
----------- | -------------- | ----------------- | -----------------
m_192 | Miami | 45 | 1
m_74 | Miami | 59 | 1
m_233 | Portland | 60 | 1
m_14 | Cleveland | 72 | 2
m_346 | Portland | 77 | 2
m_289 | Cleveland | 81 | 2
m_565 | Miami | 1085 | 50
...
SELECT member_id, gym_location, calories_burned, NTILE(50) OVER(-- Distribuisci uniformemente i record in bucket per ogni gym_location PARTITION BY gym_locationORDER BY calories_burned ) AS marketing_group FROM FITNESS.workouts ORDER BY marketing_group, calories_burned;
member_id | gym_location | calories_burned | marketing_group
----------- | -------------- | ----------------- | -----------------
m_192 | Miami | 45 | 1
m_233 | Portland | 60 | 1
m_14 | Cleveland | 72 | 1
m_74 | Miami | 59 | 2
m_346 | Portland | 77 | 2
m_289 | Cleveland | 81 | 2
...
$$
$$
member_id | cals_burned | cd
----------- | ------------- | ------
m_192 | 45 | .016
m_74 | 59 | .033
m_233 | 60 | .049
m_14 | 72 | .066
m_346 | 77 | .082
m_289 | 81 | .098
....
m_565 | 1085 | 1.000
SELECT member_id, gym_location, calories_burned, CUME_DIST() OVER( PARTITION BY gym_location, -- Crea una distribuzione per ogni location ORDER BY calories_burned ) AS cd FROM FITNESS.workoutsORDER BY gym_location, cd; -- ORDINA il set di risultati finale
SELECT <fields>, <1>, <2>,CUME_DIST() OVER( PARTITION BY <1> ORDER BY <2> )...;
Confronta ogni record con la distribuzione per quella colonna/campo, distribuzione cumulativa
$$
<1>: campo che determina la finestra da valutare
<2>: campo per creare la distribuzione
$$
member_id | gym_location | calories_burned | cd
----------- | -------------- | ---------------- | -------
m_192 | Miami | 45 | .033
m_74 | Miami | 59 | .066
m_288 | Miami | 83 | .098
m_541 | Miami | 85 | .131
...
m_233 | Portland | 60 | .071
m_346 | Portland | 77 | .142
...
Window Functions in Snowflake