Vensterfuncties in Snowflake
Jake Roach
Field Data engineer
Hoe kunnen we sportschoolleden indelen op basis van hun workouts om de juiste lessen te marketen?
member_id | gym_location | calories_burned | marketing_group
----------- | -------------- | ----------------- | -----------------
m_192 | Miami | 45 | 1
m_74 | Miami | 59 | 1
m_233 | Portland | 60 | 1
m_14 | Cleveland | 72 | 2
m_346 | Portland | 77 | 2
m_289 | Cleveland | 81 | 2
m_565 | Miami | 1085 | 50
...
SELECT <fields>, <2>, <1>,NTILE(<n>) OVER( PARTITION BY <2> ORDER BY <1> )...;
NTILE wordt gebruikt om N even grote emmers te maken
$$
<n>: aantal emmers
<1>: veld om emmers te maken
<2>: veld om rijen gelijkmatig te verdelen met PARTITION BY
SELECT member_id, gym_location, calories_burned, -- Maak 50 even grote emmersNTILE(50) OVER( ORDER BY calories_burned -- Bepaalt de rijen in elke emmer ) AS marketing_groupFROM FITNESS.workouts ORDER BY marketing_group, calories_burned; -- Sorteer de uiteindelijke resultset
member_id | gym_location | calories_burned | marketing_group
----------- | -------------- | ----------------- | -----------------
m_192 | Miami | 45 | 1
m_74 | Miami | 59 | 1
m_233 | Portland | 60 | 1
m_14 | Cleveland | 72 | 2
m_346 | Portland | 77 | 2
m_289 | Cleveland | 81 | 2
m_565 | Miami | 1085 | 50
...
SELECT member_id, gym_location, calories_burned, NTILE(50) OVER(-- Verdeel rijen gelijkmatig in emmers per gym_location PARTITION BY gym_locationORDER BY calories_burned ) AS marketing_group FROM FITNESS.workouts ORDER BY marketing_group, calories_burned;
member_id | gym_location | calories_burned | marketing_group
----------- | -------------- | ----------------- | -----------------
m_192 | Miami | 45 | 1
m_233 | Portland | 60 | 1
m_14 | Cleveland | 72 | 1
m_74 | Miami | 59 | 2
m_346 | Portland | 77 | 2
m_289 | Cleveland | 81 | 2
...
$$
$$
member_id | cals_burned | cd
----------- | ------------- | ------
m_192 | 45 | .016
m_74 | 59 | .033
m_233 | 60 | .049
m_14 | 72 | .066
m_346 | 77 | .082
m_289 | 81 | .098
....
m_565 | 1085 | 1.000
SELECT member_id, gym_location, calories_burned, CUME_DIST() OVER( PARTITION BY gym_location, -- Maak een verdeling per locatie ORDER BY calories_burned ) AS cd FROM FITNESS.workoutsORDER BY gym_location, cd; -- Sorteer de uiteindelijke resultset
SELECT <fields>, <1>, <2>,CUME_DIST() OVER( PARTITION BY <1> ORDER BY <2> )...;
Vergelijkt elk record met de verdeling voor die kolom/veld, cumulatieve verdeling
$$
<1>: veld dat het venster bepaalt
<2>: veld waarvoor de verdeling wordt gemaakt
$$
member_id | gym_location | calories_burned | cd
----------- | -------------- | ---------------- | -------
m_192 | Miami | 45 | .033
m_74 | Miami | 59 | .066
m_288 | Miami | 83 | .098
m_541 | Miami | 85 | .131
...
m_233 | Portland | 60 | .071
m_346 | Portland | 77 | .142
...
Vensterfuncties in Snowflake