Fungsi Window di Snowflake
Jake Roach
Field Data engineer
Bagaimana kita dapat mengklasifikasikan anggota gym berdasarkan latihan mereka untuk memasarkan kelas yang tepat?
member_id | gym_location | calories_burned | marketing_group
----------- | -------------- | ----------------- | -----------------
m_192 | Miami | 45 | 1
m_74 | Miami | 59 | 1
m_233 | Portland | 60 | 1
m_14 | Cleveland | 72 | 2
m_346 | Portland | 77 | 2
m_289 | Cleveland | 81 | 2
m_565 | Miami | 1085 | 50
...
SELECT <fields>, <2>, <1>,NTILE(<n>) OVER( PARTITION BY <2> ORDER BY <1> )...;
NTILE digunakan untuk membuat "N" jumlah "bucket" yang berukuran sama
$$
<n>: jumlah bucket
<1>: field yang digunakan untuk membuat bucket
<2>: field yang digunakan untuk mendistribusikan catatan secara merata menggunakan PARTITION BY
SELECT member_id, gym_location, calories_burned, -- Buat 50 bucket data yang berukuran samaNTILE(50) OVER( ORDER BY calories_burned -- Menentukan catatan dalam setiap bucket ) AS marketing_groupFROM FITNESS.workouts ORDER BY marketing_group, calories_burned; -- URUTKAN set hasil akhir
member_id | gym_location | calories_burned | marketing_group
----------- | -------------- | ----------------- | -----------------
m_192 | Miami | 45 | 1
m_74 | Miami | 59 | 1
m_233 | Portland | 60 | 1
m_14 | Cleveland | 72 | 2
m_346 | Portland | 77 | 2
m_289 | Cleveland | 81 | 2
m_565 | Miami | 1085 | 50
...
SELECT member_id, gym_location, calories_burned, NTILE(50) OVER(-- Sebarkan catatan secara merata dalam bucket berdasarkan setiap gym_location PARTITION BY gym_locationORDER BY calories_burned ) AS marketing_group FROM FITNESS.workouts ORDER BY marketing_group, calories_burned;
member_id | gym_location | calories_burned | marketing_group
----------- | -------------- | ----------------- | -----------------
m_192 | Miami | 45 | 1
m_233 | Portland | 60 | 1
m_14 | Cleveland | 72 | 1
m_74 | Miami | 59 | 2
m_346 | Portland | 77 | 2
m_289 | Cleveland | 81 | 2
...
$$
$$
member_id | cals_burned | cd
----------- | ------------- | ------
m_192 | 45 | .016
m_74 | 59 | .033
m_233 | 60 | .049
m_14 | 72 | .066
m_346 | 77 | .082
m_289 | 81 | .098
....
m_565 | 1085 | 1.000
SELECT member_id, gym_location, calories_burned, CUME_DIST() OVER( PARTITION BY gym_location, -- Buat distribusi untuk setiap lokasi ORDER BY calories_burned ) AS cd FROM FITNESS.workoutsORDER BY gym_location, cd; -- URUTKAN set hasil akhir
SELECT <fields>, <1>, <2>,CUME_DIST() OVER( PARTITION BY <1> ORDER BY <2> )...;
Membandingkan setiap catatan dengan distribusi untuk kolom/field tersebut, distribusi kumulatif
$$
<1>: field yang menentukan jendela untuk evaluasi
<2>: field untuk membuat distribusi
$$
member_id | gym_location | calories_burned | cd
----------- | -------------- | ---------------- | -------
m_192 | Miami | 45 | .033
m_74 | Miami | 59 | .066
m_288 | Miami | 83 | .098
m_541 | Miami | 85 | .131
...
m_233 | Portland | 60 | .071
m_346 | Portland | 77 | .142
...
Fungsi Window di Snowflake