Menerapkan agregat statistik pada data deret waktu

Analisis Deret Waktu di PostgreSQL

Jasmin Ludolf

Content Developer, DataCamp

Agregat statistik

  • Fungsi agregat: MIN(), MAX(), SUM(), AVG(), COUNT()
  • Agregat statistik: mean dan median

Ilustrasi menampilkan empat orang di sekitar papan dengan visual analitik dan statistik.

Analisis Deret Waktu di PostgreSQL

Menghitung rata-rata

SELECT
  AVG(views)::INTEGER as avg_views
  title
FROM dc_news_fact
JOIN dc_news_dim USING(id)
GROUP BY title
ORDER BY avg_views DESC;
|avg_views|title                      |
|---------|---------------------------|
|      207|For the Wealthiest, a Pr...|
|       12|Pet surrenders on rise a...|
|       12|Argentina's New Presiden...|
|        7|These 5 charts prove tha...|
|        7|How Is the Economy Doing...|
Analisis Deret Waktu di PostgreSQL

Rata-rata jumlah tayangan per hari

WITH day_views AS (
  SELECT id, ts::DATE AS date, SUM(views) AS views
  FROM dc_news_fact
  GROUP BY id, date
)
SELECT
  AVG(views)::INTEGER AS avg
  title
FROM day_views JOIN dc_news_dim USING(id)
GROUP BY title
ORDER BY avg DESC;
Analisis Deret Waktu di PostgreSQL

Rata-rata jumlah tayangan per hari

| avg|title                                                            |
|----|-----------------------------------------------------------------|
|9855|For the Wealthiest, a Private Tax System That Saves Them Billions|
| 579|Pet surrenders on rise as Fort McMurray's economy falls          |
| 574|Argentina's New President Moves Swiftly to Shake Up the Economy  |
| 352|How Is the Economy Doing? Politics May Decide Your Answer        |
| 348|These 5 charts prove that the economy does better under ...      |
Analisis Deret Waktu di PostgreSQL

Rata-rata jumlah tayangan per hari

SELECT
  (SUM(views)/COUNT(DISTINCT ts::DATE))::INTEGER as avg,
  title
FROM dc_news_fact JOIN dc_news_dim USING(id)
GROUP BY title 
ORDER BY avg DESC;
|avg |title                                                            |
|----|-----------------------------------------------------------------|
|9855|For the Wealthiest, a Private Tax System That Saves Them Billions|
| 579|Pet surrenders on rise as Fort McMurray's economy falls          |
| 574|Argentina's New President Moves Swiftly to Shake Up the Economy  |
...
Analisis Deret Waktu di PostgreSQL

Median diskret dan kontinu

  • Median diskret: nilai pertama yang paling dekat ke nilai tengah

  • Median kontinu: nilai yang membagi dataset menjadi dua sama besar

Jumlah elemen ganjil

  • Deret = (1, 2, 3, 4, 5)
  • Median diskret = 3
  • Median kontinu = 3

Jumlah elemen genap

  • Deret = (1, 2, 3, 4)
  • Median diskret = 2
  • Median kontinu = 2,5
Analisis Deret Waktu di PostgreSQL

Fungsi agregat ordered-set

  • PERCENTILE_DISC()
  • PERCENTILE_CONT()
  • Fungsi agregat ordered-set: PERCENTILE_DISC(fraction) WITHIN GROUP (ORDER BY field)
SELECT
  PERCENTILE_CONT(0.5) WITHIN GROUP
  (ORDER BY value) AS median_cont,
  PERCENTILE_DISC(0.5) WITHIN GROUP
  (ORDER BY value) AS median_disc
FROM
(
  VALUES
  (1,1), (1,2), (1,3), (1,4), (1,5),
  (2,1), (2,5), (2,7), (2,11), (2,11)
) AS t (id, value)
GROUP BY id;
Analisis Deret Waktu di PostgreSQL

Fungsi agregat ordered-set

|median_cont|median_disc|
|-----------|-----------|
|        3.0|          3|
|        7.0|          7|
Analisis Deret Waktu di PostgreSQL

Median, kuantil, persentil, kuartil

  • Median adalah jenis persentil
  • Persentil adalah jenis kuantil

 

  • Kuantil: membagi sampel menjadi subset hampir sama besar
    • kuartil (empat subset)
    • desil (sepuluh subset)
Analisis Deret Waktu di PostgreSQL

Menghitung kuartil

SELECT
  PERCENTILE_DISC(0.25) WITHIN GROUP (ORDER BY value) AS ptile_25,
  PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY value) AS ptile_50,
  PERCENTILE_DISC(0.75) WITHIN GROUP (ORDER BY value) AS ptile_75
FROM
(
  VALUES 
  (1,1), (1,2), (1,3), (1,4), (1,5)
) AS t (id, value)
GROUP BY id;
|ptile_25|ptile_50|ptile_75|
|--------|--------|--------|
|       2|       3|       4|
Analisis Deret Waktu di PostgreSQL

Menghitung array kuartil diskret

SELECT
  PERCENTILE_DISC(ARRAY[0.25, 0.5, 0.75]) 
  WITHIN GROUP (ORDER BY value) AS median_disc
FROM (
  VALUES
  (1,1), (1,2), (1,3), (1,4), (1,5) )
  AS t (id, value)
GROUP BY id;
|median_disc|
|-----------|
|{2,3,4}    |
Analisis Deret Waktu di PostgreSQL

Ayo berlatih!

Analisis Deret Waktu di PostgreSQL

Preparing Video For Download...