Strategi untuk Banyak Transformasi

Analisis Data Eksploratif di SQL

Christina Maimone

Data Scientist

Banyak transformasi

SELECT * FROM naics;
   id   |                  category                   | businesses 
--------+---------------------------------------------+------------
 111110 | Agriculture: Soybean Farming                |       4788
 111130 | Agriculture | Dry Pea and Bean Farming      |       3606
 111140 | Agriculture: Wheat Farming                  |       6393
 111150 | Agriculture - Corn Farming                  |      26469
 111160 | Agriculture: Rice Farming                   |        949
 111199 | Agriculture - All Other Grain Farming       |      15035
 111211 | Agriculture | Potato Farming                |        617
 611110 | Education - Elementary and Secondary        |     187859
 611210 | Education | Junior Colleges                 |       3961
 611310 | Education: Colleges and Universities        |      29148
Analisis Data Eksploratif di SQL

CASE WHEN

-- Kasus untuk masing-masing :, -, dan |

SELECT CASE WHEN category LIKE '%: %' THEN split_part(category, ': ', 1)
WHEN category LIKE '% - %' THEN split_part(category, ' - ', 1)
ELSE split_part(category, ' | ', 1)
END AS major_category, -- beri alias hasilnya
sum(businesses) -- juga pilih jumlah bisnis FROM naics GROUP BY major_category; -- Kelompokkan berdasarkan kategori di atas
 major_category |  sum   
----------------+--------
 Education      | 220968
 Agriculture    |  57857
Analisis Data Eksploratif di SQL

Tabel recoding

Nilai asli: tabel fruit

 customer | fav_fruit 
----------+-----------
      349 | apple
      874 | Apple
      703 | apple
      667 | bannana
      622 | banana
      387 | BANANA
      300 | APPLES
      313 |  apple
      499 |  banana
      418 | apple
      841 | BANANA
      300 | APPLE
      754 | apple

Nilai baku: tabel recode

 original | standardized  
----------+--------------
 APPLES   | apple
 apple    | apple  
 Apple    | apple
 bannana  | banana
 apple    | apple
  banana  | banana
 banana   | banana
 APPLE    | apple
  apple   | apple
 BANANA   | banana
Analisis Data Eksploratif di SQL

Langkah 1: CREATE TEMP TABLE

CREATE TEMP TABLE recode AS

  SELECT DISTINCT fav_fruit AS original,  -- nilai asli, berantakan

         fav_fruit AS standardized        -- nilai baku baru

    FROM fruit;
Analisis Data Eksploratif di SQL

Tabel awal

SELECT * 
  FROM recode;
 original | standardized 
----------+--------------
 APPLES   | APPLES
 apple    | apple  
 Apple    | Apple
 bannana  | bannana
 apple    | apple
  banana  |  banana
 banana   | banana
 APPLE    | APPLE
  apple   |  apple
 BANANA   | BANANA
(10 rows)
Analisis Data Eksploratif di SQL

Langkah 2: UPDATE nilai

UPDATE table_name
   SET column_name = new_value
 WHERE condition;
Analisis Data Eksploratif di SQL

Langkah 2: UPDATE nilai

-- Semua baris: huruf kecil, hilangkan spasi di ujung

UPDATE recode 
   SET standardized=trim(lower(original));
-- Baris tertentu: perbaiki salah eja

UPDATE recode
   SET standardized='banana'
 WHERE standardized LIKE '%nn%';
-- Semua baris: hapus huruf s di akhir

UPDATE recode
   SET standardized=rtrim(standardized, 's');
Analisis Data Eksploratif di SQL

Tabel recode hasilnya

SELECT * 
  FROM recode;
 original | standardized 
----------+--------------
 APPLES   | apple
 apple    | apple
 Apple    | apple
 apple    | apple
  banana  | banana
 banana   | banana
 APPLE    | apple
  apple   | apple
 BANANA   | banana
 bannana  | banana
(10 rows)
Analisis Data Eksploratif di SQL

Langkah 3: JOIN tabel asli dan recode

Hanya asli

SELECT fav_fruit, count(*) 
  FROM fruit 
 GROUP BY fav_fruit;
 fav_fruit | count 
-----------+-------
 APPLES    |     1
 apple     |     1
 apple     |     3
  banana   |     1
 BANANA    |     2
  apple    |     1
 APPLE     |     1
 bannana   |     1
 banana    |     1
 Apple     |     1
(10 rows)

Dengan nilai yang sudah dikodekan ulang

SELECT standardized, 
       count(*) 
  FROM fruit
       LEFT JOIN recode
       ON fav_fruit=original
 GROUP BY standardized;
 standardized | count 
--------------+-------
 apple        |     8
 banana       |     5
(2 rows)
Analisis Data Eksploratif di SQL

Ringkasan

  1. CREATE TEMP TABLE dengan nilai asli
  2. UPDATE untuk membuat nilai baku
  3. JOIN data asli ke data baku
Analisis Data Eksploratif di SQL

Bersihkan data Evanston 311!

Analisis Data Eksploratif di SQL

Preparing Video For Download...