Exploratory Data Analysis in SQL
Christina Maimone
Data Scientist
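Character types (PostgreSQL):
character(n) or char(n): fixed length n; trailing spaces are insignificant in comparisons
character varying(n) or varchar(n): variable length, up to a maximum of n
text or varchar: unlimited length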
Categorical
Tues, Tuesday, Mon, TH
shirts, shoes, hats, pants
satisfied, very satisfied, unsatisfied
0349-938, 1254-001, 5477-651
red, blue, green, yellow
Unstructured text
I really like this product. I use it every day. It's my favorite color.
We've redesigned your favorite t-shirt to make it even better. You'll love...
Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal...
SELECT category, -- categorical variable
count(*) -- number of rows per category
FROM product -- table
GROUP BY category; -- categorical variable
 category | count
----------+-------
 Banana   |     1
 Apple    |     4
 apple    |     2
  apple   |     1
 banana   |     3
(5 rows)
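The five groups above differ only in letter case or in a leading space. A minimal sketch that reproduces this, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL and a hypothetical product table whose rows were chosen only to match the counts shown (one of the two lowercase groups is " apple", with a leading space):

```python
import sqlite3

# Hypothetical rows that reproduce the counts above; note the leading space in " apple".
ROWS = ["Banana"] + ["Apple"] * 4 + ["apple"] * 2 + [" apple"] + ["banana"] * 3


def category_counts(rows):
    """Run the GROUP BY query against an in-memory table and return {category: count}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (category TEXT)")
    conn.executemany("INSERT INTO product (category) VALUES (?)", [(r,) for r in rows])
    result = conn.execute(
        "SELECT category, count(*) FROM product GROUP BY category"
    ).fetchall()
    conn.close()
    return dict(result)


counts = category_counts(ROWS)
print(counts)
```

SQLite compares TEXT with its default BINARY collation, so, as in the output above, "Apple", "apple" and " apple" each form their own group.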
SELECT category, -- categorical variable
count(*) -- number of rows per category
FROM product -- table
GROUP BY category -- categorical variable
ORDER BY count DESC; -- most frequent values first
 category | count
----------+-------
 Apple    |     4
 banana   |     3
 apple    |     2
 Banana   |     1
  apple   |     1
(5 rows)
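A sketch of the same ranking, again assuming SQLite through Python's sqlite3 module and the same hypothetical rows. SQLite names the aggregate column count(*) rather than count, so the query gives it an explicit alias n. The secondary sort key is an addition of this sketch: it makes the order of the two groups tied at 1 deterministic, which neither database guarantees without one.

```python
import sqlite3

# Same hypothetical rows as before (note the leading space in " apple").
rows = ["Banana"] + ["Apple"] * 4 + ["apple"] * 2 + [" apple"] + ["banana"] * 3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (category TEXT)")
conn.executemany("INSERT INTO product (category) VALUES (?)", [(r,) for r in rows])

# Alias the aggregate explicitly instead of relying on PostgreSQL's default name "count".
ranked = conn.execute(
    """
    SELECT category, count(*) AS n
      FROM product
     GROUP BY category
     ORDER BY n DESC, category  -- secondary key only breaks ties
    """
).fetchall()
conn.close()

for category, n in ranked:
    print(repr(category), n)
```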
SELECT category, -- categorical variable
count(*) -- number of rows per category
FROM product -- table
GROUP BY category -- categorical variable
ORDER BY category; -- sort by categorical variable
 category | count
----------+-------
  apple   |     1
 Apple    |     4
 Banana   |     1
 apple    |     2
 banana   |     3
(5 rows)
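This order follows character codes rather than dictionary order. The sketch below reproduces it, assuming SQLite's default BINARY collation (which compares bytes, much like PostgreSQL's "C" collation) and the same hypothetical rows. In PostgreSQL, ORDER BY category COLLATE "C" requests this byte-wise ordering regardless of the database's default collation.

```python
import sqlite3

# Same hypothetical rows as before (note the leading space in " apple").
rows = ["Banana"] + ["Apple"] * 4 + ["apple"] * 2 + [" apple"] + ["banana"] * 3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (category TEXT)")
conn.executemany("INSERT INTO product (category) VALUES (?)", [(r,) for r in rows])

# BINARY collation: ' ' (32) sorts before 'A'..'Z' (65-90), which sort before 'a'..'z' (97-122).
ordered = [
    category
    for (category,) in conn.execute(
        "SELECT category FROM product GROUP BY category ORDER BY category"
    )
]
conn.close()
print(ordered)
```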
-- Alphabetical order:
' ' < 'A' < 'a'
-- From the results:
' ' < 'A' < 'B' < 'a' < 'b'
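The same ordering falls out of the character codes: space is 32, uppercase letters run from 65 to 90, lowercase letters from 97 to 122. A short sketch in plain Python, whose default string comparison uses Unicode code points (identical to ASCII codes for these characters):

```python
# Plain str comparison in Python orders by Unicode code point.
chars = ["b", "a", "B", "A", " "]
ordered = sorted(chars)
codes = [ord(c) for c in ordered]
print(ordered)  # [' ', 'A', 'B', 'a', 'b']
print(codes)    # [32, 65, 66, 97, 98]

# The same rule orders the category values from the query results.
words = sorted(["apple", "Apple", " apple"])
print(words)    # [' apple', 'Apple', 'apple']
```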
Case matters
'apple' != 'Apple'
Spaces count
' apple' != 'apple'
'' != ' '
Empty strings are not null
'' IS NOT NULL -- true; '' != NULL evaluates to NULL, not true
Punctuation differs
'to-do' != 'to–do' -- hyphen vs. en dash
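Each of these comparisons can be checked directly. A sketch assuming SQLite through Python's sqlite3 module, which returns 0 or 1 where PostgreSQL returns false or true, and NULL (None in Python) whenever one side of = is NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute(
    """
    SELECT 'apple' = 'Apple',   -- case matters
           ' apple' = 'apple',  -- a leading space matters
           '' = ' ',            -- empty string vs. a single space
           '' IS NULL,          -- an empty string is not null
           'to-do' = 'to–do',   -- hyphen vs. en dash
           '' = NULL            -- any comparison with NULL yields NULL
    """
).fetchone()
conn.close()
print(row)  # (0, 0, 0, 0, 0, None)
```

The last column is why IS NULL / IS NOT NULL, not = or !=, is the way to test for missing values.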