Exploratory Data Analysis in SQL
Christina Maimone
Data Scientist
character(n) or char(n)
ncharacter varying(n) or varchar(n)
ntext or varchar
Categorical
Tues, Tuesday, Mon, TH
shirts, shoes, hats, pants
satisfied, very satisfied, unsatisfied
0349-938, 1254-001, 5477-651
red, blue, green, yellow
Unstructured Text
I really like this product. I use it every day. It's my favorite color.
We've redesigned your favorite t-shirt to make it even better. You'll love...
Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal...
SELECT category, -- categorical variable
count(*) -- count rows for each category
FROM product -- table
GROUP BY category; -- categorical variable
category | count
----------+-------
Banana | 1
Apple | 4
apple | 2
apple | 1
banana | 3
(5 rows)
SELECT category, -- categorical variable
count(*) -- count rows for each category
FROM product -- table
GROUP BY category -- categorical variable
ORDER BY count DESC; -- show most frequent values first
category | count
----------+-------
Apple | 4
banana | 3
apple | 2
Banana | 1
apple | 1
(5 rows)
SELECT category, -- categorical variable
count(*) -- count rows for each category
FROM product -- table
GROUP BY category -- categorical variable
ORDER BY category; -- order by categorical variable
category | count
----------+-------
apple | 1
Apple | 4
Banana | 1
apple | 2
banana | 3
(5 rows)
-- Results
category | count
----------+-------
apple | 1
Apple | 4
Banana | 1
apple | 2
banana | 3
(5 rows)
-- Alphabetical Order:
' ' < 'A' < 'a'
-- From results
' ' < 'A' < 'B' < 'a' < 'b'
Case matters
'apple' != 'Apple'
Spaces count
' apple' != 'apple'
'' != ' '
Empty strings aren't null
'' != NULL
Punctuation differences
'to-do' != 'to–do'
Exploratory Data Analysis in SQL