Exploratory Data Analysis in SQL
Christina Maimone
Data Scientist
character(n)
or char(n)
n
character varying(n)
or varchar(n)
n
text
or varchar
Categorical
Tues, Tuesday, Mon, TH
shirts, shoes, hats, pants
satisfied, very satisfied, unsatisfied
0349-938, 1254-001, 5477-651
red, blue, green, yellow
Unstructured Text
I really like this product. I use it every day. It's my favorite color.
We've redesigned your favorite t-shirt to make it even better. You'll love...
Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal...
SELECT category, -- categorical variable
count(*) -- count rows for each category
FROM product -- table
GROUP BY category; -- categorical variable
category | count
----------+-------
Banana | 1
Apple | 4
apple | 2
apple | 1
banana | 3
(5 rows)
SELECT category, -- categorical variable
count(*) -- count rows for each category
FROM product -- table
GROUP BY category -- categorical variable
ORDER BY count DESC; -- show most frequent values first
category | count
----------+-------
Apple | 4
banana | 3
apple | 2
Banana | 1
apple | 1
(5 rows)
SELECT category, -- categorical variable
count(*) -- count rows for each category
FROM product -- table
GROUP BY category -- categorical variable
ORDER BY category; -- order by categorical variable
category | count
----------+-------
apple | 1
Apple | 4
Banana | 1
apple | 2
banana | 3
(5 rows)
-- Results
category | count
----------+-------
apple | 1
Apple | 4
Banana | 1
apple | 2
banana | 3
(5 rows)
-- Alphabetical Order:
' ' < 'A' < 'a'
-- From results
' ' < 'A' < 'B' < 'a' < 'b'
Case matters
'apple' != 'Apple'
Spaces count
' apple' != 'apple'
'' != '
'
Empty strings aren't null
'' != NULL
Punctuation differences
'to-do' != 'to–do'
Exploratory Data Analysis in SQL