Character data types and common issues

Exploratory Data Analysis in SQL

Christina Maimone

Data Scientist

PostgreSQL character types

character(n) or char(n)

  • fixed length n
  • trailing spaces ignored in comparisons

character varying(n) or varchar(n)

  • variable length up to a maximum of n

text or varchar

  • unlimited length
Exploratory Data Analysis in SQL

Types of text data

Categorical

Tues, Tuesday, Mon, TH

shirts, shoes, hats, pants

satisfied, very satisfied, unsatisfied

0349-938, 1254-001, 5477-651

red, blue, green, yellow

Unstructured Text

I really like this product. I use it every day. It's my favorite color.

We've redesigned your favorite t-shirt to make it even better. You'll love...

Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal...

Exploratory Data Analysis in SQL

Grouping and counting

SELECT category,        -- categorical variable

       count(*)         -- count rows for each category

  FROM product          -- table

 GROUP BY category;     -- categorical variable


 category | count 
----------+-------
 Banana   |     1
 Apple    |     4
 apple    |     2
  apple   |     1
 banana   |     3
(5 rows)
Exploratory Data Analysis in SQL

Order: most frequent values

SELECT category,        -- categorical variable

       count(*)         -- count rows for each category

  FROM product          -- table

 GROUP BY category      -- categorical variable

 ORDER BY count DESC;   -- show most frequent values first
 category | count 
----------+-------
 Apple    |     4
 banana   |     3
 apple    |     2
 Banana   |     1
  apple   |     1
(5 rows)
Exploratory Data Analysis in SQL

Order: category value

SELECT category,        -- categorical variable

       count(*)         -- count rows for each category

  FROM product          -- table

 GROUP BY category      -- categorical variable

 ORDER BY category;     -- order by categorical variable
 category | count 
----------+-------
  apple   |     1
 Apple    |     4
 Banana   |     1
 apple    |     2
 banana   |     3
(5 rows)

Exploratory Data Analysis in SQL

Alphabetical order

-- Results

 category | count 
----------+-------
  apple   |     1
 Apple    |     4
 Banana   |     1
 apple    |     2
 banana   |     3
(5 rows)

-- Alphabetical Order:

' ' < 'A' < 'a'
-- From results

' ' < 'A' < 'B' < 'a' < 'b'

Exploratory Data Analysis in SQL

Common issues

Case matters

    'apple' != 'Apple'

 

Spaces count

    ' apple' != 'apple'

    '' != '       '

Empty strings aren't null

    '' != NULL

 

Punctuation differences

    'to-do' != 'to–do'

Exploratory Data Analysis in SQL

Time to examine some text data

Exploratory Data Analysis in SQL

Preparing Video For Download...