Tekenreekstypen en veelvoorkomende issues

Exploratory Data Analysis in SQL

Christina Maimone

Data Scientist

PostgreSQL-tekenreekstypen

character(n) of char(n)

  • vaste lengte n
  • afsluitende spaties genegeerd bij vergelijkingen

character varying(n) of varchar(n)

  • variabele lengte tot max. n

text of varchar

  • onbeperkte lengte
Exploratory Data Analysis in SQL

Soorten tekstdata

Categorisch

Tues, Tuesday, Mon, TH

shirts, shoes, hats, pants

satisfied, very satisfied, unsatisfied

0349-938, 1254-001, 5477-651

red, blue, green, yellow

Ongestructureerde tekst

I really like this product. I use it every day. It's my favorite color.

We've redesigned your favorite t-shirt to make it even better. You'll love...

Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal...

Exploratory Data Analysis in SQL

Groeperen en tellen

SELECT category,        -- categorische variabele

       count(*)         -- aantal rijen per categorie

  FROM product          -- tabel

 GROUP BY category;     -- categorische variabele


 category | count 
----------+-------
 Banana   |     1
 Apple    |     4
 apple    |     2
  apple   |     1
 banana   |     3
(5 rows)
Exploratory Data Analysis in SQL

Sortering: meest voorkomende waarden

SELECT category,        -- categorische variabele

       count(*)         -- aantal rijen per categorie

  FROM product          -- tabel

 GROUP BY category      -- categorische variabele

 ORDER BY count DESC;   -- meest voorkomende waarden eerst
 category | count 
----------+-------
 Apple    |     4
 banana   |     3
 apple    |     2
 Banana   |     1
  apple   |     1
(5 rows)
Exploratory Data Analysis in SQL

Sortering: categoriewaarde

SELECT category,        -- categorische variabele

       count(*)         -- aantal rijen per categorie

  FROM product          -- tabel

 GROUP BY category      -- categorische variabele

 ORDER BY category;     -- sorteer op categorische variabele
 category | count 
----------+-------
  apple   |     1
 Apple    |     4
 Banana   |     1
 apple    |     2
 banana   |     3
(5 rows)

Exploratory Data Analysis in SQL

Alfabetische volgorde

-- Resultaten

 category | count 
----------+-------
  apple   |     1
 Apple    |     4
 Banana   |     1
 apple    |     2
 banana   |     3
(5 rows)

-- Alfabetische volgorde:

' ' < 'A' < 'a'
-- Uit de resultaten

' ' < 'A' < 'B' < 'a' < 'b'

Exploratory Data Analysis in SQL

Veelvoorkomende issues

Hoofd-/kleine letters tellen

    'apple' != 'Apple'

 

Spaties tellen

    ' apple' != 'apple'

    '' != '       '

Lege strings zijn niet null

    '' != NULL

 

Interpunctie verschilt

    'to-do' != 'to–do'

Exploratory Data Analysis in SQL

Tijd om tekstdata te bekijken

Exploratory Data Analysis in SQL

Preparing Video For Download...