Cleaning messy strings

Cleaning Data in SQL Server Databases

Miriam Antona

Software Engineer

Removing additional spaces

SELECT * FROM carriers
| code | name                            |
|------|---------------------------------|
| YV   | Mesa Airlines Inc.              |
| AA   |   American Airlines Inc.        |
| B6   |   JetBlue Airways               |
| DL   |   Delta Air Lines Inc.          |
| HA   |   Hawaiian Airlines Inc.        |
| MQ   |   American Eagle Airlines Inc.  |
| EV   |  ExpressJet Airlines Inc.       |
| UA   |  United Air Lines Inc.          |
| US   |  US Airways Inc.                |
| ...  | ...                             |
Cleaning Data in SQL Server Databases

Removing additional spaces - TRIM

TRIM ( [characters ] string )
  • Available since SQL Server 2017
  • Removes any specified character from the start and end of a string
  • Removes space character if we don't specify any character.
SELECT TRIM('   JetBlue Airways    ');
JetBlue Airways
Cleaning Data in SQL Server Databases

Removing additional spaces - RTRIM and LTRIM

  • For older versions than SQL Server 2017 -> RTRIM and LTRIM.
-- Removes all trailing spaces
RTRIM ( character_expression )
-- Removes all leading spaces
LTRIM ( character_expression )
SELECT LTRIM(RTRIM('   JetBlue Airways    '));
JetBlue Airways
Cleaning Data in SQL Server Databases

Removing additional spaces

SELECT code, TRIM(name) AS name FROM carriers
SELECT code, LTRIM(RTRIM(name)) AS name FROM carriers
| code | name                         |
|------|------------------------------|
| YV   | Mesa Airlines Inc.           |
| AA   | American Airlines Inc.       |
| B6   | JetBlue Airways              |
| DL   | Delta Air Lines Inc.         |
| HA   | Hawaiian Airlines Inc.       |
| US   | US Airways Inc.              |
| ...  | ...                          |
Cleaning Data in SQL Server Databases

Unifying strings

SELECT * FROM airports 
ORDER BY airport_state
| airport_code | airport_name                            | airport_city    | airport_state |
|--------------|-----------------------------------------|-----------------|---------------|
| ...          | ...                                     | ...             | ...           |
| MIA          | Miami International                     | Miami           | fl            |
| TPA          | Tampa International                     | Tampa           | Fl            |
| FLL          | Fort Lauderdale-Hollywood International | Fort Lauderdale | FL            |
| MCO          | Orlando International                   | Orlando         | Florida       |
| ...          | ...                                     | ...             | ...           |
Cleaning Data in SQL Server Databases

Unifying strings - REPLACE

"Fl" / "fl" / "Florida" -> "Florida"

REPLACE ( string_to_replace , occurrences , string_replacement )
  • Replaces all occurrences of a specified string with another string

  • Performs comparisons based on the collation of the input

    • Use COLLATE to perform a comparison in a specific collation
    • Suppose the collation of our inputs is case insensitive
Cleaning Data in SQL Server Databases

Unifying strings - REPLACE

SELECT 
    airport_code, airport_name, airport_city, 
    REPLACE(airport_state, 'FL', 'Florida') AS airport_state
FROM airports
ORDER BY airport_state
| airport_code | airport_name                            | airport_city    | airport_state |
|--------------|-----------------------------------------|-----------------|---------------|
| ...          | ...                                     | ...             | ...           |
| MIA          | Miami International                     | Miami           | Florida       |
| TPA          | Tampa International                     | Tampa           | Florida       |
| FLL          | Fort Lauderdale-Hollywood International | Fort Lauderdale | Florida       |
| MCO          | Orlando International                   | Orlando         | Floridaorida  |
| ...          | ...                                     | ...             | ...           |
Cleaning Data in SQL Server Databases

Unifying strings - REPLACE

SELECT 
    airport_code, airport_name, airport_city, 
    REPLACE
        (REPLACE(airport_state, 'FL', 'Florida'),
         'Floridaorida', 'Florida') AS airport_state
FROM airports
ORDER BY airport_state
| airport_code | airport_name                            | airport_city    | airport_state |
|--------------|-----------------------------------------|-----------------|---------------|
| MCO          | Orlando International                   | Orlando         | Florida       |
| TPA          | Tampa International                     | Tampa           | Florida       |
| FLL          | Fort Lauderdale-Hollywood International | Fort Lauderdale | Florida       |
| MIA          | Miami International                     | Miami           | Florida       |
| ...          | ...                                     | ...             | ...           |
Cleaning Data in SQL Server Databases

Unifying strings - REPLACE + CASE

SELECT airport_code, airport_name, airport_city, 
    CASE
        WHEN airport_state <> 'Florida' THEN REPLACE(airport_state, 'FL', 'Florida')
        ELSE airport_state 
    END AS airport_state
FROM airports
ORDER BY airport_state
| airport_code | airport_name                            | airport_city    | airport_state |
|--------------|-----------------------------------------|-----------------|---------------|
| MCO          | Orlando International                   | Orlando         | Florida       |
| TPA          | Tampa International                     | Tampa           | Florida       |
| FLL          | Fort Lauderdale-Hollywood International | Fort Lauderdale | Florida       |
| MIA          | Miami International                     | Miami           | Florida       |
| ...          | ...                                     | ...             | ...           |
Cleaning Data in SQL Server Databases

Unifying strings - REPLACE + UPPER

"Fl" / "fl" / "Florida" -> "FL"

SELECT 
    airport_code, airport_name, airport_city, 
    REPLACE(airport_state, 'Florida', 'FL') AS airport_state
FROM airports
ORDER BY airport_state
| airport_code | airport_name                            | airport_city    | airport_state |
|--------------|-----------------------------------------|-----------------|---------------|
| MCO          | Orlando International                   | Orlando         | FL            |
| TPA          | Tampa International                     | Tampa           | Fl            |
| FLL          | Fort Lauderdale-Hollywood International | Fort Lauderdale | FL            |
| MIA          | Miami International                     | Miami           | fl            |
| ...          | ...                                     | ...             | ...           |
Cleaning Data in SQL Server Databases

Unifying strings - REPLACE + UPPER

UPPER ( character_expression )
  • Converts a given string to uppercase.
SELECT 
    airport_code, airport_name, airport_city, 
        UPPER(
              REPLACE(airport_state, 'Florida', 'FL')
        ) AS airport_state
FROM airports
ORDER BY airport_state
Cleaning Data in SQL Server Databases

Unifying strings - REPLACE + UPPER

| airport_code | airport_name                            | airport_city    | airport_state |
|--------------|-----------------------------------------|-----------------|---------------|
| ...          | ...                                     | ...             | ...           |
| MCO          | Orlando International                   | Orlando         | FL            |
| TPA          | Tampa International                     | Tampa           | FL            |
| FLL          | Fort Lauderdale-Hollywood International | Fort Lauderdale | FL            |
| MIA          | Miami International                     | Miami           | FL            |
| ...          | ...                                     | ...             | ...           |
Cleaning Data in SQL Server Databases

Let's practice!

Cleaning Data in SQL Server Databases

Preparing Video For Download...