Cleaning Data in SQL Server Databases
Miriam Antona
Software Engineer
airports
| airport_code | airport_name | airport_city | airport_state |
|--------------|-------------------------------------------|-------------------|---------------|
| MSP | Minneapolis-St Paul International | Minneapolis | Minnesota |
| JFK | John F. Kennedy International | New York City | New York |
| LAX | Los Angeles International | Los Angeles | California |
| DFW | Dallas/Fort Worth International | Dallas/Fort Worth | Texas |
| BOS | Logan International | Boston | Massachusetts |
| SFO | San Francisco International | San Francisco | Californiaa |
| ATL | Hartsfield-Jackson Atlanta International | Atlanta | Georgia |
| ... | ... | ... | ... |
carriers
| code | name |
|------|---------------------------------|
| YV | Mesa Airlines Inc. |
| AA | American Airlines Inc. |
| DL | Delta Air Lines Inc. |
| HA | Hawaiian Airlines Inc. |
| MQ | American Eagle Airlines Inc. |
| EV | ExpressJet Airlines Inc. |
| ... | ... |
flight statistics
| registration_code | airport_code | carrier_code | canceled | on_time | delayed | ... |
|-------------------|--------------|--------------|----------|---------|---------|-----|
| ... | ... | ... | ... | ... | ... | ... |
| 000000119 | JFK | AA | 74 | 819 | 233 | ... |
| 120 | JFK | B6 | 438 | 1865 | 1010 | ... |
| 000000121 | JFK | HA | 0 | 25 | 3 | ... |
| 122 | JFK | MQ | 102 | 386 | 159 | ... |
| 000000124 | JFK | UA | 22 | 296 | 88 | ... |
| 000000125 | JFK | US | 15 | 191 | 63 | ... |
| 000000126 | JFK | VX | 12 | 225 | 61 | ... |
| ... | ... | ... | ... | ... | ... | ... |
pilots
| pilot_code | pilot_name | pilot_surname | carrier_code | entry_date |
|------------|------------|---------------|--------------|------------|
| 1 | Thomas | Peters | HA | 2011-10-01 |
| 2 | Hiroki | Konoe | MQ | 2011-01-21 |
| 3 | Arturo | Montero | UA | 2012-12-28 |
| 4 | David | Captain | US | 2000-10-01 |
| 5 | Ainhoa | Guerrera | VX | 2000-10-05 |
| 6 | Alvin | Andersen | OO | 2012-01-15 |
| 7 | William | Champy | F9 | 2011-03-15 |
| ... | ... | ... | ... | ... |
SELECT * FROM flight_statistics
| registration_code | airport_code | carrier_code | canceled | on_time | delayed | ... |
|-------------------|--------------|--------------|----------|---------|---------|-----|
| ... | ... | ... | ... | ... | ... | ... |
| 000000119 | JFK | AA | 74 | 819 | 233 | ... |
| 120 | JFK | B6 | 438 | 1865 | 1010 | ... |
| 000000121 | JFK | HA | 0 | 25 | 3 | ... |
| 122 | JFK | MQ | 102 | 386 | 159 | ... |
| 000000124 | JFK | UA | 22 | 296 | 88 | ... |
| 000000125 | JFK | US | 15 | 191 | 63 | ... |
| 000000126 | JFK | VX | 12 | 225 | 61 | ... |
| ... | ... | ... | ... | ... | ... | ... |
VALID: 000000128 - until 9 digits
INVALID: 128
REPLICATE (string, integer)
Repeats a string a specified number of times.
REPLICATE (string, integer)
Repeats a string a specified number of times.
REPLICATE('0', 9 - LEN(registration_code))
REPLICATE (string, integer)
Repeats a string a specified number of times.
REPLICATE('0', 9 - LEN(registration_code))
-- registration_code: 120 => LEN(120) = 3
REPLICATE('0', 6)
+
operator
SELECT
REPLICATE('0', 9 - LEN(registration_code)) + registration_code AS registration_code
FROM flight_statistics
CONCAT
- since SQL Server 2012
SELECT
CONCAT(REPLICATE('0', 9 - LEN(registration_code)), registration_code) AS registration_code
FROM flight_statistics
| registration_code |
|-------------------|
| ... |
| 000000119 |
| 000000120 |
| 000000121 |
| 000000122 |
| 000000123 |
| 000000124 |
| 000000125 |
| 000000126 |
| ... |
FORMAT (value, format [, culture ] )
SELECT
FORMAT(CAST(registration_code AS INT), '000000000') AS registration_code
FROM flight_statistics;
| registration_code |
|-------------------|
| ... |
| 000000119 |
| 000000120 |
| 000000121 |
| 000000122 |
| 000000123 |
| 000000124 |
| 000000125 |
| 000000126 |
| ... |
Cleaning Data in SQL Server Databases