Cleaning Data in SQL Server Databases
Miriam Antona
Software Engineer
vendors
| vendor_id | vendor_name | vendor_surname |
|-----------|-------------|----------------|
| 1 | Eric | Mendoza |
| 2 | Wu | Fengmian |
| 3 | Jaime | Furtado |
| 4 | Carol | NULL |
| ... | ... | ... |
clients
| client_id | client_name | client_surname | city | state |
|-----------|-------------|----------------|-----------|----------|
| 1 | Miriam | Antona | Las Vegas | Nevada |
| 2 | Astrid | Harper | Chicago | Illinois |
| 3 | David | Madden | Phoenix | Arizona |
| 4 | Hiroki | Konoe | Orlando | NULL |
| ... | ... | ... | ... | ... |
paper_shop_daily_sales
| product_name | units | year_of_sale | month_of_sale | day_of_sale | vendor_id | client_id |
|--------------|-------|--------------|---------------|-------------|-----------|-----------|
| notebooks | 2 | 2019 | 1 | 1 | 1 | 1 |
| notebooks | 3 | 2019 | 5 | 12 | 1 | 2 |
| notebooks | 1 | 2019 | 8 | 31 | 1 | 3 |
| pencils | 2 | 2019 | 5 | 2 | 2 | 1 |
| pencils | 5 | 2019 | 6 | 7 | 2 | 2 |
| pencils | 1 | 2019 | 9 | 11 | 3 | 3 |
| crayons | 1 | 2019 | 4 | 15 | 1 | 1 |
| ... | ... | ... | ... | ... | ... | ... |
paper_shop_monthly_sales
| product_name | units | year_of_sale | month_of_sale |
|----------------|-------|--------------|---------------|
| notebooks-150 | 2018 | 1 | 1 |
| notebooks-200 | 2019 | 1 | 2 |
| notebooks-30 | 2019 | 2 | 3 |
| pencils-100 | 2018 | 1 | 1 |
| pencils-50 | 2018 | 2 | 2 |
| pencils-130 | 2019 | 1 | 3 |
| ... | ... | ... | ... |
| vendor_id | vendor_name | vendor_surname |
|-----------|-------------|----------------|
| 1 | Eric | Mendoza |
| 2 | Wu | Fengmian |
| 3 | Jaime | Furtado |
| 4 | Carol | NULL |
| ... | ... | ... |
| vendor_id | full_name |
|-----------|---------------|
| 1 | Eric Mendoza |
| 2 | Wu Fengmian |
| 3 | Jaime Furtado |
| 4 | Carol |
| ... | ... |
CONCAT
SELECT vendor_name,
vendor_surname,
CONCAT(vendor_name, ' ' , vendor_surname) AS full_name
FROM vendors
| vendor_name | vendor_surname | full_name |
|-------------|----------------|---------------|
| Eric | Mendoza | Eric Mendoza |
| Wu | Fengmian | Wu Fengmian |
| Jaime | Furtado | Jaime Furtado |
| Carol | NULL | Carol |
| ... | ... | ... |
CONCAT
ignores the NULL
value+
operator
SELECT vendor_name,
vendor_surname,
vendor_name + ' ' + vendor_surname AS full_name
FROM vendors
| vendor_name | vendor_surname | full_name |
|-------------|----------------|---------------|
| Eric | Mendoza | Eric Mendoza |
| Wu | Fengmian | Wu Fengmian |
| Jaime | Furtado | Jaime Furtado |
| Carol | NULL | NULL |
| ... | ... | ... |
SELECT vendor_name,
vendor_surname,
vendor_name + ISNULL(' ' + vendor_surname, '') AS full_name
FROM vendors
| vendor_name | vendor_surname | full_name |
|-------------|----------------|---------------|
| Eric | Mendoza | Eric Mendoza |
| Wu | Fengmian | Wu Fengmian |
| Jaime | Furtado | Jaime Furtado |
| Carol | NULL | Carol |
| ... | ... | ... |
paper_shop_daily_sales
| product_name | units | year_of_sale | month_of_sale | day_of_sale | ... |
|--------------|-------|--------------|---------------|-------------|-----|
| notebooks | 2 | 2019 | 1 | 1 | ... |
| notebooks | 3 | 2019 | 5 | 12 | ... |
| notebooks | 1 | 2019 | 8 | 31 | ... |
| pencils | 2 | 2019 | 5 | 2 | ... |
| pencils | 5 | 2019 | 6 | 7 | ... |
| crayons | 2 | 2019 | 10 | NULL | ... |
| ... | ... | ... | ... | ... | ... |
DATEFROMPARTS
-> since SQL Server 2012
SELECT
product_name,
units,
DATEFROMPARTS(
year_of_sale,
month_of_sale,
day_of_sale) AS complete_date
FROM paper_shop_daily_sales
| product_name | units | complete_date |
|--------------|-------|---------------|
| notebooks | 2 | 2019-01-01 |
| notebooks | 3 | 2019-05-12 |
| notebooks | 1 | 2019-08-31 |
| pencils | 2 | 2019-05-02 |
| pencils | 5 | 2019-06-07 |
| crayons | 2 | NULL |
| ... | ... | ... |
Cleaning Data in SQL Server Databases