Transforming text in Power Query

Data Preparation in Power BI

Maarten Van den Broeck

Content Developer at DataCamp

What is clean text data?

  • Free from typos
  • Consistently formatted
    • Data points are consistently represented
    • Uniform capitalization
    • No leading or trailing whitespace
    • No punctuation(!) or control characters (\n, \r, etc.)
  • Ideally, each column will only store one piece of information
    • Columns are split or merged to achieve this

Yeelow -> Yellow

A screen shot from Power Query showing a table with two columns, one with dirty data, and one with clean data.

A screen shot from Power Query showing a table with three columns, one with multiple pieces of information, and two columns that are derived from that one column, each representing one piece of information.

Data Preparation in Power BI

How to clean text data?

  • Accessed from the Transform ribbon

A screen shot of the Power Query interface showing the Text transformation options

  • Use the Format option to change the capitalization and access Trim and Clean

  • Trim & Clean should be applied to all text columns

    • Trim - removes trailing and leading whitespace
    • Clean - removes control characters (new line, carriage return, etc.)
Data Preparation in Power BI

Splitting and combining columns

  • Each column in your dataset should represent one piece of information

    • It is sometimes necessary to split or combine several columns
  • When one column contains many pieces of information -> Split

    • Address -> Building, Street, City, Country
  • When two or more columns can be combined to represent one piece of information -> Combine

    • First Name, Last Name -> Full Name

A screen shot of Power Query showing the Split Column transformation and its options

A screen shot of Power Query showing the Merge Columns transformation

Data Preparation in Power BI

Let's practice!

Data Preparation in Power BI

Preparing Video For Download...