EDA with categorical variables

Exploratory Data Analysis in Power BI

Maarten Van den Broeck

Content Developer at DataCamp

Categorical variables and frequency

A bar chart showing the number of participants (y-axis) in each of three age groups (x-axis): "18-29", "30-39", and "40-49". The "40-49" group has the most participants in the sample data.
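
Outside Power BI, the counting behind a bar chart like this can be sketched in a few lines of Python. The age-group labels below are a made-up sample, not the course dataset, and the variable names are assumptions for illustration.

```python
from collections import Counter

# Hypothetical sample: one age-group label per survey participant.
age_groups = ["18-29", "30-39", "40-49", "40-49", "18-29", "40-49", "30-39", "40-49"]

# Count how many participants fall into each category.
frequency = Counter(age_groups)

# Print the categories in label order, as they would appear on the x-axis.
for group in sorted(frequency):
    print(f"{group}: {frequency[group]}")
```

Each bar's height corresponds to one of these counts.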

Categorical variables and percentages

A pie chart showing the percentage of participants within three age groups - "18-29", "30-39", and "40-49". This last group has the highest percentage at 39.4%.
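
A pie chart shows each category's share of the total rather than its raw count. As a rough sketch, the conversion from counts to percentages might look like this in Python. The sample is invented, so it does not reproduce the 39.4% in the chart, and `category_percentages` is a hypothetical helper name.

```python
from collections import Counter


def category_percentages(labels):
    """Return each category's share of all observations, in percent."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


# Hypothetical sample of ten participants.
sample = ["18-29"] * 3 + ["30-39"] * 3 + ["40-49"] * 4
shares = category_percentages(sample)

for group, share in sorted(shares.items()):
    print(f"{group}: {share:.1f}%")
```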

Proportions across multiple categorical variables

A 100% stacked bar chart. Three age groups are on the x-axis: "18-29", "30-39", and "40-49". The percentage of participants is on the y-axis. Each bar is broken down into the percentage of that age group on each of four social media platforms: Instagram, LinkedIn, TikTok, and Twitter.
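
The idea behind a 100% stacked bar chart is that the second variable is normalized within each category of the first, so every bar sums to 100%. The sketch below assumes a list of (age group, platform) pairs. The data and the helper name `row_proportions` are hypothetical.

```python
from collections import Counter, defaultdict


def row_proportions(pairs):
    """For each value of the first variable, return the percentage split of the second."""
    counts = defaultdict(Counter)
    for outer, inner in pairs:
        counts[outer][inner] += 1

    result = {}
    for outer, inner_counts in counts.items():
        total = sum(inner_counts.values())
        result[outer] = {inner: 100 * n / total for inner, n in inner_counts.items()}
    return result


# Hypothetical responses: (age group, preferred platform).
responses = [
    ("18-29", "TikTok"), ("18-29", "Instagram"), ("18-29", "TikTok"), ("18-29", "Twitter"),
    ("30-39", "LinkedIn"), ("30-39", "Instagram"),
    ("40-49", "LinkedIn"), ("40-49", "LinkedIn"), ("40-49", "Twitter"), ("40-49", "Instagram"),
]
proportions = row_proportions(responses)

for group in sorted(proportions):
    print(group, proportions[group])
```

Because each group is normalized separately, groups with different numbers of participants can still be compared directly.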

Categorical variables with descriptive statistics

Age Group    Median Hours per Day on Social Media
18-29        6
30-39        3
40-49        3
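
A table like this comes from grouping a numeric variable by a categorical one and summarizing each group. Here is a minimal Python sketch. The individual hours values are invented so that the medians match the table, and `median_by_group` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import median


def median_by_group(rows):
    """Group (category, value) pairs by category and return the median of each group."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {group: median(values) for group, values in groups.items()}


# Hypothetical (age group, hours per day on social media) records.
rows = [
    ("18-29", 5), ("18-29", 6), ("18-29", 8),
    ("30-39", 2), ("30-39", 3), ("30-39", 4),
    ("40-49", 1), ("40-49", 3), ("40-49", 5),
]
medians = median_by_group(rows)

for group in sorted(medians):
    print(f"{group}: {medians[group]}")
```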

What are boxplots?

A boxplot of heights of people. Heights, in centimeters, are on the y-axis.

What are boxplots?

A boxplot of heights of people. Heights, in centimeters, are on the y-axis. A red outline is around the line in the center of the box, which marks the median.

What are boxplots?

A boxplot of heights of people. Heights, in centimeters, are on the y-axis. A red outline is around the "box" of the box plot.

What are boxplots?

A boxplot of heights of people. Heights, in centimeters, are on the y-axis. A red outline is around the vertical whiskers extending from the top and bottom of the box.

What are boxplots?

A boxplot of heights of people. Heights, in centimeters, are on the y-axis. A red outline is around the dots on the boxplot, which mark the outliers.
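
The four parts highlighted above (median, box, whiskers, outliers) can be computed directly from the data. The sketch below assumes one common convention, which the slides do not state: the box spans the first to third quartile, and the whiskers reach the most extreme values within 1.5 times the interquartile range of the box. A particular Power BI visual may use a different rule. The heights and the helper name `boxplot_summary` are hypothetical.

```python
from statistics import median, quantiles


def boxplot_summary(values, whisker_factor=1.5):
    """Compute the pieces of a boxplot, assuming the 1.5 * IQR whisker convention."""
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box

    # Values beyond these fences are drawn as individual outlier dots.
    lower_fence = q1 - whisker_factor * iqr
    upper_fence = q3 + whisker_factor * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]

    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical heights in centimeters.
heights_cm = [150, 160, 165, 168, 170, 172, 175, 178, 180, 210]
summary = boxplot_summary(heights_cm)
print(summary)
```

With this sample, only the 210 cm value falls outside the fences, so it would appear as a single dot above the upper whisker.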

Comparing distributions with categorical variables

Two boxplots, one for male and one for female, showing the distribution of heights within each group. Both boxes are the same size, but the box for "male" sits higher on the y-axis.
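
Comparing boxplots side by side amounts to comparing quartiles per category. In the sketch below, the heights are invented so that both groups have the same spread but different centers, mirroring the picture described above. `quartiles_by_group` is a hypothetical helper.

```python
from statistics import quantiles


def quartiles_by_group(rows):
    """Return [Q1, median, Q3] for each category in a list of (category, value) pairs."""
    groups = {}
    for group, value in rows:
        groups.setdefault(group, []).append(value)
    return {g: quantiles(v, n=4, method="inclusive") for g, v in groups.items()}


# Hypothetical (sex, height in cm) records.
rows = [
    ("female", 158), ("female", 162), ("female", 165), ("female", 168), ("female", 172),
    ("male", 171), ("male", 175), ("male", 178), ("male", 181), ("male", 185),
]
by_group = quartiles_by_group(rows)

for group, (q1, med, q3) in by_group.items():
    print(f"{group}: Q1={q1}, median={med}, Q3={q3}, box height={q3 - q1}")
```

Equal box heights indicate a similar spread, while the shifted medians indicate a difference in typical height between the groups.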

Creating new variables

Data mutation: creating new variables to refine an analysis or visualization

Age    Age Group
18     Teen
19     Teen
20     Early Adult
21     Early Adult
30     Adult
31     Adult
40     Middle Age
41     Middle Age
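
Turning a numeric variable into a categorical one like this is often called binning. A minimal Python sketch follows. The table only shows a few ages, so the exact cut-offs (20, 30, and 40) and the open-ended top group are assumptions, and `age_group` is a hypothetical function name.

```python
def age_group(age):
    """Map a numeric age to a label; the boundaries are assumed, not taken from the course."""
    if age < 20:
        return "Teen"
    if age < 30:
        return "Early Adult"
    if age < 40:
        return "Adult"
    return "Middle Age"


ages = [18, 19, 20, 21, 30, 31, 40, 41]
labels = [age_group(a) for a in ages]

for a, label in zip(ages, labels):
    print(a, label)
```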

Course Title                  Course Type
Introduction to Power BI      Power BI
Unsupervised Learning in R    R
DAX in Power BI               Power BI
Introduction to Python        Python
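
A new categorical variable can also be derived from text. As a sketch, the course type in the table could be read off the end of each title. The list of known technologies, the "Other" fallback, and the function name `course_type` are assumptions for illustration.

```python
# Assumed list of technologies to look for at the end of a title.
TECHNOLOGIES = ["Power BI", "Python", "R"]


def course_type(title):
    """Return the technology a course title ends with, or "Other" if none matches."""
    for tech in TECHNOLOGIES:
        if title.endswith(" " + tech):
            return tech
    return "Other"


titles = [
    "Introduction to Power BI",
    "Unsupervised Learning in R",
    "DAX in Power BI",
    "Introduction to Python",
]
types = [course_type(t) for t in titles]

for title, kind in zip(titles, types):
    print(f"{title} -> {kind}")
```

Matching on the end of the title with a leading space avoids false hits such as a title that merely contains the letter "R".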

Let's practice!
