Visualizing your data

Data Manipulation with pandas

Maggie Matsui

Senior Content Developer at DataCamp

Histograms

import matplotlib.pyplot as plt
dog_pack["height_cm"].hist()
plt.show()

A histogram of dog heights. The shortest dogs are under 20 cm, and the tallest dogs are around 70 cm. The most common dog height is between 50 and 60 cm.

Histograms

dog_pack["height_cm"].hist(bins=20)
plt.show()

The same histogram of dog heights that was shown in the previous slide, but now with 20 narrow bins.

dog_pack["height_cm"].hist(bins=5)
plt.show()

The same histogram of dog heights that was shown in the previous slide, but now with five wide bins.
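
To see how the `bins` argument changes the picture, the sketch below draws one column with 5 bins and with 20 bins side by side. The height values are invented for illustration and are not the course's `dog_pack` data. The `matplotlib.use("Agg")` line is only an assumption so the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical heights in cm (illustration only, not the real dog_pack data)
heights = pd.Series(
    [18, 21, 25, 33, 38, 43, 46, 49, 51, 52,
     54, 55, 56, 57, 58, 59, 61, 64, 68, 70],
    name="height_cm",
)

fig, (ax_wide, ax_narrow) = plt.subplots(1, 2, figsize=(8, 3))

# Few wide bins: smoother overall shape, less detail
heights.hist(bins=5, ax=ax_wide)
ax_wide.set_title("bins=5")

# Many narrow bins: more detail, but a noisier shape
heights.hist(bins=20, ax=ax_narrow)
ax_narrow.set_title("bins=20")

# Each histogram bar is one rectangle patch on its axes
n_wide, n_narrow = len(ax_wide.patches), len(ax_narrow.patches)
print(n_wide, n_narrow)
```

Passing `ax=` keeps the two histograms on separate axes, so they can be compared without one covering the other.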

Bar plots

avg_weight_by_breed = dog_pack.groupby("breed")["weight_kg"].mean()
print(avg_weight_by_breed)
breed
Beagle         10.636364
Boxer          30.620000
Chihuahua       1.491667
Chow Chow      22.535714
Dachshund       9.975000
Labrador       31.850000
Poodle         20.400000
St. Bernard    71.576923
Name: weight_kg, dtype: float64

Bar plots

avg_weight_by_breed.plot(kind="bar")

plt.show()

A bar plot of the average weights of dogs, in kilograms, split by breed. St. Bernard dogs are the heaviest, while chihuahuas are the lightest.

avg_weight_by_breed.plot(kind="bar",
    title="Mean Weight by Dog Breed")
plt.show()

The same bar plot as above, but with an additional title reading "Mean Weight by Dog Breed."
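
The group-then-plot pattern can be tried end to end with a small sketch. The mini dataset here is made up, and the `sort_values()` call and y-axis label are optional extras that do not appear on the slides. The `Agg` backend is assumed only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical mini dataset (illustration only)
dogs = pd.DataFrame({
    "breed": ["Beagle", "Beagle", "Poodle", "Poodle", "Chihuahua", "Chihuahua"],
    "weight_kg": [10.0, 12.0, 20.0, 22.0, 1.5, 2.5],
})

# Mean weight per breed, sorted so bars run from lightest to heaviest
avg_weight = dogs.groupby("breed")["weight_kg"].mean().sort_values()

# Series.plot returns the matplotlib Axes, which can be customized further
ax = avg_weight.plot(kind="bar", title="Mean Weight by Dog Breed")
ax.set_ylabel("Mean weight (kg)")

# One rectangle patch per breed; its height is the mean weight
bar_heights = [patch.get_height() for patch in ax.patches]
print(avg_weight)
```

Sorting before plotting is a design choice: it makes the ranking of breeds readable at a glance, at the cost of losing alphabetical order.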

Line plots

sully.head()
          date    weight_kg
0   2019-01-31         36.1
1   2019-02-28         35.3
2   2019-03-31         32.0
3   2019-04-30         32.9
4   2019-05-31         32.0
sully.plot(x="date", 
           y="weight_kg", 
           kind="line")
plt.show()

A line plot of the weight of a dog named Sully over time. The weight fluctuates between 27 and 36 kilograms.
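
A minimal sketch of the same line plot, using only the five rows printed by `sully.head()`. It assumes the dates may arrive as strings and converts them with `pd.to_datetime` so the x-axis is spaced by real time. That conversion and the y-axis label are additions, not part of the slide code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The five rows shown by sully.head(), typed in by hand
sully = pd.DataFrame({
    "date": ["2019-01-31", "2019-02-28", "2019-03-31", "2019-04-30", "2019-05-31"],
    "weight_kg": [36.1, 35.3, 32.0, 32.9, 32.0],
})

# Assumption: dates are strings here, so convert them to real datetimes
sully["date"] = pd.to_datetime(sully["date"])

ax = sully.plot(x="date", y="weight_kg", kind="line")
ax.set_ylabel("Weight (kg)")

# The y-values of the drawn line are the weights, in row order
plotted_weights = list(ax.get_lines()[0].get_ydata())
print(plotted_weights)
```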

Rotating axis labels

sully.plot(x="date", y="weight_kg", kind="line", rot=45)
plt.show()

The same line plot of Sully's weight that was seen in the previous slide, but with the text on the x-axis rotated 45 degrees.
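
One way to confirm what `rot` does is to read the rotation back from the tick labels. In this sketch the dates are kept as plain strings for simplicity, which is an assumption about the data rather than something the slides state. The angle is passed through to matplotlib, which measures text rotation counterclockwise.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The five rows shown by sully.head(), with dates left as strings (assumption)
sully = pd.DataFrame({
    "date": ["2019-01-31", "2019-02-28", "2019-03-31", "2019-04-30", "2019-05-31"],
    "weight_kg": [36.1, 35.3, 32.0, 32.9, 32.0],
})

# rot sets the rotation angle, in degrees, of the x-axis tick labels
ax = sully.plot(x="date", y="weight_kg", kind="line", rot=45)

# Read the angle back from the first tick label
first_rotation = ax.get_xticklabels()[0].get_rotation()
print(first_rotation)
```

Rotating labels is mostly useful when they are long, such as full dates, and would otherwise overlap.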

Scatter plots

dog_pack.plot(x="height_cm", y="weight_kg", kind="scatter")
plt.show()

A scatter plot of dog weights vs. dog heights. As dog heights increase, so do dog weights. There are some clusters. I wonder if those correspond to breeds.
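
The upward trend in a scatter plot can also be put into a number. The sketch below uses invented height and weight pairs, not the course data, and adds a Pearson correlation via `Series.corr`, which the slides do not use.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical height/weight pairs (illustration only)
dogs = pd.DataFrame({
    "height_cm": [18, 20, 35, 38, 55, 58, 68, 70],
    "weight_kg": [1.5, 2.0, 10.0, 11.0, 29.0, 31.0, 68.0, 72.0],
})

# One point per dog: height on the x-axis, weight on the y-axis
ax = dogs.plot(x="height_cm", y="weight_kg", kind="scatter")

# The scatter points are stored as a single collection on the axes
n_points = len(ax.collections[0].get_offsets())

# Pearson correlation: positive when taller dogs tend to be heavier
corr = dogs["height_cm"].corr(dogs["weight_kg"])
print(n_points, round(corr, 2))
```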

Layering plots

dog_pack[dog_pack["sex"]=="F"]["height_cm"].hist()
dog_pack[dog_pack["sex"]=="M"]["height_cm"].hist()

plt.show()

Two histograms of dog heights are shown in the same plot. One is blue, and one is orange. The orange histogram covers the blue histogram, making it difficult to see what is happening.

Add a legend

dog_pack[dog_pack["sex"]=="F"]["height_cm"].hist()
dog_pack[dog_pack["sex"]=="M"]["height_cm"].hist()
plt.legend(["F", "M"])
plt.show()

The same plot with two histograms as in the previous slide, but with a legend. "F" for female is marked as blue, and "M" for male is marked as orange. It's still difficult to see what is happening.

Transparency

dog_pack[dog_pack["sex"]=="F"]["height_cm"].hist(alpha=0.7)
dog_pack[dog_pack["sex"]=="M"]["height_cm"].hist(alpha=0.7)
plt.legend(["F", "M"])
plt.show()

The same plot with two histograms as in the previous slide, but now the histograms are transparent. This makes it possible to see the bars in the female histogram that were obscured by the bars in the male histogram. It's still a little bit ugly. You should take the Seaborn courses because the plots are prettier.
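
The layering, legend, and transparency steps can be combined in one loop. This is a sketch with made-up data. Unlike the slide code, it passes `label=` to each histogram so the legend entries are tied to the data instead of to the order of the `hist()` calls.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical heights by sex (illustration only)
dogs = pd.DataFrame({
    "sex": ["F", "F", "F", "F", "M", "M", "M", "M"],
    "height_cm": [40, 45, 50, 55, 48, 53, 58, 63],
})

fig, ax = plt.subplots()

# Draw one semi-transparent histogram per group on the same axes
for sex, group in dogs.groupby("sex"):
    group["height_cm"].hist(ax=ax, alpha=0.7, label=sex)

# With labels attached to each histogram, legend() needs no arguments
ax.legend()

legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
alphas = {patch.get_alpha() for patch in ax.patches}
print(legend_labels, alphas)
```

Each call still computes its own bin edges. Passing the same list of edges as `bins` to both calls would make the bars of the two groups line up.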

Avocados

print(avocados)
            date          type  year  avg_price         size     nb_sold
0     2015-12-27  conventional  2015       0.95        small  9626901.09
1     2015-12-20  conventional  2015       0.98        small  8710021.76
2     2015-12-13  conventional  2015       0.93        small  9855053.66
...          ...           ...   ...        ...          ...         ...
1011  2018-01-21       organic  2018       1.63  extra_large     1490.02
1012  2018-01-14       organic  2018       1.59  extra_large     1580.01
1013  2018-01-07       organic  2018       1.51  extra_large     1289.07

[1014 rows x 6 columns]

Let's practice!
