Summarizing data

Introduction to NumPy

Izzy Weber

Core Curriculum Manager, DataCamp

Aggregating methods

  • .sum()
  • .min()
  • .max()
  • .mean()
  • .cumsum()

Our data

security_breaches
array([[0, 5, 1],
       [0, 2, 0],
       [1, 1, 2],
       [2, 2, 1],
       [0, 0, 0]])

A graphic of the security_breaches array, with rows labeled as years and columns labeled as clients
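The array shown above can be rebuilt as follows so the later slides are reproducible. This is a minimal sketch: the values are copied from the slide, and the row and column meanings (years and clients) are taken from the graphic description.

```python
import numpy as np

# Rows are years and columns are clients, per the slide graphic.
security_breaches = np.array([
    [0, 5, 1],
    [0, 2, 0],
    [1, 1, 2],
    [2, 2, 1],
    [0, 0, 0],
])

# Five years (rows) by three clients (columns)
print(security_breaches.shape)
```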

Summing data

security_breaches.sum()
17

Aggregating rows

A graphic with each column highlighted and sum symbols at the bottom to indicate that the array is being summed down each column

security_breaches.sum(axis=0)
array([ 3, 10,  4])

Aggregating columns

A graphic with each row highlighted and sum symbols at the right of each row to indicate that the array is being summed across each row

security_breaches.sum(axis=1)
array([6, 2, 4, 5, 0])

Making sense of the axis argument

A graphic showing an array and how it looks when it is collapsed into a single column, containing the sum of all elements in each row
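One way to read the axis argument is as the dimension that gets collapsed: the axis you name disappears from the result's shape. The sketch below checks this on the same array; the variable names by_client and by_year are chosen here for illustration only.

```python
import numpy as np

security_breaches = np.array([
    [0, 5, 1],
    [0, 2, 0],
    [1, 1, 2],
    [2, 2, 1],
    [0, 0, 0],
])

# axis=0 collapses the rows: one total per column (client)
by_client = security_breaches.sum(axis=0)

# axis=1 collapses the columns: one total per row (year)
by_year = security_breaches.sum(axis=1)

print(security_breaches.shape)  # (5, 3)
print(by_client.shape)          # (3,)
print(by_year.shape)            # (5,)
```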

Minimum and maximum values

security_breaches.min()
0
security_breaches.max()
5
security_breaches.min(axis=1)
array([0, 0, 1, 1, 0])

Finding the mean

security_breaches.mean()
1.1333333333333333
security_breaches.mean(axis=1)
array([2., 0.6667, 1.3333, 1.6667, 0.])
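The per-row means above are shown to four decimal places; NumPy normally prints more digits. As a small sketch, the example below rounds with .round(4) to match that display and checks that the overall mean equals the total divided by the number of elements.

```python
import numpy as np

security_breaches = np.array([
    [0, 5, 1],
    [0, 2, 0],
    [1, 1, 2],
    [2, 2, 1],
    [0, 0, 0],
])

# Mean of all 15 values, which is the total (17) divided by the element count (15)
overall_mean = security_breaches.mean()
manual_mean = security_breaches.sum() / security_breaches.size

# Mean breaches per year across clients, rounded for display only
row_means = security_breaches.mean(axis=1)
print(row_means.round(4))
```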

The keepdims argument

security_breaches.sum(axis=1)
array([6, 2, 4, 5, 0])
security_breaches.sum(axis=1, keepdims=True)
array([[6],
       [2],
       [4],
       [5],
       [0]])
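A common reason to keep the collapsed dimension is broadcasting: a (5, 1) result lines up with the original (5, 3) array, while a (5,) result does not. The sketch below illustrates this by subtracting each year's mean from that year's values; the centering task is an example chosen here, not part of the slides.

```python
import numpy as np

security_breaches = np.array([
    [0, 5, 1],
    [0, 2, 0],
    [1, 1, 2],
    [2, 2, 1],
    [0, 0, 0],
])

# Shape (5, 1): one mean per year, kept as a column
row_means = security_breaches.mean(axis=1, keepdims=True)

# (5, 3) minus (5, 1) broadcasts each row's mean across that row
centered = security_breaches - row_means

# Without keepdims the result has shape (5,), which cannot broadcast against (5, 3)
try:
    security_breaches - security_breaches.mean(axis=1)
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

print(centered.shape)  # (5, 3)
```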

Cumulative sums

security_breaches.cumsum(axis=0)
array([[ 0,  5,  1],
       [ 0,  7,  1],
       [ 1,  8,  3],
       [ 3, 10,  4],
       [ 3, 10,  4]])
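Two properties of .cumsum() are worth checking on this data. The last row of the cumulative sum along axis 0 equals .sum(axis=0), and calling .cumsum() without an axis flattens the array first. A short sketch, with variable names chosen here for illustration:

```python
import numpy as np

security_breaches = np.array([
    [0, 5, 1],
    [0, 2, 0],
    [1, 1, 2],
    [2, 2, 1],
    [0, 0, 0],
])

# Running total per client, accumulated year by year
running_by_client = security_breaches.cumsum(axis=0)

# The final row of the running total is each client's overall total
final_totals = running_by_client[-1]

# With no axis, the array is flattened and accumulated as one sequence
flat_running = security_breaches.cumsum()

print(final_totals)        # [ 3 10  4]
print(flat_running.shape)  # (15,)
```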

Graphing summary values

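import matplotlib.pyplot as plt
import numpy as np

# Running total of breaches per client, year by year (years 1 to 5 on the x-axis)
cum_sums_by_client = security_breaches.cumsum(axis=0)
plt.plot(np.arange(1, 6), cum_sums_by_client[:, 0], label="Client 1")
plt.plot(np.arange(1, 6), cum_sums_by_client.mean(axis=1), label="Average")
plt.legend()
plt.show()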

Let's practice!
