Selecting columns

Introduction to Data Science in Python

Hillary Green-Lerman

Lead Data Scientist, Looker

Why select columns?

  • Use in a calculation

    credit_records.price.sum()
    
  • Plot data

    plt.plot(ransom['letter'], ransom['frequency'])
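
The calculation case can be tried on a small table. The sketch below is a minimal stand-in, assuming `credit_records` holds only the five rows printed later in this lesson; the real DataFrame has more rows, so its total will differ.

```python
import pandas as pd

# Assumption: a stand-in for credit_records built from the five rows
# printed by credit_records.head() in this lesson.
credit_records = pd.DataFrame({
    "suspect": ["Kirstine Smith", "Gertrude Cox", "Fred Frequentist",
                "Gertrude Cox", "Kirstine Smith"],
    "price": [1.25, 1.90, 1.25, 1.25, 14.25],
})

# Select the price column, then add up its values.
total = credit_records.price.sum()
print(round(total, 2))
```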
    

Column names are strings

print(credit_records.head())
            suspect         location              date         item  price
0    Kirstine Smith   Groceries R Us   January 6, 2018     broccoli   1.25
1      Gertrude Cox  Petroleum Plaza   January 6, 2018  fizzy drink   1.90
2  Fred Frequentist   Groceries R Us   January 6, 2018     broccoli   1.25
3      Gertrude Cox   Groceries R Us  January 12, 2018     broccoli   1.25
4    Kirstine Smith    Clothing Club   January 9, 2018        shirt  14.25
'suspect'
'location'
'date'
'item'
'price'
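
One way to check the column names without reading them off the printed table is the DataFrame's `columns` attribute. This is a minimal sketch, assuming a two-row stand-in for `credit_records` copied from the output above.

```python
import pandas as pd

# Assumption: a two-row stand-in for credit_records, copied from the
# first two rows of the printed table.
credit_records = pd.DataFrame({
    "suspect": ["Kirstine Smith", "Gertrude Cox"],
    "location": ["Groceries R Us", "Petroleum Plaza"],
    "date": ["January 6, 2018", "January 6, 2018"],
    "item": ["broccoli", "fizzy drink"],
    "price": [1.25, 1.90],
})

# .columns holds the column labels; list() turns them into a plain list.
column_names = list(credit_records.columns)
print(column_names)

# Every label in this table is a Python string.
all_strings = all(isinstance(name, str) for name in column_names)
print(all_strings)
```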

Selecting with brackets and a string

suspect = credit_records['suspect']
print(suspect)
0            Kirstine Smith
1              Gertrude Cox
2          Fred Frequentist
3              Gertrude Cox
4            Kirstine Smith
5              Gertrude Cox
...
99             Gertrude Cox
100        Fred Frequentist
101            Gertrude Cox
102          Kirstine Smith
103    Ronald Aylmer Fisher

Selecting with a dot

price = credit_records.price
print(price)
0       1.25
1       1.90
2       1.25
3       1.25
4      14.25
5       3.95
...
99     14.25
100    12.05
101    20.15
102     3.95
103     2.05
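
Brackets with a string and the dot give back the same column. The sketch below checks this on a small stand-in for `credit_records` (an assumption: only the first five prices from the output above are used).

```python
import pandas as pd

# Assumption: a stand-in for credit_records with the first five prices shown above.
credit_records = pd.DataFrame({
    "price": [1.25, 1.90, 1.25, 1.25, 14.25],
})

by_bracket = credit_records["price"]  # brackets and a string
by_dot = credit_records.price         # dot

# Both selections are pandas Series with identical contents.
print(type(by_bracket).__name__)
print(by_bracket.equals(by_dot))
```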

Common mistakes in column selection

Use brackets and a string for column names that contain spaces or special characters (-, ?, etc.)

police_report['Is Golden Retriever?']

NOT

police_report.Is Golden Retriever?
Object `Retriever` not found.
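
The dot only works when the column name could also be a Python variable name. A minimal sketch of that check follows, assuming a made-up two-row `police_report`; the values are hypothetical and only the column name comes from the slide.

```python
import pandas as pd

# Assumption: hypothetical values; only the column name is taken from the slide.
police_report = pd.DataFrame({
    "Is Golden Retriever?": [True, False],
})

column = "Is Golden Retriever?"

# A name with spaces or a question mark is not a valid Python identifier,
# so it cannot follow a dot.
print(column.isidentifier())

# Brackets and a string work for any column name.
golden = police_report["Is Golden Retriever?"]
print(golden.tolist())
```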

Common mistakes in column selection

When using brackets and a string, don't forget the quotes around the column name!

credit_report['location']

NOT

credit_report[location]
NameError: name 'location' is not defined
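
Without quotes, Python treats `location` as a variable name and looks it up before pandas ever sees it. The sketch below catches the resulting error so the message can be inspected; `credit_report` here is a hypothetical two-row stand-in.

```python
import pandas as pd

# Assumption: a hypothetical two-row stand-in for credit_report.
credit_report = pd.DataFrame({
    "location": ["Groceries R Us", "Petroleum Plaza"],
})

try:
    credit_report[location]  # no quotes: Python looks for a variable named location
except NameError as err:
    message = str(err)
    print(message)

# With quotes, the string is passed to pandas as the column name.
locations = credit_report["location"]
print(locations.tolist())
```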

Common mistakes in column selection

Brackets, not parentheses

credit_report['location']

NOT

credit_report('location')
----------------------------------------------------------------------
TypeError  Traceback (most recent call last)
<ipython-input-5-aabdb8981438> in <module>()
----> 1 credit_report('location')

TypeError: 'DataFrame' object is not callable
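
Parentheses mean "call this as a function", and a DataFrame is not a function. A minimal sketch that reproduces the error safely, assuming a hypothetical two-row `credit_report`:

```python
import pandas as pd

# Assumption: a hypothetical two-row stand-in for credit_report.
credit_report = pd.DataFrame({
    "location": ["Groceries R Us", "Clothing Club"],
})

try:
    credit_report("location")  # parentheses try to call the DataFrame
except TypeError as err:
    message = str(err)
    print(message)

# Square brackets select the column instead.
locations = credit_report["location"]
print(locations.tolist())
```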

Let's practice!
