Exploratory Data Analysis in Python
Izzy Weber
Curriculum Manager, DataCamp
books.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 350 entries, 0 to 349
Data columns (total 5 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 name 350 non-null object
1 author 350 non-null object
2 rating 350 non-null float64
3 year 350 non-null float64
4 genre 350 non-null object
dtypes: float64(2), object(3)
memory usage: 13.8+ KB
books.dtypes
name object
author object
rating float64
year float64
genre object
dtype: object
books["year"] = books["year"].astype(int)
books.dtypes
name object
author object
rating float64
year int64
genre object
dtype: object
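The conversion above can be tried on a small stand-in DataFrame. This is a minimal sketch: `sample` and its values are hypothetical, not the course's `books` dataset.

```python
import pandas as pd

# Hypothetical sample data; values are illustrative, not the course dataset.
sample = pd.DataFrame({
    "name": ["Book A", "Book B", "Book C"],
    "rating": [4.7, 4.6, 4.4],
    "year": [2016.0, 2011.0, 2018.0],
})

# Before conversion, year is stored as a float.
dtype_before = sample["year"].dtype

# astype(int) casts each value to an integer.
# It raises an error if the column contains missing values (NaN).
sample["year"] = sample["year"].astype(int)
dtype_after = sample["year"].dtype

print(dtype_before, "->", dtype_after)
```

Reassigning the result to `sample["year"]` is what makes the change stick, since `astype` returns a new Series rather than modifying the column in place.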
| Type | Python Name |
|------------|------|
| String | str |
| Integer | int |
| Float | float |
| Dictionary | dict |
| List | list |
| Boolean | bool |
books["genre"].isin(["Fiction", "Non Fiction"])
0 True
1 True
2 True
3 True
4 False
...
345 True
346 True
347 True
348 True
349 False
Name: genre, Length: 350, dtype: bool
~books["genre"].isin(["Fiction", "Non Fiction"])
0 False
1 False
2 False
3 False
4 True
...
345 False
346 False
347 False
348 False
349 True
Name: genre, Length: 350, dtype: bool
books[books["genre"].isin(["Fiction", "Non Fiction"])].head()
| | name | author | rating | year | genre |
|---|-------------------------------|---------------------|--------|------|-------------|
| 0 | 10-Day Green Smoothie Cleanse | JJ Smith | 4.7 | 2016 | Non Fiction |
| 1 | 11/22/63: A Novel | Stephen King | 4.6 | 2011 | Fiction |
| 2 | 12 Rules for Life | Jordan B. Peterson | 4.7 | 2018 | Non Fiction |
| 3 | 1984 (Signet Classics) | George Orwell | 4.7 | 2017 | Fiction |
| 5 | A Dance with Dragons | George R. R. Martin | 4.4 | 2011 | Fiction |
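The same membership check can be sketched on a small hypothetical DataFrame, splitting rows into those that match the allowed categories and those that do not. The names `sample`, `allowed`, `valid`, and `unexpected` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data with one genre outside the allowed list.
sample = pd.DataFrame({
    "name": ["Book A", "Book B", "Book C", "Book D"],
    "genre": ["Fiction", "Non Fiction", "Childrens", "Fiction"],
})

allowed = ["Fiction", "Non Fiction"]

# Boolean mask: True where genre is one of the allowed values.
mask = sample["genre"].isin(allowed)

# Rows that pass the check, and rows that fail it (~ inverts the mask).
valid = sample[mask]
unexpected = sample[~mask]

print(unexpected)
```

Inspecting `unexpected` first is one way to see which category values fall outside the expected set before deciding whether to drop or relabel them.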
books.select_dtypes("number").head()
| | rating | year |
|---|--------|------|
| 0 | 4.7 | 2016 |
| 1 | 4.6 | 2011 |
| 2 | 4.7 | 2018 |
| 3 | 4.7 | 2017 |
| 4 | 4.8 | 2019 |
books["year"].min()
2009
books["year"].max()
2019
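A possible way to check the range of every numeric column at once is to combine `select_dtypes` with `agg`. This is a sketch on hypothetical data; `sample`, `numeric`, and `summary` are illustrative names.

```python
import pandas as pd

# Hypothetical sample data; values are illustrative, not the course dataset.
sample = pd.DataFrame({
    "name": ["Book A", "Book B", "Book C"],
    "rating": [4.7, 4.6, 4.4],
    "year": [2009, 2014, 2019],
})

# Keep only the numeric columns (rating and year here).
numeric = sample.select_dtypes("number")

# One row per statistic, one column per numeric variable.
summary = numeric.agg(["min", "max"])

print(summary)
```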
import matplotlib.pyplot as plt
import seaborn as sns
sns.boxplot(data=books, x="year")
plt.show()
sns.boxplot(data=books, x="year", y="genre")
plt.show()