Generating hypotheses

Exploratory Data Analysis in Python

George Boorman

Curriculum Manager, DataCamp

What do we know?

Countplot showing the number of flights per airline in different price categories, with Jet Airways having the largest number of First Class tickets

What do we know?

import matplotlib.pyplot as plt
import seaborn as sns

sns.heatmap(planes.corr(numeric_only=True), annot=True)
plt.show()

Heatmap showing Pearson correlation coefficient scores between variables in the planes dataset

Spurious correlation

sns.scatterplot(data=planes, x="Duration", y="Price", hue="Total_Stops")
plt.show()

Scatter plot of Price versus Duration, with points colored by Total_Stops

How do we know?

Heatmap with correlation coefficient scores for each number of stops
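
One way to check whether the overall Duration and Price correlation is driven by Total_Stops is to compute the coefficient separately for each number of stops. The sketch below is only an illustration: it uses a small made-up DataFrame named planes (hypothetical values, not the course dataset) built so that the pooled correlation is strong while the within-group correlations are near zero.

```python
import pandas as pd

# Hypothetical toy data, NOT the real planes dataset: prices jump with the
# number of stops but do not depend on duration within a stop category.
planes = pd.DataFrame({
    "Total_Stops": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
    "Duration": [2, 3, 4, 5, 8, 9, 10, 11, 16, 17, 18, 19],
    "Price": [4200, 3800, 3800, 4200,
              9200, 8800, 8800, 9200,
              13200, 12800, 12800, 13200],
})

# Pearson correlation across all flights pooled together
overall_corr = planes["Duration"].corr(planes["Price"])

# Pearson correlation computed separately for each number of stops
corr_by_stops = {
    stops: group["Duration"].corr(group["Price"])
    for stops, group in planes.groupby("Total_Stops")
}

print(f"Overall: {overall_corr:.2f}")
for stops, corr in corr_by_stops.items():
    print(f"Total_Stops={stops}: {corr:.2f}")
```

If the per-group coefficients are much weaker than the pooled one, as in this toy data, the pooled relationship is likely explained by the grouping variable rather than by duration itself.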

What is true?

Typewriter displaying "Fake News"

  • Would data from a different time give the same results?

  • Detecting relationships, differences, and patterns:

    • We use hypothesis testing
  • Hypothesis testing requires, prior to data collection:

    • Generating a hypothesis or question
    • Deciding which statistical test to use
1 Image credit: https://unsplash.com/@markuswinkler
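
To make the idea of a hypothesis test concrete, here is a minimal sketch of a two-sided permutation test for a difference in mean flight duration between two airlines. The permutation test is chosen here only for illustration and is not necessarily the test the course uses. The duration values, the function name permutation_p_value, and its parameters are all hypothetical. In line with the points above, the hypothesis ("mean duration differs between the two airlines") and the choice of test would be fixed before collecting the data.

```python
import random
from statistics import fmean


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels: under the null hypothesis they are exchangeable
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical flight durations in hours for two airlines
airline_a = [10.5, 11.0, 9.8, 10.2, 10.9, 11.3, 10.1, 10.7]
airline_b = [7.1, 6.8, 7.5, 7.0, 6.9, 7.3, 7.2, 6.6]

p_value = permutation_p_value(airline_a, airline_b)
print(f"p-value: {p_value:.4f}")
```

A small p-value suggests that a difference this large would rarely arise if airline labels were unrelated to duration.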

Data snooping

 

Office with a view looking out onto an airport runway

Magnifying glass looking into a bar chart

Generating hypotheses

sns.barplot(data=planes, x="Airline", y="Duration")
plt.show()

Bar plot of duration versus airline

Generating hypotheses

sns.barplot(data=planes, x="Destination", y="Price")
plt.show()

Bar plot showing average price by destination

Next steps

  • Design our experiment

  • Involves steps such as:

    • Choosing a sample
    • Calculating how many data points we need
    • Deciding what statistical test to run
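
As a rough illustration of calculating how many data points are needed, the sketch below uses the standard normal-approximation formula for comparing two group means. The function name sample_size_per_group and the chosen inputs (standardized effect size 0.5, significance level 0.05, power 0.80) are assumptions for the example, not values from the course.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    effect_size is the standardized difference (Cohen's d). Uses the normal
    approximation, which slightly underestimates the n required by a t-test.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


n_per_group = sample_size_per_group(effect_size=0.5)
print(f"Flights needed per group: {n_per_group}")
```

Smaller expected effects require many more observations, because the required sample size grows with the inverse square of the effect size.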

Let's practice!
