The simplest metric

Market Basket Analysis in Python

Isaiah Hull

Visiting Associate Professor of Finance, BI Norwegian Business School

Metrics and pruning

  • A metric measures how well a rule performs.
    • {humor} $\rightarrow$ {poetry}
      • 0.81
    • {fiction} $\rightarrow$ {travel}
      • 0.23
  • Pruning uses metrics to discard rules.
    • Keep: {humor} $\rightarrow$ {poetry}
    • Discard: {fiction} $\rightarrow$ {travel}
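
As a minimal sketch of pruning, the snippet below keeps only the rules whose metric value clears a cutoff. The two metric values come from the bullets above; the cutoff of 0.5 and the names `rules`, `THRESHOLD`, and `prune` are illustrative assumptions, not part of the course material.

```python
# Metric values copied from the slide; rule keys are (antecedent, consequent).
rules = {
    ("humor", "poetry"): 0.81,
    ("fiction", "travel"): 0.23,
}

THRESHOLD = 0.5  # assumed cutoff, chosen only for illustration


def prune(rules, threshold):
    """Return only the rules whose metric value is at least `threshold`."""
    return {rule: value for rule, value in rules.items() if value >= threshold}


kept = prune(rules, THRESHOLD)
for (antecedent, consequent), value in kept.items():
    print(f"keep: {{{antecedent}}} -> {{{consequent}}} ({value})")
```

With this cutoff, {humor} $\rightarrow$ {poetry} is kept and {fiction} $\rightarrow$ {travel} is discarded, matching the slide.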

The simplest metric

  • The support metric measures the share of transactions that contain an itemset.

$$\frac{\text{number of transactions containing the item(s)}}{\text{total number of transactions}}$$

For example, for the single item milk:

$$\frac{\text{number of transactions containing milk}}{\text{total number of transactions}}$$
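
The same definition can be written in set notation. The symbols here are assumptions introduced for this note, not taken from the slides: $T$ is the collection of transactions and $X$ is an itemset.

$$\text{support}(X) = \frac{|\{\, t \in T : X \subseteq t \,\}|}{|T|}$$

A transaction counts toward the numerator only if it contains every item in $X$, so support can never increase when items are added to $X$.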


Support for {language}

TID  Transaction
0    travel, humor, fiction
1    humor, language
2    humor, biography, cooking
3    cooking, language
4    travel
5    poetry, health, travel, history
6    humor
7    travel
8    poetry, fiction, humor
9    fiction, biography

Support for {language} = 2 / 10 = 0.2
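
The count above can be checked with a few lines of plain Python. This is a sketch rather than the course's own code: the helper name `support` is an assumption, and the transactions are copied from the table.

```python
# The ten transactions from the table, indexed by TID.
transactions = [
    ["travel", "humor", "fiction"],
    ["humor", "language"],
    ["humor", "biography", "cooking"],
    ["cooking", "language"],
    ["travel"],
    ["poetry", "health", "travel", "history"],
    ["humor"],
    ["travel"],
    ["poetry", "fiction", "humor"],
    ["fiction", "biography"],
]


def support(itemset, transactions):
    """Share of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)


language_support = support({"language"}, transactions)
print(language_support)  # TIDs 1 and 3 contain language: 2 / 10 = 0.2
```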

Support for {humor} $\rightarrow$ {language}

TID  Transaction
0    travel, humor, fiction
1    humor, language
2    humor, biography, cooking
3    cooking, language
4    travel
5    poetry, health, travel, history
6    humor
7    travel
8    poetry, fiction, humor
9    fiction, biography

Support for {humor} $\rightarrow$ {language} = 1 / 10 = 0.1 (only TID 1 contains both items)
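
Support for a rule counts the transactions that contain the antecedent and the consequent together, so the direction of the arrow does not change the value. The sketch below illustrates this with a hypothetical helper `rule_support`; the transactions are copied from the table.

```python
transactions = [
    ["travel", "humor", "fiction"],
    ["humor", "language"],
    ["humor", "biography", "cooking"],
    ["cooking", "language"],
    ["travel"],
    ["poetry", "health", "travel", "history"],
    ["humor"],
    ["travel"],
    ["poetry", "fiction", "humor"],
    ["fiction", "biography"],
]


def rule_support(antecedent, consequent, transactions):
    """Share of transactions containing all items from both sides of the rule."""
    combined = set(antecedent) | set(consequent)
    hits = sum(1 for t in transactions if combined.issubset(t))
    return hits / len(transactions)


forward = rule_support({"humor"}, {"language"}, transactions)
backward = rule_support({"language"}, {"humor"}, transactions)
print(forward, backward)  # both equal 1 / 10 = 0.1
```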

Preparing the data

print(transactions)
[['travel', 'humor', 'fiction'],
...
['fiction', 'biography']]
from mlxtend.preprocessing import TransactionEncoder
# Instantiate the transaction encoder and fit it to the transactions
encoder = TransactionEncoder().fit(transactions)

Preparing the data

import pandas as pd

# One-hot encode the transactions with the fitted encoder
onehot = encoder.transform(transactions)
# Convert the one-hot encoded array to a DataFrame
onehot = pd.DataFrame(onehot, columns=encoder.columns_)
print(onehot)
   biography  cooking  ...  poetry  travel
0      False    False  ...   False    True
...
9       True    False  ...   False   False
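
To make the encoding step less opaque, here is a library-free sketch of what one-hot encoding produces for these transactions. It is not how `TransactionEncoder` is implemented; the names `columns` and `onehot_rows` are assumptions, and the alphabetical column order is chosen to match the printed output above.

```python
transactions = [
    ["travel", "humor", "fiction"],
    ["humor", "language"],
    ["humor", "biography", "cooking"],
    ["cooking", "language"],
    ["travel"],
    ["poetry", "health", "travel", "history"],
    ["humor"],
    ["travel"],
    ["poetry", "fiction", "humor"],
    ["fiction", "biography"],
]

# One column per distinct item, sorted alphabetically.
columns = sorted({item for t in transactions for item in t})

# One row per transaction: True where the transaction contains the item.
onehot_rows = [[item in t for item in columns] for t in transactions]

print(columns)
print(onehot_rows[0])  # row for TID 0: travel, humor, fiction
```

Each row has exactly as many `True` entries as the transaction has distinct items, which is a quick sanity check on any encoding.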

Computing support for single items

print(onehot.mean())
biography    0.2
cooking      0.2
fiction      0.3
health       0.1
history      0.1
humor        0.5
language     0.2
poetry       0.2
travel       0.4
dtype: float64
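
The column mean equals support because `True` counts as 1 and `False` as 0, so the mean is the number of transactions containing the item divided by the total. A small sketch for the humor column, with the values read off the transaction table (the variable names are assumptions):

```python
# humor column for TIDs 0 through 9; humor appears in TIDs 0, 1, 2, 6, and 8.
humor = [True, True, True, False, False, False, True, False, True, False]

# sum() treats True as 1, so this is (transactions with humor) / (all transactions).
support_humor = sum(humor) / len(humor)
print(support_humor)  # 5 / 10 = 0.5
```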

Computing support for multiple items

import numpy as np

# Define itemset that contains fiction and poetry
onehot['fiction+poetry'] = np.logical_and(onehot['fiction'],onehot['poetry'])

print(onehot.mean())
biography         0.2
cooking           0.2
...               ...
travel            0.4
fiction+poetry    0.1
dtype: float64
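
Adding one column per itemset works for a single pair. As a hedged sketch of how the same idea might extend to every pair that occurs at least once, the snippet below counts pairs directly with the standard library instead of pandas. The names `pair_counts` and `pair_support` are assumptions, and the transactions are the ten from the earlier slides.

```python
from collections import Counter
from itertools import combinations

transactions = [
    ["travel", "humor", "fiction"],
    ["humor", "language"],
    ["humor", "biography", "cooking"],
    ["cooking", "language"],
    ["travel"],
    ["poetry", "health", "travel", "history"],
    ["humor"],
    ["travel"],
    ["poetry", "fiction", "humor"],
    ["fiction", "biography"],
]

pair_counts = Counter()
for t in transactions:
    # sorted(set(t)) gives each pair one canonical order and ignores repeated items.
    pair_counts.update(combinations(sorted(set(t)), 2))

# Divide each count by the number of transactions to get support.
pair_support = {pair: n / len(transactions) for pair, n in pair_counts.items()}
print(pair_support[("fiction", "poetry")])  # only TID 8 has both: 1 / 10 = 0.1
```

Pairs that never co-occur are simply absent from `pair_support`, which corresponds to a support of 0.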

Let's practice!
