
mlxtend-TransactionEncoder, association_rules

Naranjito 2021. 6. 23. 16:28
  • mlxtend

mlxtend (Machine Learning Extensions) is a Python library of useful tools for day-to-day data science tasks.

 

- TransactionEncoder

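TransactionEncoder transforms a transaction dataset into a one-hot encoded array format suitable for typical machine learning APIs. The dataset is a Python list of lists, where each inner list holds the items of one transaction. The resulting NumPy array is boolean for the sake of memory efficiency when working with large datasets.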

 

fit : TransactionEncoder learns the unique labels (items) in the dataset.

transform : transforms the input dataset (a list of transactions) into a one-hot encoded NumPy boolean array.

sparse : bool, default=False, a parameter of transform. If True, transform returns a SciPy Compressed Sparse Row (CSR) matrix instead of a regular dense array.

columns_ : after fitting, the unique column names that correspond to the data array can be accessed via the columns_ attribute.
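To make fit and transform concrete, here is a plain-Python sketch of what they do conceptually: fit collects the unique items in sorted order (which matches the column order shown in this post's output), and transform marks each item present in a transaction as True. This is not mlxtend's implementation; `fit_columns` and `transform_rows` are hypothetical helper names, and the five transactions are reconstructed from the encoded output in this post.

```python
# Conceptual sketch of TransactionEncoder.fit / .transform in plain Python.
# fit_columns and transform_rows are hypothetical helpers, not mlxtend code.

dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]


def fit_columns(transactions):
    """'fit': collect the unique item labels, sorted (compare te.columns_)."""
    return sorted({item for t in transactions for item in t})


def transform_rows(transactions, columns):
    """'transform': one boolean row per transaction, True where the item occurs."""
    return [[col in items for col in columns] for items in map(set, transactions)]


columns = fit_columns(dataset)
rows = transform_rows(dataset, columns)
print(columns)
print(rows[0])  # first row of the one-hot array
```

Because each row is just "is this item in the basket?", the order of items inside a transaction and any repeated items do not affect the encoding.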

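import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

# five example transactions (the rows of the output below)
dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

te = TransactionEncoder()
# fit() learns the unique items; transform() one-hot encodes each transaction
te_array = te.fit(dataset).transform(dataset)
te_array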
>>>
array([[False, False, False,  True, False,  True,  True,  True,  True,
        False,  True],
       [False, False,  True,  True, False,  True, False,  True,  True,
        False,  True],
       [ True, False, False,  True, False,  True,  True, False, False,
        False, False],
       [False,  True, False, False, False,  True,  True, False, False,
         True,  True],
       [False,  True, False,  True,  True,  True, False, False,  True,
        False, False]])
        
df = pd.DataFrame(te_array, columns=te.columns_)
df
>>>
Apple	Corn	Dill	Eggs	Ice cream	Kidney Beans	Milk	Nutmeg	Onion	Unicorn	Yogurt
0	False	False	False	True	False	True	True	True	True	False	True
1	False	False	True	True	False	True	False	True	True	False	True
2	True	False	False	True	False	True	True	False	False	False	False
3	False	True	False	False	False	True	True	False	False	True	True
4	False	True	False	True	True	True	False	False	True	False	False
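With sparse=True, transform returns a Compressed Sparse Row matrix instead of the dense array above. The sketch below illustrates, in plain Python, what the CSR layout stores for this data: only the column positions of the True cells (`indices`) plus row boundaries (`indptr`). `to_csr` is a hypothetical helper for illustration, not part of mlxtend or SciPy, and the transactions are reconstructed from the one-hot table above.

```python
# Plain-Python illustration of a Compressed Sparse Row (CSR) layout for
# one-hot transaction data. to_csr is a hypothetical helper, not a library API.

dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]
columns = sorted({item for t in dataset for item in t})


def to_csr(transactions, columns):
    """Return (data, indices, indptr) describing a boolean CSR matrix."""
    col_index = {name: j for j, name in enumerate(columns)}
    indices, indptr = [], [0]
    for t in transactions:
        # store only the column positions that are True in this row
        indices.extend(sorted({col_index[item] for item in t}))
        # row i's entries live in indices[indptr[i]:indptr[i + 1]]
        indptr.append(len(indices))
    data = [True] * len(indices)
    return data, indices, indptr


data, indices, indptr = to_csr(dataset, columns)

# Recover the items of row 2 from the compressed form
row = 2
print([columns[j] for j in indices[indptr[row]:indptr[row + 1]]])
# ['Apple', 'Eggs', 'Kidney Beans', 'Milk']
print(f'{len(data)} stored values instead of {len(dataset) * len(columns)}')
# 26 stored values instead of 55
```

The saving is modest for five baskets, but real transaction data usually has thousands of items with only a handful per basket, which is where the sparse option pays off.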

 

- association_rules

association_rules(df, metric='confidence', min_threshold=0.8, support_only=False)

An association rule is an implication expression of the form X → Y, where X (the antecedent) and Y (the consequent) are disjoint itemsets. A more concrete example based on consumer behaviour would be {Diapers} → {Beer}, suggesting that people who buy diapers are also likely to buy beer.
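Support and confidence are the two basic measures of a rule X → Y and appear as columns in the rules table at the end of this post. A minimal plain-Python sketch, assuming the standard definitions (support = share of transactions containing all items; confidence = support(X ∪ Y) / support(X)). The helper names are hypothetical and the five transactions are reconstructed from this post's one-hot table.

```python
# Support and confidence of a rule X -> Y, computed directly from transactions.
# count, support and confidence are hypothetical helpers, not mlxtend functions.

dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]


def count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(set(itemset) <= set(t) for t in transactions)


def support(itemset, transactions):
    """Fraction of transactions containing the whole itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(x, y, transactions):
    """Share of transactions containing X that also contain Y."""
    return count(set(x) | set(y), transactions) / count(x, transactions)


print(support({'Onion', 'Eggs'}, dataset))        # 0.6
print(confidence({'Onion'}, {'Eggs'}, dataset))   # 1.0
print(confidence({'Eggs'}, {'Onion'}, dataset))   # 0.75
```

Note that the rule direction matters: every basket with Onion has Eggs, but only three of the four baskets with Eggs have Onion.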

 

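df : a pandas DataFrame of frequent itemsets with 'support' and 'itemsets' columns, such as the output of mlxtend's apriori or fpgrowth functions (not the one-hot DataFrame itself).

metric : the metric used to evaluate whether a candidate rule is of interest, default='confidence'. If support_only=True, it is automatically set to 'support'. Supported metrics are 'support', 'confidence', 'lift', 'leverage', and 'conviction'.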

 

min_threshold : float, the minimal threshold for the evaluation metric; only rules whose metric value is at least min_threshold are kept.

 

support_only : Only computes the rule support and fills the other metric columns with NaNs, default is False.

 

leverage : computes the difference between the observed frequency of A (antecedent) and C (consequent) appearing together and the frequency that would be expected if A and C were independent: leverage(A → C) = support(A → C) - support(A) × support(C). A leverage value of 0 indicates independence. Range: [-1, 1].

 

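conviction : conviction(A → C) = (1 - support(C)) / (1 - confidence(A → C)). A high conviction value means that the consequent is highly dependent on the antecedent. When confidence is 1 (the rule never fails), the denominator is 0 and conviction is inf, as in the table below.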

	antecedents	consequents	antecedent support	consequent support	support	confidence	lift	leverage	conviction
0	(Onion)	(Eggs)	0.6	0.8	0.6	1.00	1.25	0.12	inf
1	(Eggs)	(Onion)	0.8	0.6	0.6	0.75	1.25	0.12	1.6
2	(Onion, Kidney Beans)	(Eggs)	0.6	0.8	0.6	1.00	1.25	0.12	inf
3	(Kidney Beans, Eggs)	(Onion)	0.8	0.6	0.6	0.75	1.25	0.12	1.6
4	(Onion)	(Kidney Beans, Eggs)	0.6	0.8	0.6	1.00	1.25	0.12	inf
5	(Eggs)	(Onion, Kidney Beans)	0.8	0.6	0.6	0.75	1.25	0.12	1.6
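The numbers in the table above can be checked by hand. The sketch below recomputes all five metrics and applies the metric/min_threshold filter in plain Python. It assumes min_support=0.6 for the frequent itemsets and metric='lift', min_threshold=1.2; the post does not state which settings produced the table, but with these assumed values the sketch keeps exactly the six rules shown. Frequent itemsets are found by brute-force enumeration, not by the Apriori algorithm, and all function names are hypothetical.

```python
import math
from itertools import combinations

# Transactions reconstructed from this post's one-hot table (an assumption).
dataset = [{'Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'},
           {'Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'},
           {'Milk', 'Apple', 'Kidney Beans', 'Eggs'},
           {'Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'},
           {'Corn', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs'}]
N = len(dataset)


def count(itemset):
    """Number of transactions containing every item in itemset."""
    return sum(itemset <= t for t in dataset)


def frequent_itemsets(min_support):
    """Brute-force search level by level (NOT Apriori's candidate pruning)."""
    items = sorted(set().union(*dataset))
    found = []
    for k in range(1, len(items) + 1):
        level = [frozenset(c) for c in combinations(items, k)
                 if count(frozenset(c)) / N >= min_support]
        if not level:  # no frequent k-itemsets means no frequent (k+1)-itemsets
            break
        found.extend(level)
    return found


def rule_metrics(a, c):
    """Metrics for the rule a -> c, using the definitions in this post."""
    n_a, n_c, n_ac = count(a), count(c), count(a | c)
    s_a, s_c, s_ac = n_a / N, n_c / N, n_ac / N
    conf = n_ac / n_a
    return {
        'antecedents': a, 'consequents': c,
        'antecedent support': s_a, 'consequent support': s_c,
        'support': s_ac, 'confidence': conf,
        'lift': conf / s_c,
        'leverage': s_ac - s_a * s_c,
        # conviction is infinite when the rule never fails (confidence == 1)
        'conviction': math.inf if n_ac == n_a else (1 - s_c) / (1 - conf),
    }


def rules(min_support, metric, min_threshold):
    """Split each frequent itemset into antecedent/consequent; keep good rules."""
    out = []
    for itemset in frequent_itemsets(min_support):
        for k in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), k):
                a = frozenset(ante)
                r = rule_metrics(a, itemset - a)
                if r[metric] >= min_threshold:
                    out.append(r)
    return out


result = rules(min_support=0.6, metric='lift', min_threshold=1.2)
for r in result:
    print(sorted(r['antecedents']), '->', sorted(r['consequents']),
          round(r['confidence'], 2), round(r['lift'], 2),
          round(r['leverage'], 2), r['conviction'])
```

Rules such as {Eggs} → {Kidney Beans} have confidence 1.0 but lift 1.0, because Kidney Beans is in every basket; filtering on lift instead of confidence is what drops them.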