- mlxtend
mlxtend (Machine Learning Extensions) is a Python library of useful tools for day-to-day data science tasks.
- TransactionEncoder
TransactionEncoder transforms a transaction dataset (a Python list of lists, with one inner list of item labels per transaction) into an array format suitable for typical machine learning APIs. The resulting NumPy array is boolean, which saves memory when working with large datasets.
fit : the TransactionEncoder learns the unique labels in the dataset.
transform : transforms the input dataset (a list of lists) into a one-hot encoded NumPy boolean array.
sparse : bool, default=False. A parameter of transform; if True, transform returns a Compressed Sparse Row (CSR) matrix instead of a regular NumPy array.
columns_ : after fitting, the unique column names that correspond to the columns of the data array can be accessed via the columns_ attribute.
from mlxtend.preprocessing import TransactionEncoder

# dataset is a list of transactions, each a list of item labels
te = TransactionEncoder()
te_array = te.fit(dataset).transform(dataset)
te_array
>>>
array([[False, False, False,  True, False,  True,  True,  True,  True,
        False,  True],
       [False, False,  True,  True, False,  True, False,  True,  True,
        False,  True],
       [ True, False, False,  True, False,  True,  True, False, False,
        False, False],
       [False,  True, False, False, False,  True,  True, False, False,
         True,  True],
       [False,  True, False,  True,  True,  True, False, False,  True,
        False, False]])
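import pandas as pd
# te_array is the boolean array returned by te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_array, columns=te.columns_)
df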
>>>
   Apple   Corn   Dill   Eggs  Ice cream  Kidney Beans   Milk  Nutmeg  Onion  Unicorn  Yogurt
0  False  False  False   True      False          True   True    True   True    False    True
1  False  False   True   True      False          True  False    True   True    False    True
2   True  False  False   True      False          True   True   False  False    False   False
3  False   True  False  False      False          True   True   False  False     True    True
4  False   True  False   True       True          True  False   False   True    False   False
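To make fit and transform concrete, here is a minimal plain-Python sketch of the same one-hot encoding idea. It is not mlxtend's implementation. The `dataset` list is an assumption reconstructed from the boolean table above (the order of items inside each transaction is a guess), and `fit_columns` and `one_hot` are hypothetical helper names.

```python
# Transactions reconstructed from the boolean table above (an assumption:
# the original `dataset` variable is not shown in this post).
dataset = [
    ["Milk", "Onion", "Nutmeg", "Kidney Beans", "Eggs", "Yogurt"],
    ["Dill", "Onion", "Nutmeg", "Kidney Beans", "Eggs", "Yogurt"],
    ["Milk", "Apple", "Kidney Beans", "Eggs"],
    ["Milk", "Unicorn", "Corn", "Kidney Beans", "Yogurt"],
    ["Corn", "Onion", "Kidney Beans", "Ice cream", "Eggs"],
]


def fit_columns(transactions):
    """Roughly what 'fit' does: collect the unique item labels, sorted."""
    return sorted({item for row in transactions for item in row})


def one_hot(transactions, columns):
    """Roughly what 'transform' does: one boolean row per transaction,
    one column per label."""
    encoded = []
    for row in transactions:
        present = set(row)
        encoded.append([label in present for label in columns])
    return encoded


columns = fit_columns(dataset)
encoded = one_hot(dataset, columns)

print(columns)
for row in encoded:
    print(row)
```

The printed labels and rows should line up with columns_ and the boolean array shown above.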
- association_rules
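association_rules(df, metric='confidence', min_threshold=0.8, support_only=False)
Generates association rules from df, a DataFrame of frequent itemsets with 'support' and 'itemsets' columns (for example, the output of mlxtend's apriori). An association rule is an implication expression of the form X→Y, where X (the antecedent) and Y (the consequent) are disjoint itemsets. A concrete example based on consumer behaviour is {Diapers}→{Beer}, which suggests that people who buy diapers are also likely to buy beer.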
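As an illustration of what "X and Y are disjoint itemsets" means in practice, the sketch below lists every way to split one itemset into a non-empty antecedent and a non-empty consequent. This is only a sketch of the idea, not necessarily how mlxtend generates its candidates; `candidate_rules` is a hypothetical helper name, and the itemset uses items from the example data above.

```python
from itertools import combinations


def candidate_rules(itemset):
    """Yield every split of `itemset` into a non-empty antecedent X and a
    non-empty consequent Y that share no items."""
    items = frozenset(itemset)
    for size in range(1, len(items)):
        for antecedent in combinations(sorted(items), size):
            x = frozenset(antecedent)
            yield x, items - x  # Y is whatever X leaves out, so X and Y are disjoint


rules = list(candidate_rules({"Onion", "Kidney Beans", "Eggs"}))
for x, y in rules:
    print(sorted(x), "->", sorted(y))
```

An itemset with k items yields 2^k − 2 such splits, so this three-item set gives six candidate rules. Which of them are kept depends on the chosen metric and threshold.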
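metric : str, default='confidence'. The metric used to decide whether a rule is of interest. It is automatically set to 'support' if support_only=True. Otherwise, the supported metrics are 'support', 'confidence', 'lift', 'leverage', and 'conviction'.
min_threshold : float, default=0.8. Minimal threshold for the evaluation metric selected via metric; only rules whose metric value is at least this threshold are returned.
support_only : bool, default=False. If True, only the rule support is computed and the other metric columns are filled with NaNs.
leverage : the difference between the observed frequency of antecedent A and consequent C appearing together and the frequency that would be expected if A and C were independent: leverage(A→C) = support(A→C) − support(A) × support(C). A leverage value of 0 indicates independence. Range: [−1, 1].
conviction : conviction(A→C) = (1 − support(C)) / (1 − confidence(A→C)). A high conviction value means that the consequent is highly dependent on the antecedent. When the confidence is exactly 1, the denominator is 0 and conviction is reported as inf. Range: [0, ∞].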
             antecedents            consequents  antecedent support  consequent support  support  confidence  lift  leverage  conviction
0                (Onion)                 (Eggs)                 0.6                 0.8      0.6        1.00  1.25      0.12         inf
1                 (Eggs)                (Onion)                 0.8                 0.6      0.6        0.75  1.25      0.12         1.6
2  (Onion, Kidney Beans)                 (Eggs)                 0.6                 0.8      0.6        1.00  1.25      0.12         inf
3   (Kidney Beans, Eggs)                (Onion)                 0.8                 0.6      0.6        0.75  1.25      0.12         1.6
4                (Onion)   (Kidney Beans, Eggs)                 0.6                 0.8      0.6        1.00  1.25      0.12         inf
5                 (Eggs)  (Onion, Kidney Beans)                 0.8                 0.6      0.6        0.75  1.25      0.12         1.6
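The values in the table can be checked by hand. The sketch below recomputes the metrics for a single rule in plain Python, using the standard definitions of support, confidence, lift, leverage, and conviction. It is not mlxtend code: `transactions` is an assumption reconstructed from the one-hot table earlier in this post, and `support` and `rule_metrics` are hypothetical helper names.

```python
# Transactions reconstructed from the one-hot table earlier in this post
# (an assumption: the original `dataset` variable is not shown).
transactions = [
    {"Milk", "Onion", "Nutmeg", "Kidney Beans", "Eggs", "Yogurt"},
    {"Dill", "Onion", "Nutmeg", "Kidney Beans", "Eggs", "Yogurt"},
    {"Milk", "Apple", "Kidney Beans", "Eggs"},
    {"Milk", "Unicorn", "Corn", "Kidney Beans", "Yogurt"},
    {"Corn", "Onion", "Kidney Beans", "Ice cream", "Eggs"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def rule_metrics(antecedent, consequent):
    """Metrics for the rule antecedent -> consequent."""
    a = support(antecedent)
    c = support(consequent)
    both = support(set(antecedent) | set(consequent))
    confidence = both / a
    # If every antecedent transaction also contains the consequent, the
    # conviction denominator (1 - confidence) is 0, so report infinity.
    conviction = float("inf") if both == a else (1 - c) / (1 - confidence)
    return {
        "antecedent support": a,
        "consequent support": c,
        "support": both,
        "confidence": confidence,
        "lift": confidence / c,
        "leverage": both - a * c,
        "conviction": conviction,
    }


for x, y in [({"Onion"}, {"Eggs"}), ({"Eggs"}, {"Onion"})]:
    metrics = rule_metrics(x, y)
    print(sorted(x), "->", sorted(y))
    for name, value in metrics.items():
        print(f"  {name}: {value:.2f}")
```

Up to rounding, the two printed rules should agree with rows 0 and 1 of the table, including the inf conviction for (Onion) → (Eggs), where the confidence is 1.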