I want to use association rules to analyze customer data from my online shop. Here are the steps I took:
First: my DataFrame raw_data has three columns, "id_customer", "id_product", and "product_quantity", and contains 700,000 rows.
Second: I reshaped my DataFrame and got a DataFrame with 680,000 rows and 366 columns:
customer = (
    raw_data.groupby(["id_customer", "id_product"])["product_quantity"]
    .sum()
    .unstack()
    .reset_index()
    .fillna(0)
    .set_index("id_customer")
)
customer[customer != 0] = 1

Finally: I want to compute the frequent itemsets:
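The 0/1 reshaping above can also be sketched without building a dense matrix at all, by keeping one set of products per customer. This is illustrative only; the `rows` data below is hypothetical:

```python
from collections import defaultdict

# Hypothetical stand-in for raw_data: (id_customer, id_product, product_quantity) rows.
rows = [
    (1, "A", 2),
    (1, "B", 1),
    (2, "A", 3),
    (3, "C", 1),
]

# Each customer's basket is the set of products bought at least once;
# this carries the same information as the dense 0/1 matrix,
# but uses memory only for the nonzero entries.
baskets = defaultdict(set)
for id_customer, id_product, qty in rows:
    if qty > 0:
        baskets[id_customer].add(id_product)
```

For very sparse purchase data this representation is far smaller than the 680,000 x 366 dense frame.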
from mlxtend.frequent_patterns import apriori
frequent_itemsets = apriori(customer, min_support=0.00001, use_colnames=True)

But now I get an error:

MemoryError: Unable to allocate 686. GiB for an array with shape (66795, 2, 689587) and data type float64
How can I fix this? Or how can I compute the frequent itemsets without using the apriori function?
Posted on 2021-10-11 06:55:55
If your data is too large to fit in memory, you can pass a function that returns a generator instead of a list.
from efficient_apriori import apriori as ap

def data_generator(df):
    """
    Returns a function that creates a fresh generator on each call,
    so the transactions can be iterated over several times.
    Use this approach if the data is too large to fit in memory.
    """
    def data_gen():
        # Yield one transaction (one row as a tuple) at a time.
        for row in df.values.tolist():
            yield tuple(row)
    return data_gen

transactions = data_generator(df)
itemsets, rules = ap(transactions, min_support=0.9, min_confidence=0.6)

https://stackoverflow.com/questions/69521918
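As for the "without apriori" part of the question: for small itemset sizes, supports can be counted directly in one pass with a `Counter`, e.g. for single items and pairs. This is a minimal sketch over hypothetical baskets, not a full Apriori replacement:

```python
from itertools import combinations
from collections import Counter

# Hypothetical transactions: one set of purchased products per customer.
baskets = [{"A", "B"}, {"A", "C"}, {"A", "B", "C"}, {"B"}]

min_support = 0.5
n = len(baskets)

# Count supports of single items and pairs in one pass over the baskets.
counts = Counter()
for basket in baskets:
    for item in basket:
        counts[(item,)] += 1
    for pair in combinations(sorted(basket), 2):
        counts[pair] += 1

# Keep itemsets whose relative support reaches the threshold.
frequent = {itemset: c / n for itemset, c in counts.items() if c / n >= min_support}
```

Memory use here grows with the number of distinct items and pairs actually observed, not with a dense customers-by-products array; for larger itemset sizes the candidate-pruning step of Apriori would still be needed.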