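TL;DR: I am building a recommender system with the Yelp dataset, but running the code below raises the LightFM error "Test interactions matrix and train interactions matrix share 68 interactions. This will cause incorrect evaluation, check your data split."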
test_auc = auc_score(model,
test,
#train_interactions=train, #Unable to run with this line uncommented
item_features=sparse_features_matrix,
num_threads=NUM_THREADS).mean()
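print('Hybrid test set AUC: %s' % test_auc)
Full story: I am building a recommender system with the Yelp dataset.
I am working from the hybrid collaborative filtering code in the example documentation (https://making.lyst.com/lightfm/docs/examples/hybrid_crossvalidated.html).
I ran the code as follows: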
from sklearn.model_selection import train_test_split
from lightfm import LightFM
from scipy import sparse
from lightfm.evaluation import auc_score
train, test = train_test_split(sparse_Rating_Matrix, test_size=0.25,random_state=4)
# Set the number of threads; you can increase this
# if you have more physical cores available.
NUM_THREADS = 2
NUM_COMPONENTS = 100
NUM_EPOCHS = 3
ITEM_ALPHA = 1e-6
# Define a new model instance
model = LightFM(loss='warp',
item_alpha=ITEM_ALPHA,
no_components=NUM_COMPONENTS)
# Fit the hybrid model. Note that this time, we pass
# in the item features matrix.
model = model.fit(train,
item_features=sparse_features_matrix,
epochs=NUM_EPOCHS,
num_threads=NUM_THREADS)
# Don't forget to pass in the item features again!
train_auc = auc_score(model,
train,
item_features=sparse_features_matrix,
num_threads=NUM_THREADS).mean()
print('Hybrid training set AUC: %s' % train_auc)
test_auc = auc_score(model,
test,
#train_interactions=train, # Unable to run with this line uncommented
item_features=sparse_features_matrix,
num_threads=NUM_THREADS).mean()
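print('Hybrid test set AUC: %s' % test_auc)
I have two problems:
1) Running with the commented-out line enabled (train_interactions=train) initially produced an inconsistent-shapes error.
I modified the "test" dataset with the following code block, appending a block of zeros beneath it until its dimensions matched those of my train dataset (following this suggestion: https://github.com/lyst/lightfm/issues/369):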
import numpy as np
#Add X users to Test so that the number of rows in Test matches Train
N = train.shape[0] #Rows in Train set
n,m = test.shape #Rows & columns in Test set
z = np.zeros([(N-n),m]) #Create the necessary rows of zeros with m columns
test = test.todense() #Temporarily convert Test into a numpy array
test = np.vstack((test,z)) #Vertically stack Test on top of the blank users
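test = sparse.csr_matrix(test) #Convert back to sparse
2) With the shape issue resolved, I tried to enable "train_interactions=train",
but got: Test interactions matrix and train interactions matrix share 68 interactions. This will cause incorrect evaluation, check your data split.
I don't know how to solve the second problem. Any ideas?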
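Details:
- "sparse_features_matrix" is a sparse {items x categories} matrix. If an item is "Italian" and "Pizza", then the "Italian" and "Pizza" categories have the value "1" in that item's row, and "0" otherwise.
- "sparse_Rating_Matrix" is a sparse {users x items} matrix containing the users' rating values for the restaurants (items).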
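A note on the split itself: sklearn's train_test_split partitions the rows of the matrix it is given, so on a {users x items} matrix the train and test sets hold different users and have different shapes. Zero-padding makes the shapes agree, but row i of train and row i of test still refer to different users, so any shared (row, column) entries are coincidental. A minimal sketch of splitting at the interaction level instead, using only NumPy and SciPy; the helper name split_interactions and the toy ratings are hypothetical, and this is not LightFM's own implementation:

```python
import numpy as np
from scipy import sparse


def split_interactions(interactions, test_fraction=0.25, seed=4):
    """Split the stored entries of a user x item matrix into train and test.

    Hypothetical helper: both outputs keep the full (n_users, n_items) shape.
    """
    coo = sparse.coo_matrix(interactions)
    coo.sum_duplicates()  # merge repeated (user, item) pairs first
    rng = np.random.default_rng(seed)
    order = rng.permutation(coo.nnz)
    n_test = int(round(test_fraction * coo.nnz))
    test_idx, train_idx = order[:n_test], order[n_test:]

    def take(idx):
        # Rebuild a matrix of the original shape from the chosen entries.
        return sparse.coo_matrix(
            (coo.data[idx], (coo.row[idx], coo.col[idx])), shape=coo.shape
        ).tocsr()

    return take(train_idx), take(test_idx)


# Toy 4-user x 5-item rating matrix (made-up values).
ratings = sparse.csr_matrix(np.array([
    [5, 0, 3, 0, 0],
    [0, 4, 0, 0, 1],
    [2, 0, 0, 5, 0],
    [0, 0, 4, 0, 3],
]))
train, test = split_interactions(ratings)
shared = train.multiply(test).nnz  # (user, item) pairs present in both
```

Because every stored entry goes to exactly one side and both matrices keep the original shape, no padding is needed and the two matrices share no (user, item) pair.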
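Update 04/08/2020:
LightFM has a whole Dataset() class (lightfm.data.Dataset), which you should use to prepare your dataset before model evaluation. I found a great GitHub post (https://github.com/lyst/lightfm/issues/494) in which user Med provides a nice small test dataset.
When I prepared my data this way, I was able to add the user_features I wanted to model (e.g., User_1592 likes "Thai", "Mexican", and "Sushi" cuisines).
Following Turbo's comment, I used LightFM's random_train_test_split method (I had originally split the data with sklearn's train_test_split method) and ran auc_score with the new train/test sets on a model that was, as far as I can tell, prepared correctly. I still get the same error:
Input: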
%%time
from lightfm.cross_validation import random_train_test_split
(train,test) = random_train_test_split(lightfm_interactions,test_percentage=0.25) #LightFM's method to split
# Don't forget to pass in the user features again!
train_auc = auc_score(model_users,
train,
user_features=lightfm_user_features_list,
num_threads=NUM_THREADS).mean()
print('User_feature training set AUC: %s' % train_auc)
test_auc = auc_score(model_users,
test,
#train_interactions=train, #Still can't get this to function
user_features=lightfm_user_features_list,
num_threads=NUM_THREADS).mean()
print('User_feature test set AUC: %s' % test_auc)
Output when "train_interactions=train" is used:
ValueError: Test interactions matrix and train interactions matrix share 435 interactions. This will cause incorrect evaluation, check your data split.
On the bright side, I think it is important to stick with LightFM's own methods when they are available!
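One possible cause of an overlap that survives an interaction-level split is duplicate (user, item) entries in the interactions matrix, for example when the same user reviewed the same restaurant more than once. This is an assumption, not something confirmed in this thread: if a split assigns stored entries rather than distinct pairs, two copies of the same pair can land on opposite sides. The sketch below uses made-up data and SciPy only; it shows how to detect such duplicates and collapse them before splitting:

```python
import numpy as np
from scipy import sparse

# Made-up interactions in which the pair (user 0, item 1) was recorded twice.
rows = np.array([0, 0, 1, 2])
cols = np.array([1, 1, 0, 2])
vals = np.ones(4)
interactions = sparse.coo_matrix((vals, (rows, cols)), shape=(3, 3))

# Compare the number of stored entries with the number of distinct pairs.
stored_entries = interactions.nnz
distinct_pairs = len(set(zip(interactions.row.tolist(),
                             interactions.col.tolist())))

# Converting to CSR sums repeated pairs, so each pair is stored exactly once.
deduplicated = interactions.tocsr().tocoo()
# Reset to 0/1 values; drop this line to keep the summed values instead.
deduplicated.data[:] = 1.0
```

If stored_entries is larger than distinct_pairs for the real matrix, splitting the deduplicated matrix instead would be worth trying.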
Posted on 2020-04-07 15:35:54
LightFM provides its own method for splitting a dataset; have you looked at it? It may work if you use that. https://making.lyst.com/lightfm/docs/cross_validation.html
https://stackoverflow.com/questions/60984051