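I'm trying to write a simple program that assigns codes to courses by checking them against a list of keywords.
So far I can handle a keyword list in which every row has a fixed length of 2: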
import pandas as pd

# The keyword list, with every row fixed at length 2
keyword = pd.DataFrame({
    'code': ['001', '002', '003'],
    'keyword': [
        ['edu|teach', 'primary sch|secondary sch|junior sch|preliminary sch'],  # length = 2
        ['elderly|disabled|special', 'care'],  # length = 2
        ['digital|social media', 'marketing']]  # length = 2
})
# The list of educational programmes to which codes are to be assigned
course = pd.DataFrame({
    'course':
        ['certificate in digital marketing',
         'certificate in elderly care',
         'diploma in primary school education',
         'bachelor in traditional chinese medicine',
         'master of law']
})
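# Build the shortlist of coded courses
shortlists = []
for i in range(len(keyword['keyword'])):
    courseshortlist = course[
        course.course.str.contains(keyword['keyword'][i][0])
        & course.course.str.contains(keyword['keyword'][i][1])
    ].copy()  # copy so the new column is not written to a slice of `course`
    courseshortlist['autocode'] = keyword['code'][i]
    shortlists.append(courseshortlist)
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
courseresult = pd.concat(shortlists)

However, I'm not sure how to write the loop when the keyword lists vary in length: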
keyword_variable = pd.DataFrame({
    'code': ['001', '002', '003', '004', '005'],
    'keyword': [
        ['law'],  # length = 1
        ['edu|teach', 'primary sch|secondary sch|junior sch|preliminary sch'],  # length = 2
        ['elderly|disabled|special', 'care'],  # length = 2
        ['digital|social media', 'marketing'],  # length = 2
        ['traditional', 'chinese', 'medicine']  # length = 3
    ]
})

Update: I just got the result I wanted with some ugly, clumsy try/except code:
shortlists = []
for i in range(len(keyword_variable['keyword'])):
    try:
        # Try three keywords first; fall back to two, then one, on IndexError
        condition0 = course.course.str.contains(keyword_variable['keyword'][i][0])
        condition1 = course.course.str.contains(keyword_variable['keyword'][i][1])
        condition2 = course.course.str.contains(keyword_variable['keyword'][i][2])
        condition = condition0 & condition1 & condition2
    except IndexError:
        try:
            condition0 = course.course.str.contains(keyword_variable['keyword'][i][0])
            condition1 = course.course.str.contains(keyword_variable['keyword'][i][1])
            condition = condition0 & condition1
        except IndexError:
            condition = course.course.str.contains(keyword_variable['keyword'][i][0])
    courseshortlist = course[condition].copy()
    courseshortlist['autocode'] = keyword_variable['code'][i]
    shortlists.append(courseshortlist)
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
courseresult = pd.concat(shortlists)
courseresult
Out[1]:
                                     course autocode
4                             master of law      001
2       diploma in primary school education      002
1               certificate in elderly care      003
0          certificate in digital marketing      004
3  bachelor in traditional chinese medicine      005

But I'm sure there must be a better way to do this? Thanks a lot!
Posted on 2019-09-03 12:31:37
Assuming you don't really need the result in a separate DataFrame:
for i in range(len(keyword_variable['keyword'])):
    # Start from all-True and AND in one condition per keyword pattern
    condition = pd.Series(True, index=course.index)
    for k in keyword_variable['keyword'][i]:
        condition = condition & course.course.str.contains(k)
    course.loc[condition, 'autocode'] = keyword_variable['code'][i]
print(course)

If you do need a new copy, just create the copy first and apply the same solution.
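
The "create a copy first" variant could look roughly like the sketch below. It is only an illustration under a few assumptions: the function name `assign_codes` and its parameter names are hypothetical (they do not come from the posts above), and `functools.reduce` is swapped in for the inner loop to AND the per-keyword conditions together. Like the loop above, a later keyword row overwrites an earlier one if both match the same course.

```python
from functools import reduce
import operator

import pandas as pd

# Sample data taken from the question
course = pd.DataFrame({
    'course': ['certificate in digital marketing',
               'certificate in elderly care',
               'diploma in primary school education',
               'bachelor in traditional chinese medicine',
               'master of law']
})
keyword_variable = pd.DataFrame({
    'code': ['001', '002', '003', '004', '005'],
    'keyword': [
        ['law'],
        ['edu|teach', 'primary sch|secondary sch|junior sch|preliminary sch'],
        ['elderly|disabled|special', 'care'],
        ['digital|social media', 'marketing'],
        ['traditional', 'chinese', 'medicine'],
    ]
})


def assign_codes(course_df, keyword_df):
    """Hypothetical helper: return a coded copy, leaving course_df untouched."""
    coded = course_df.copy()
    coded['autocode'] = None  # courses that match no keyword row stay None
    for code, patterns in zip(keyword_df['code'], keyword_df['keyword']):
        # One boolean mask per regex pattern, for any number of patterns
        masks = [coded['course'].str.contains(p) for p in patterns]
        # AND all masks together, starting from an all-True Series
        all_true = pd.Series(True, index=coded.index)
        condition = reduce(operator.and_, masks, all_true)
        coded.loc[condition, 'autocode'] = code
    return coded


print(assign_codes(course, keyword_variable))
```

Because the work happens on `coded`, the original `course` DataFrame does not gain an `autocode` column, which is the main difference from the in-place loop.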
https://stackoverflow.com/questions/57764690