I have a column of words:
> print(df['words'])
0 [awww, thats, bummer, shoulda, got, david, car...
1 [upset, that, he, cant, update, his, facebook,...
2 [dived, many, time, ball, managed, save, rest,...
3 [whole, body, feel, itchy, like, it, on, fire]
4 [no, it, not, behaving, at, all, im, mad, why,...
5 [not, whole, crew]
Another column holds the "sentiment" value of each word:
> print(sentiment)
abandon -2
0 abandoned -2
1 abandons -2
2 abducted -2
3 abduction -2
4 abductions -2
5 abhor -3
6 abhorred -3
7 abhorrent -3
8 abhors -3
9 abilities 2
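...
For each row of words in df['words'], I want to sum their respective sentiment values. Words that do not exist in sentiment should count as 0.
This is what I have so far:
df['sentiment_value'] = Sum(df['words'].apply(lambda x: ''.join(x+x for x in sentiment))
Expected result: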
print(df['sentiment_value'])
0 -5
1 2
2 15
3 -6
4 -8
...
Answered on 2019-02-14 17:47:31
If you have the scores as a Series, with the words as the index labels:
In [11]: s # e.g. sentiment.set_index("word")["score"]
Out[11]:
abandon -2
abandoned -2
abandons -2
abducted -2
abduction -2
Name: score, dtype: int64
Then you can look up the scores for a list of words and sum them:
In [12]: s.loc[["abandon", "abducted"]].sum()
Out[12]: -4
So the apply would be:
df['words'].apply(lambda ls: s.loc[ls].sum())
If you need to support missing words (words that are not in s), you can use reindex:
In [21]: s.reindex(["abandon", "abducted", "missing_word"]).sum()
Out[21]: -4.0
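The result is -4.0 rather than -4 because reindex fills labels that are missing from s with NaN, which turns the values into floats, and sum() then skips NaN by default. A minimal sketch of that behaviour, assuming a small hypothetical s that mirrors the first rows of the sentiment table (the variable names labels, looked_up, total_float and total_int are illustrative only):

```python
import pandas as pd

# Hypothetical scores, mirroring the first rows of the sentiment table.
s = pd.Series({"abandon": -2, "abandoned": -2, "abducted": -2}, name="score")

labels = ["abandon", "abducted", "missing_word"]

# Labels that are not in s come back as NaN, so the dtype becomes float64.
looked_up = s.reindex(labels)

# sum() skips NaN by default (skipna=True), so the missing word adds nothing.
total_float = looked_up.sum()

# Alternative: score missing words as 0 explicitly, which keeps the integer dtype.
total_int = s.reindex(labels, fill_value=0).sum()
```

Either form can be used inside the apply; fill_value=0 simply states the "missing words count as 0" rule explicitly.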
df['words'].apply(lambda ls: s.reindex(ls).sum())
Answered on 2019-02-14 17:44:56
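If the second column stores each word and its value together in one string, you first need to clean the data by splitting that column into two columns.
df[['Sentiment', 'Sentiment_value']] = df.sentiment.str.split(" ", expand=True)
Then you can find the word in the Sentiment column and get its value from the Sentiment_value column.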
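The cleanup and lookup described here could be sketched end to end as follows. This is only an illustration under stated assumptions: the raw frame, its single "sentiment" string column and the sample rows are hypothetical, since the question does not show how the scores are actually stored, and the lookup step borrows the Series reindex idea rather than anything specific to this answer.

```python
import pandas as pd

# Hypothetical raw data: one string per row, word and score separated by a space.
raw = pd.DataFrame({"sentiment": ["abandon -2", "abilities 2", "abhor -3"]})

# expand=True returns one column per piece instead of a single column of lists.
parts = raw["sentiment"].str.split(" ", expand=True)
raw["Sentiment"] = parts[0]
raw["Sentiment_value"] = parts[1].astype(int)  # the split pieces are strings

# Index the scores by word so a whole list of words can be looked up at once.
scores = raw.set_index("Sentiment")["Sentiment_value"]

# Hypothetical word lists in the same shape as df['words'] in the question.
df = pd.DataFrame({"words": [["abandon", "abhor", "car"], ["abilities", "not"]]})

# Words without a score are filled with 0 before summing.
df["sentiment_value"] = df["words"].apply(
    lambda ws: scores.reindex(ws, fill_value=0).sum()
)
```

Note that the split values must be converted with astype(int) before summing; otherwise they remain strings.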
https://stackoverflow.com/questions/54695916