So, I have a DataFrame with an ID column, a TEXT column, and a TOKEN column that holds the three most common words in the text:
ID TEXT TOKEN
sentence1 Emma Woodhouse , handsome , clever , and rich ... [(emma, 2), (woodhouse, 2), (handsome, 1)]
sentence2 She was the youngest of the two daughters of a... [(youngest, 1), (two, 1), (daughters, 2)]
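sentence3 Her mother had died too long ago for her to ha... [(mother, 2), (died, 1), (long, 1)]

I would like to turn the elements in each row of the TOKEN column into new rows of a new DataFrame. I have tried many approaches, but I cannot get the token elements out of their column. The expected output is as follows: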
WORD FREQ ID TEXT
emma 2 sentence1 Emma Woodhouse , handsome , clever , and rich ...
woodhouse 2 sentence1 Emma Woodhouse , handsome , clever , and rich ...
handsome 1 sentence1 Emma Woodhouse , handsome , clever , and rich ...
youngest 1 sentence2 She was the youngest of the two daughters of a...
two 1 sentence2 She was the youngest of the two daughters of a...
daughters 2 sentence2 She was the youngest of the two daughters of a...

I am starting to think that what I want to do is not possible. Can you help me? Thanks!
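Posted on 2022-07-11 12:37:43
Let's explode the TOKEN column, build a new DataFrame from the resulting tuples, and then join it back to the remaining original columns: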
s = df.explode('TOKEN', ignore_index=True)
pd.DataFrame([*s.pop('TOKEN')], columns=['WORD', 'FREQ']).join(s)

WORD FREQ ID TEXT
0 emma 2 sentence1 Emma Woodhouse , handsome , clever , and rich ...
1 woodhouse 2 sentence1 Emma Woodhouse , handsome , clever , and rich ...
2 handsome 1 sentence1 Emma Woodhouse , handsome , clever , and rich ...
3 youngest 1 sentence2 She was the youngest of the two daughters of a...
4 two 1 sentence2 She was the youngest of the two daughters of a...
5 daughters 2 sentence2 She was the youngest of the two daughters of a...
6 mother 2 sentence3 Her mother had died too long ago for her to ha...
7 died 1 sentence3 Her mother had died too long ago for her to ha...
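8 long 1 sentence3 Her mother had died too long ago for her to ha...

Posted on 2022-07-11 12:53:53
You can explode the TOKEN column, transform the result into a DataFrame with the desired column names, and then concatenate it column-wise with the original DataFrame: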
pd.concat(
    [df.TOKEN.explode().transform(pd.Series)
       .rename(columns={0: 'WORD', 1: 'FREQ'}),
     df.drop(columns="TOKEN")],
    axis=1)

Output:
WORD FREQ ID TEXT
0 emma 2 sentence1 Emma Woodhouse , handsome , clever ,...
0 woodhouse 2 sentence1 Emma Woodhouse , handsome , clever ,...
0 handsome 1 sentence1 Emma Woodhouse , handsome , clever ,...
1 youngest 1 sentence2 She was the youngest of the two daug...
1 two 1 sentence2 She was the youngest of the two daug...
1 daughters 2 sentence2 She was the youngest of the two daug...
2 mother 2 sentence3 Her mother had died too long ago for...
2 died 1 sentence3 Her mother had died too long ago for...
2 long 1 sentence3 Her mother had died too long ago for...

You can reset the index if needed.
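A minimal sketch of resetting the index, assuming the sample data from the question (the TEXT values are truncated there, so they are truncated here too). The names `exploded`, `tokens`, and `result` are only illustrative. It calls `explode` without `ignore_index`, so the repeated row labels show up first, and then replaces them with `reset_index(drop=True)`:

```python
import pandas as pd

# Sample data rebuilt from the question; TEXT is truncated in the source.
df = pd.DataFrame({
    "ID": ["sentence1", "sentence2", "sentence3"],
    "TEXT": [
        "Emma Woodhouse , handsome , clever , and rich ...",
        "She was the youngest of the two daughters of a...",
        "Her mother had died too long ago for her to ha...",
    ],
    "TOKEN": [
        [("emma", 2), ("woodhouse", 2), ("handsome", 1)],
        [("youngest", 1), ("two", 1), ("daughters", 2)],
        [("mother", 2), ("died", 1), ("long", 1)],
    ],
})

# Without ignore_index, explode repeats the original row labels (0, 0, 0, 1, ...).
exploded = df.explode("TOKEN")

# reset_index(drop=True) replaces the repeated labels with a fresh 0..n-1 range.
exploded = exploded.reset_index(drop=True)

# Split each (word, freq) tuple into two columns, then reattach ID and TEXT by index.
tokens = pd.DataFrame(exploded.pop("TOKEN").tolist(), columns=["WORD", "FREQ"])
result = tokens.join(exploded)

print(result)
```

Resetting the index before the join keeps every row label unique, which makes later label-based lookups such as `result.loc[5]` return a single row.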
https://stackoverflow.com/questions/72938658