
How do I convert each element of a frequency column into a new DataFrame row?

Stack Overflow user
Asked 2022-07-11 12:32:22
Answers: 2 · Views: 49 · Followers: 0 · Votes: 2

I have a DataFrame with an ID column, a TEXT column, and a TOKEN column that holds the three most frequent words in the text, each paired with its count:

ID           TEXT                                               TOKEN
sentence1    Emma Woodhouse , handsome , clever , and rich ...  [(emma, 2), (woodhouse, 2), (handsome, 1)]
sentence2    She was the youngest of the two daughters of a...  [(youngest, 1), (two, 1), (daughters, 2)]
sentence3    Her mother had died too long ago for her to ha...  [(mother, 2), (died, 1), (long, 1)]

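I want to turn each element in every row of the TOKEN column into its own row in a new DataFrame. I have tried many approaches, but I cannot get the token elements out of their column. The expected output is:
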
WORD         FREQ    ID           TEXT                                
emma         2       sentence1    Emma Woodhouse , handsome , clever , and rich ...  
woodhouse    2       sentence1    Emma Woodhouse , handsome , clever , and rich ...  
handsome     1       sentence1    Emma Woodhouse , handsome , clever , and rich ...  
youngest     1       sentence2    She was the youngest of the two daughters of a... 
two          1       sentence2    She was the youngest of the two daughters of a...
daughters    2       sentence2    She was the youngest of the two daughters of a...

I'm starting to think that what I want to do is impossible. Can you help me? Thanks!

2 Answers

Stack Overflow user (accepted answer)
Posted 2022-07-11 12:37:43

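Let's explode the TOKEN column so that each (word, frequency) tuple gets its own row, build a new DataFrame from those tuples, and then join it back with the remaining original columns:
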
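# One row per (word, freq) tuple; ignore_index=True renumbers the rows 0..n-1
s = df.explode('TOKEN', ignore_index=True)
# Unpack the tuples into WORD/FREQ columns, then join the remaining ID/TEXT columns on the index
pd.DataFrame([*s.pop('TOKEN')], columns=['WORD', 'FREQ']).join(s)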

Output:
        WORD  FREQ         ID                                               TEXT
0       emma     2  sentence1  Emma Woodhouse , handsome , clever , and rich ...
1  woodhouse     2  sentence1  Emma Woodhouse , handsome , clever , and rich ...
2   handsome     1  sentence1  Emma Woodhouse , handsome , clever , and rich ...
3   youngest     1  sentence2  She was the youngest of the two daughters of a...
4        two     1  sentence2  She was the youngest of the two daughters of a...
5  daughters     2  sentence2  She was the youngest of the two daughters of a...
6     mother     2  sentence3  Her mother had died too long ago for her to ha...
7       died     1  sentence3  Her mother had died too long ago for her to ha...
8       long     1  sentence3  Her mother had died too long ago for her to ha...
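
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frame is rebuilt by hand from the sample rows in the question, assuming TOKEN holds Python lists of (word, count) tuples. The names `tokens` and `result` are illustrative only, and `ignore_index` in `explode` needs pandas 1.1 or later.

```python
import pandas as pd

# Sample data copied from the question; TOKEN is assumed to be a list of tuples per row
df = pd.DataFrame({
    "ID": ["sentence1", "sentence2", "sentence3"],
    "TEXT": [
        "Emma Woodhouse , handsome , clever , and rich ...",
        "She was the youngest of the two daughters of a...",
        "Her mother had died too long ago for her to ha...",
    ],
    "TOKEN": [
        [("emma", 2), ("woodhouse", 2), ("handsome", 1)],
        [("youngest", 1), ("two", 1), ("daughters", 2)],
        [("mother", 2), ("died", 1), ("long", 1)],
    ],
})

# One row per tuple, with a fresh 0..n-1 index
s = df.explode("TOKEN", ignore_index=True)

# Pull the tuples out of s and split them into two named columns
tokens = pd.DataFrame(s.pop("TOKEN").tolist(), columns=["WORD", "FREQ"])

# Both frames share the same 0..n-1 index, so an index join lines them up row by row
result = tokens.join(s)
print(result)
```

Because `pop` removes TOKEN from `s` in place, the join brings back only ID and TEXT.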
Votes: 1

Stack Overflow user
Posted 2022-07-11 12:53:53

You can explode the TOKEN column, transform the resulting tuples into a DataFrame with the column names you want, and then concatenate that column-wise with the original DataFrame:

pd.concat(
    [df.TOKEN.explode().transform(pd.Series)
     .rename(columns={0:'WORD', 1:'FREQ'}), 
     df.drop(columns="TOKEN")],
axis=1)

Output:

        WORD  FREQ         ID                                     TEXT
0       emma     2  sentence1  Emma Woodhouse , handsome , clever ,...
0  woodhouse     2  sentence1  Emma Woodhouse , handsome , clever ,...
0   handsome     1  sentence1  Emma Woodhouse , handsome , clever ,...
1   youngest     1  sentence2  She was the youngest of the two daug...
1        two     1  sentence2  She was the youngest of the two daug...
1  daughters     2  sentence2  She was the youngest of the two daug...
2     mother     2  sentence3  Her mother had died too long ago for...
2       died     1  sentence3  Her mother had died too long ago for...
2       long     1  sentence3  Her mother had died too long ago for...

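The exploded rows keep the index label of the row they came from, which is why the index repeats. You can reset the index if needed.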
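
A small sketch of that index behaviour, using a cut-down two-row frame made up for illustration (the names `exploded` and `flat` are not from the answer):

```python
import pandas as pd

# Hypothetical two-row frame, just large enough to show the repeated index
df = pd.DataFrame({
    "ID": ["sentence1", "sentence2"],
    "TOKEN": [[("emma", 2), ("woodhouse", 2)], [("youngest", 1)]],
})

# Without ignore_index, each exploded row keeps its source row's label
exploded = df.explode("TOKEN")
print(exploded.index.tolist())  # [0, 0, 1]

# drop=True discards the old labels instead of adding them as a column
flat = exploded.reset_index(drop=True)
print(flat.index.tolist())  # [0, 1, 2]
```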

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/72938658
