Q: How do I convert each element of a frequency column into a new DataFrame row?

Stack Overflow user

Asked 2022-07-11 12:32:22

Answers: 2 · Views: 49 · Followers: 0 · Votes: 2

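I have a DataFrame with an ID column, a TEXT column, and a TOKEN column that holds the three most frequent words in each text together with their counts: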

ID           TEXT                                               TOKEN
sentence1    Emma Woodhouse , handsome , clever , and rich ...  [(emma, 2), (woodhouse, 2), (handsome, 1)]
sentence2    She was the youngest of the two daughters of a...  [(youngest, 1), (two, 1), (daughters, 2)]
sentence3    Her mother had died too long ago for her to ha...  [(mother, 2), (died, 1), (long, 1)]

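I want to turn the elements in each row of the TOKEN column into new rows of a new DataFrame. I have tried many approaches, but I cannot extract the token elements from their column. The expected output is as follows: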

WORD         FREQ    ID           TEXT                                
emma         2       sentence1    Emma Woodhouse , handsome , clever , and rich ...  
woodhouse    2       sentence1    Emma Woodhouse , handsome , clever , and rich ...  
handsome     1       sentence1    Emma Woodhouse , handsome , clever , and rich ...  
youngest     1       sentence2    She was the youngest of the two daughters of a... 
two          1       sentence2    She was the youngest of the two daughters of a...
daughters    2       sentence2    She was the youngest of the two daughters of a...

I am starting to think that what I want to do is not possible. Can you help me? Thanks!

python

pandas

dataframe

nlp

2 Answers

Stack Overflow user

Accepted answer

Posted 2022-07-11 12:37:43

Let us explode the TOKEN column, expand the resulting tuples into a new DataFrame, and then join it back to the remaining original columns:

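# Give every (word, freq) tuple its own row; ignore_index=True renumbers the rows 0..n-1
s = df.explode('TOKEN', ignore_index=True)
# Pop TOKEN out of s, unpack the tuples into WORD/FREQ columns, then join the rest back by index
pd.DataFrame([*s.pop('TOKEN')], columns=['WORD', 'FREQ']).join(s)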

        WORD  FREQ         ID                                               TEXT
0       emma     2  sentence1  Emma Woodhouse , handsome , clever , and rich ...
1  woodhouse     2  sentence1  Emma Woodhouse , handsome , clever , and rich ...
2   handsome     1  sentence1  Emma Woodhouse , handsome , clever , and rich ...
3   youngest     1  sentence2  She was the youngest of the two daughters of a...
4        two     1  sentence2  She was the youngest of the two daughters of a...
5  daughters     2  sentence2  She was the youngest of the two daughters of a...
6     mother     2  sentence3  Her mother had died too long ago for her to ha...
7       died     1  sentence3  Her mother had died too long ago for her to ha...
8       long     1  sentence3  Her mother had died too long ago for her to ha...
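
For anyone who wants to run the approach above end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question (the TEXT values stay truncated exactly as shown there), and the names `tokens` and `result` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question; TEXT values are kept truncated as posted.
df = pd.DataFrame({
    "ID": ["sentence1", "sentence2", "sentence3"],
    "TEXT": [
        "Emma Woodhouse , handsome , clever , and rich ...",
        "She was the youngest of the two daughters of a...",
        "Her mother had died too long ago for her to ha...",
    ],
    "TOKEN": [
        [("emma", 2), ("woodhouse", 2), ("handsome", 1)],
        [("youngest", 1), ("two", 1), ("daughters", 2)],
        [("mother", 2), ("died", 1), ("long", 1)],
    ],
})

# One row per (word, freq) tuple, with a fresh 0..n-1 index.
s = df.explode("TOKEN", ignore_index=True)

# pop() removes TOKEN from s and returns it; the tuples unpack into two columns.
tokens = pd.DataFrame([*s.pop("TOKEN")], columns=["WORD", "FREQ"])

# Both frames share the same RangeIndex, so join lines them up row for row.
result = tokens.join(s)
print(result)
```

Note that this sketch assumes every TOKEN list is non-empty. explode turns an empty list into a NaN row, and unpacking that into two columns would need extra handling.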

Votes: 1

Stack Overflow user

Posted 2022-07-11 12:53:53

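You can explode the TOKEN column, transform the result into a DataFrame with the desired column names, and then concatenate it column-wise with the original DataFrame: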

pd.concat(
    [df.TOKEN.explode().transform(pd.Series)
     .rename(columns={0: 'WORD', 1: 'FREQ'}),
     df.drop(columns="TOKEN")],
    axis=1)

Output:

        WORD  FREQ         ID                                     TEXT
0       emma     2  sentence1  Emma Woodhouse , handsome , clever ,...
0  woodhouse     2  sentence1  Emma Woodhouse , handsome , clever ,...
0   handsome     1  sentence1  Emma Woodhouse , handsome , clever ,...
1   youngest     1  sentence2  She was the youngest of the two daug...
1        two     1  sentence2  She was the youngest of the two daug...
1  daughters     2  sentence2  She was the youngest of the two daug...
2     mother     2  sentence3  Her mother had died too long ago for...
2       died     1  sentence3  Her mother had died too long ago for...
2       long     1  sentence3  Her mother had died too long ago for...

If needed, you can reset the index afterwards.
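
As a rough illustration of that last step, the sketch below shows why the index repeats after explode and how `reset_index(drop=True)` renumbers it. The two-row `tokens` Series is a made-up stand-in for the TOKEN column, not data from the question.

```python
import pandas as pd

# Hypothetical stand-in for the TOKEN column: two rows of (word, freq) tuples.
tokens = pd.Series(
    [[("emma", 2), ("woodhouse", 2)], [("youngest", 1)]],
    name="TOKEN",
)

# explode() repeats the original row label for every element of that row's list.
exploded = tokens.explode()
print(exploded.index.tolist())

# drop=True discards the repeated labels instead of keeping them as a column.
flat = exploded.reset_index(drop=True)
print(flat.index.tolist())
```

The same `reset_index(drop=True)` call can be chained onto the concatenated DataFrame if a plain 0..n-1 index is preferred.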

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/72938658
