
Pandas groupby on text: getting a sentence number when a group has multiple sentences

Stack Overflow user
Asked 2022-05-19 19:05:15
1 answer, 34 views, 0 followers, 1 vote

My data looks like this:

    id      sentence                                            ind
    747     A simple and convenient colorimetric method is...   NaN
    747     A simple and convenient colorimetric method is...   NaN
    747     A simple and convenient colorimetric method is...   ulcerative 
    749     Of special significance was the increased acti...   NaN
    749     Of special significance was the increased acti...   NaN
    749     Of special significance was the increased acti...   head injuries
    749     Of special significance was the increased acti...   NaN
    858     Some patients with acute viral hepatitis or pr...   acute viral 
    858     Some patients with acute viral hepatitis or pr...   NaN
    858     Some patients with acute viral hepatitis or pr...   NaN
    948     The other ALP isozyme of FL cells had properti...   NaN
    948     The other ALP isozyme of FL cells had properti...   NaN
    948     The other ALP isozyme of FL cells had properti...   NaN
    948     It was found that a human hepatoma-associated ...   NaN
    948     It was found that a human hepatoma-associated ...   hepatoma
    948     It was found that a human hepatoma-associated ...   NaN
    948     It was more heat stable and more sensitive to ...   virus
    948     It was more heat stable and more sensitive to ...   NaN
    948     It was more heat stable and more sensitive to ...   NaN

I am using df.groupby(['id', 'sentence']).first().head(20), and I get this:

pmid    sentence                                            ind
747     A simple and convenient colorimetric method is...   NaN
749     Of special significance was the increased acti...   NaN
858     Some patients with acute viral hepatitis or pr...   acute viral 
948      It was found that a human hepatoma-associated...   hepatoma
         It was more heat stable and more sensitive to...   virus

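As we can see, id=948 has multiple (id, sentence) pairs.

My question: is there a way to get a sentence number within each id in my data, given that a single id can have multiple (id, sentence) pairs?

For example, something like the following: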

id   sentence_nr   sentence                                           ind
747  01            A simple and convenient colorimetric method is...  NaN
749  01            Of special significance was the increased acti...  NaN
858  01            Some patients with acute viral hepatitis or pr...  acute viral 
948  01            It was found that a human hepatoma-associated ...  hepatoma 
948  02            It was more heat stable and more sensitive to ...  virus

1 Answer

Stack Overflow user

Accepted answer

Posted 2022-05-19 19:29:33

You can use GroupBy.cumcount:

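# Collapse to one row per (id, sentence); first() takes the first non-null value of each column
df_grouped = df.groupby(['id', 'sentence'], as_index=False).first()
# Number the sentences within each id, starting from 1
df_grouped['sentence_nr'] = df_grouped.groupby('id').cumcount() + 1

print(df_grouped)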

Output:

    id                                           sentence            ind  sentence_nr
0  747  A simple and convenient colorimetric method is...     ulcerative            1
1  749  Of special significance was the increased acti...  head injuries            1
2  858  Some patients with acute viral hepatitis or pr...    acute viral            1
3  948  It was found that a human hepatoma-associated ...       hepatoma            1
4  948  It was more heat stable and more sensitive to ...          virus            2
5  948  The other ALP isozyme of FL cells had properti...           None            3
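
The output above numbers the sentences in alphabetical order within each id, because groupby sorts the group keys by default. That is why "The other ALP isozyme..." gets number 3 even though it appears first for id 948 in the data. If the numbers should instead follow the order in which sentences first appear, passing sort=False is one option. The following is a minimal sketch under that assumption: the sample frame is hypothetical (shortened sentences shaped like the question's data), and the sentence_label column is an illustrative name for the zero-padded "01", "02" form shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's frame (sentences shortened).
df = pd.DataFrame({
    "id": [747, 747, 948, 948, 948, 948, 948, 948],
    "sentence": [
        "A simple method", "A simple method",
        "The other ALP isozyme", "The other ALP isozyme",
        "It was found", "It was found",
        "It was more heat stable", "It was more heat stable",
    ],
    "ind": [np.nan, "ulcerative", np.nan, np.nan,
            np.nan, "hepatoma", "virus", np.nan],
})

# sort=False keeps the (id, sentence) groups in order of first appearance
# instead of sorting the sentences alphabetically.
grouped = df.groupby(["id", "sentence"], as_index=False, sort=False).first()

# cumcount() numbers the rows within each id from 0, so add 1.
grouped["sentence_nr"] = grouped.groupby("id").cumcount() + 1

# Optional: zero-padded string labels ("01", "02", ...), as in the question.
grouped["sentence_label"] = grouped["sentence_nr"].astype(str).str.zfill(2)

print(grouped)
```

Here id 948 ends up with "The other ALP isozyme" as sentence 1, "It was found" as 2, and "It was more heat stable" as 3. Keeping sentence_nr as an integer and adding the padded label as a separate column leaves the number usable for sorting and arithmetic.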
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/72309886
