文章/答案/技术大牛

发布

社区首页 >问答首页 >文本上的熊猫群:获得每组多个句子的句子编号

问文本上的熊猫群:获得每组多个句子的句子编号
EN

Stack Overflow用户

提问于 2022-05-19 19:05:15

回答 1查看 34关注 0票数 1

我的数据看起来是这样的：

    id      sentence                                            ind
    747     A simple and convenient colorimetric method is...   NaN
    747     A simple and convenient colorimetric method is...   NaN
    747     A simple and convenient colorimetric method is...   ulcerative 
    749     Of special significance was the increased acti...   NaN
    749     Of special significance was the increased acti...   NaN
    749     Of special significance was the increased acti...   head injuries
    749     Of special significance was the increased acti...   NaN
    858     Some patients with acute viral hepatitis or pr...   acute viral 
    858     Some patients with acute viral hepatitis or pr...   NaN
    858     Some patients with acute viral hepatitis or pr...   NaN
    948     The other ALP isozyme of FL cells had properti...   NaN
    948     The other ALP isozyme of FL cells had properti...   NaN
    948     The other ALP isozyme of FL cells had properti...   NaN
    948     It was found that a human hepatoma-associated ...   NaN
    948     It was found that a human hepatoma-associated ...   hepatoma
    948     It was found that a human hepatoma-associated ...   NaN
    948     It was more heat stable and more sensitive to ...   virus
    948     It was more heat stable and more sensitive to ...   NaN
    948     It was more heat stable and more sensitive to ...   NaN

我正在使用df.groupby(['id', 'sentence']).first().head(20)，我得到了这个：

pmid    sentence                                            ind
747     A simple and convenient colorimetric method is...   NaN
749     Of special significance was the increased acti...   NaN
858     Some patients with acute viral hepatitis or pr...   acute viral 
948      It was found that a human hepatoma-associated...   hepatoma
         It was more heat stable and more sensitive to...   virus

正如我们所看到的，对于id=948，有多个(id-语句)对。

我的问题是:是否有一种方法可以为我的数据中的每个id获得一个句号，因为我对一个id有多个(id-语句)对？

例如，拥有以下内容：

id   sentence_nr   sentence                                           ind
747  01            A simple and convenient colorimetric method is...  NaN
749  01            Of special significance was the increased acti...  NaN
858  01            Some patients with acute viral hepatitis or pr...  acute viral 
948  01            It was found that a human hepatoma-associated ...  hepatoma 
948  02            It was more heat stable and more sensitive to ...  virus

python

pandas

group-by

pandas-groupby

回答 1

Stack Overflow用户

回答已采纳

发布于 2022-05-19 19:29:33

你可以用GroupBy.cumcount

df_grouped = df.groupby(['id', 'sentence'], as_index=False).first()
df_grouped['sentence_nr'] = df_grouped.groupby(df_grouped['id']).cumcount() + 1

print(df_grouped)

    id                                           sentence            ind  sentence_nr
0  747  A simple and convenient colorimetric method is...     ulcerative            1
1  749  Of special significance was the increased acti...  head injuries            1
2  858  Some patients with acute viral hepatitis or pr...    acute viral            1
3  948  It was found that a human hepatoma-associated ...       hepatoma            1
4  948  It was more heat stable and more sensitive to ...          virus            2
5  948  The other ALP isozyme of FL cells had properti...           None            3

票数 1

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/72309886

复制

相似问题

问文本上的熊猫群:获得每组多个句子的句子编号
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问文本上的熊猫群:获得每组多个句子的句子编号EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问文本上的熊猫群:获得每组多个句子的句子编号
EN