
Q: How to check whether a DataFrame column value appears in all unique years

Stack Overflow user

Asked on 2021-05-09 13:03:40

Answers: 3 | Views: 58 | Followers: 0 | Votes: 0

import pandas as pd

df1 = pd.DataFrame({'type': ['cst1', 'cst1', 'cst1','cst1','cst2','cst2','cst2','cst3','cst3','cst3','cst3'],'year':[2017,2018,2019,2020,2018,2019,2020,2017,2018,2019,2020]})

   type  year
0   cst1  2017
1   cst1  2018
2   cst1  2019
3   cst1  2020
4   cst2  2018
5   cst2  2019
6   cst2  2020
7   cst3  2017
8   cst3  2018
9   cst3  2019
10  cst3  2020

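For the data above, I need to check each type value: if it is present in all four years 2017, 2018, 2019 and 2020, it should be labeled 1, otherwise 0. For example, the first type, cst1, appears in all 4 years, so it is labeled 1; cst2 appears in only 3 years, so it is labeled 0. Note: ideally the data contains only these 4 years, i.e. 2017-2020, and each type/year combination is unique.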

Expected output:

    type  year  label
0   cst1  2017     1
1   cst1  2018     1
2   cst1  2019     1
3   cst1  2020     1
4   cst2  2018     0
5   cst2  2019     0
6   cst2  2020     0
7   cst3  2017     1
8   cst3  2018     1
9   cst3  2019     1
10  cst3  2020     1
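To make the labeling rule concrete, here is a minimal sketch that is not from the original thread: it builds a type-by-year presence table with `pandas.crosstab` and marks a type as complete only if every required year column has at least one entry. The names `required`, `presence` and `complete` are illustrative choices.

```python
import pandas as pd

df1 = pd.DataFrame({'type': ['cst1', 'cst1', 'cst1', 'cst1', 'cst2', 'cst2',
                             'cst2', 'cst3', 'cst3', 'cst3', 'cst3'],
                    'year': [2017, 2018, 2019, 2020, 2018, 2019, 2020, 2017,
                             2018, 2019, 2020]})

required = [2017, 2018, 2019, 2020]

# Rows = types, columns = years, cells = number of rows for each (type, year) pair.
# reindex makes sure every required year exists as a column (filled with 0 if absent).
presence = (pd.crosstab(df1['type'], df1['year'])
              .reindex(columns=required, fill_value=0))

# A type is complete when each required year occurs at least once
complete = presence.gt(0).all(axis=1)

# Map the per-type flag back onto every row and turn True/False into 1/0
df1['label'] = df1['type'].map(complete).astype(int)
print(presence)
print(df1)
```

Printing `presence` shows at a glance which year a type is missing (here cst2 has 0 in the 2017 column).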

Tags: python, pandas

3 answers

Stack Overflow user

Accepted answer

Posted on 2021-05-09 13:16:10

I think that if all years fall between 2017 and 2020, a groupby/transform with nunique will do:

# Count distinct years per type; 4 means all of 2017-2020 (assuming no other years occur)
df1['label'] = (df1.groupby('type')['year'].transform('nunique') == 4).astype(int)

Alternative:

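df1['label'] = 0
def test(x):
    # True only if this type's years are exactly 2017-2020
    return set(x.values) == {2017, 2018, 2019, 2020}

# filter() keeps the rows of groups that pass test(); set their label to 1 via loc
df1.loc[df1.groupby('type')['year'].filter(test).index, 'label'] = 1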
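One caveat on the nunique approach: it counts every distinct year, so a type with years 2016, 2018, 2019 and 2020 would also reach 4 and be labeled 1. The sketch below uses hypothetical data containing such a type (`cst2` with 2016 instead of 2017) and counts only the required years by masking all other years to NaN first; `nunique` skips NaN by default. Like the subset check in a later answer, this has "at least these years" semantics.

```python
import pandas as pd

# Hypothetical data: cst2 has 2016 instead of 2017
df1 = pd.DataFrame({'type': ['cst1', 'cst1', 'cst1', 'cst1',
                             'cst2', 'cst2', 'cst2', 'cst2'],
                    'year': [2017, 2018, 2019, 2020, 2016, 2018, 2019, 2020]})

required = [2017, 2018, 2019, 2020]

# Plain nunique counts every distinct year, so 2016 is counted as well
naive = (df1.groupby('type')['year'].transform('nunique') == 4).astype(int)

# Restricted count: years outside `required` become NaN and are ignored by nunique
in_range = df1['year'].where(df1['year'].isin(required))
n_required = in_range.groupby(df1['type']).transform('nunique')
df1['label'] = (n_required == len(required)).astype(int)

print(df1.assign(naive=naive))
```

Here `naive` labels every row of cst2 as 1, while `label` correctly gives 0 because cst2 covers only 3 of the 4 required years.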

Votes: 2

Stack Overflow user

Posted on 2021-05-09 13:26:42

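Use groupby() to create groups based on type.
Use transform() to build the tuple of years for each row's group.
Compare that tuple with the required years; this gives True/False for every row.
Convert the booleans (True/False) to integers (1/0).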

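required = (2017, 2018, 2019, 2020)
# Per type: sorted tuple of years == required? transform broadcasts the bool to each row
df1["label"] = (df1.groupby('type')['year']
                .transform(lambda s: tuple(sorted(s)) == required).astype(int))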

print(df1)

    type    year    label
0   cst1    2017    1
1   cst1    2018    1
2   cst1    2019    1
3   cst1    2020    1
4   cst2    2018    0
5   cst2    2019    0
6   cst2    2020    0
7   cst3    2017    1
8   cst3    2018    1
9   cst3    2019    1
10  cst3    2020    1
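The four steps in this answer can also be run one at a time so the per-type intermediate result is visible. This is a minimal sketch rather than the answerer's code: it swaps `transform` for `groupby().apply` plus `Series.map`, and it sorts each group's years so that row order does not affect the tuple comparison. The name `is_complete` is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'type': ['cst1', 'cst1', 'cst1', 'cst1', 'cst2', 'cst2',
                             'cst2', 'cst3', 'cst3', 'cst3', 'cst3'],
                    'year': [2017, 2018, 2019, 2020, 2018, 2019, 2020, 2017,
                             2018, 2019, 2020]})

required = (2017, 2018, 2019, 2020)

# Steps 1-3: one True/False per type, comparing its sorted years with `required`
is_complete = df1.groupby('type')['year'].apply(lambda s: tuple(sorted(s)) == required)
print(is_complete)

# Step 4: broadcast each type's flag back to its rows and convert bool -> int
df1['label'] = df1['type'].map(is_complete).astype(int)
print(df1)
```

Keeping `is_complete` as a separate Series indexed by type makes it easy to inspect which types failed before the labels are written back.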

Votes: 2

Stack Overflow user

Posted on 2021-05-09 13:11:16

Let's try:

A groupby transform that tests whether the required years are a subset of each group's years.
Use astype(int) to convert the booleans to 1 and 0.

import pandas as pd

df1 = pd.DataFrame({'type': ['cst1', 'cst1', 'cst1', 'cst1', 'cst2', 'cst2',
                             'cst2', 'cst3', 'cst3', 'cst3', 'cst3'],
                    'year': [2017, 2018, 2019, 2020, 2018, 2019, 2020, 2017,
                             2018, 2019, 2020]})

years = {2017, 2018, 2019, 2020}

df1['label'] = (
    df1.groupby('type').year.transform(lambda x: years.issubset(x))
).astype(int)
print(df1)

df1

    type  year  label
0   cst1  2017      1
1   cst1  2018      1
2   cst1  2019      1
3   cst1  2020      1
4   cst2  2018      0
5   cst2  2019      0
6   cst2  2020      0
7   cst3  2017      1
8   cst3  2018      1
9   cst3  2019      1
10  cst3  2020      1

*Note: this will match any group that contains at least these four years. So if a group has entries for 2016, 2017, 2018, 2019 and 2020, it will also be matched.
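If the intended rule is "exactly these four years" rather than "at least these four years", changing the lambda to a set equality is enough. The sketch below is illustrative: the type `cst4` with an extra 2016 entry is hypothetical data added only to show where the two checks disagree.

```python
import pandas as pd

# Hypothetical data: cst4 has an extra 2016 entry
df1 = pd.DataFrame({'type': ['cst1'] * 4 + ['cst4'] * 5,
                    'year': [2017, 2018, 2019, 2020,
                             2016, 2017, 2018, 2019, 2020]})
years = {2017, 2018, 2019, 2020}

# "At least these years": the required years are a subset of the group's years
at_least = df1.groupby('type').year.transform(lambda x: years.issubset(x)).astype(int)

# "Exactly these years": the group's set of years must equal the required set
exact = df1.groupby('type').year.transform(lambda x: set(x) == years).astype(int)

print(df1.assign(at_least=at_least, exact=exact))
```

For cst4 the subset check returns 1 while the exact check returns 0; for cst1 both return 1.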

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/67458024
