I want to assign a unique ID to each product, which is a subcategory of its manufacturer. The input data looks like this:
import pandas as pd

d = {'Manufacturer': ['Samsung','Samsung','Siemens','Siemens','Siemens','Apple','Apple'],
     'Product': ['Phone','Phone','Computer','Sensor','Sensor','Phone','MacBook']}
df = pd.DataFrame(data=d)
Manufacturer Product
0 Samsung Phone
1 Samsung Phone
2 Siemens Computer
3 Siemens Sensor
4 Siemens Sensor
5 Apple Phone
6 Apple MacBook

I want UNIQUE_ID to encode both the Manufacturer and the Product, so I came up with this solution:
df['ID_Manufacturer'] = df.groupby(['Manufacturer']).ngroup()
df['ID_Product'] = df.groupby(['Product']).ngroup()
columns = ['ID_Manufacturer', 'ID_Product']
df[columns] = df[columns].astype(str)
df['UNIQUE_ID'] = df[columns].apply(lambda x: '.'.join(x[x.notnull()]), axis = 1)
df.drop(['ID_Manufacturer', 'ID_Product'], axis = 1)

The result is:
Manufacturer Product UNIQUE_ID
0 Samsung Phone 1.2
1 Samsung Phone 1.2
2 Siemens Computer 2.0
3 Siemens Sensor 2.3
4 Siemens Sensor 2.3
5 Apple Phone 0.2
6 Apple MacBook 0.1

However, I would like to achieve more: the IDs should start from 1.
So the final output should look like this:
Manufacturer Product UNIQUE_ID
0 Samsung Phone 3.1
1 Samsung Phone 3.1
2 Siemens Computer 1.2
3 Siemens Sensor 1.3
4 Siemens Sensor 1.3
5 Apple Phone 2.4
6 Apple MacBook 2.3

Posted on 2022-08-25 10:58:23
You can apply pandas.factorize to the index returned by value_counts (which sorts by descending frequency by default):
id1, val1 = pd.factorize(df['Manufacturer'].value_counts().index)
id2, val2 = pd.factorize(df['Product'].value_counts().index)
df['UNIQUE_ID'] = (
df['Manufacturer'].map(pd.Series(id1+1, index=val1).astype(str))
+'.'+
df['Product'].map(pd.Series(id2+1, index=val2).astype(str))
)

Output:
Manufacturer Product UNIQUE_ID
0 Samsung Phone 2.1
1 Samsung Phone 2.1
2 Siemens Computer 1.3
3 Siemens Sensor 1.2
4 Siemens Sensor 1.2
5 Apple Phone 3.1
6 Apple MacBook 3.4

https://stackoverflow.com/questions/73486234
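
The same frequency-based numbering can also be wrapped in a small helper, which may make the mapping easier to follow. This is only a sketch: the name `frequency_rank` is hypothetical and not part of pandas, and it uses a plain `range` in place of `pd.factorize`, on the assumption that the index returned by `value_counts` is already unique. The order of values with equal counts (here Samsung and Apple, or Computer and MacBook) is whatever `value_counts` returns; the sketch does not control it.

```python
import pandas as pd


def frequency_rank(column: pd.Series) -> pd.Series:
    """Map each value to its 1-based rank by descending frequency, as a string.

    Hypothetical helper for illustration; not a pandas function.
    """
    # value_counts() sorts by descending frequency by default
    counts = column.value_counts()
    # Number the distinct values 1, 2, 3, ... in that order
    ranks = pd.Series(range(1, len(counts) + 1), index=counts.index)
    return column.map(ranks).astype(str)


d = {'Manufacturer': ['Samsung', 'Samsung', 'Siemens', 'Siemens', 'Siemens', 'Apple', 'Apple'],
     'Product': ['Phone', 'Phone', 'Computer', 'Sensor', 'Sensor', 'Phone', 'MacBook']}
df = pd.DataFrame(data=d)

# Join the two ranks into a single "manufacturer.product" identifier
df['UNIQUE_ID'] = frequency_rank(df['Manufacturer']) + '.' + frequency_rank(df['Product'])
print(df)
```

Because Siemens, Phone and Sensor have no ties at their frequency level, their ranks (1, 1 and 2 respectively) do not depend on tie ordering; the remaining values do.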