I have a DataFrame:
import pandas as pd
df = pd.DataFrame({'Date': ['01-01-2020', '01-01-2020', '01-01-2020', '01-01-2020', '01-01-2020', '01-01-2020'],
                   'Shift': ['A', 'A', 'A', 'A', 'A', 'A'],
                   'heat_number': ['HA1', 'HA10', 'HA8', 'HA18A', 'HA5', 'HA18']})
It looks like this:
Date Shift heat_number
0 01-01-2020 A HA1
1 01-01-2020 A HA10
2 01-01-2020 A HA8
3 01-01-2020 A HA18A
4 01-01-2020 A HA5
5 01-01-2020 A HA18
If I run df.sort_values(['Date', 'Shift', 'heat_number']), I get the following output:
Date Shift heat_number
0 01-01-2020 A HA1
1 01-01-2020 A HA10
5 01-01-2020 A HA18
3 01-01-2020 A HA18A
4 01-01-2020 A HA5
2 01-01-2020 A HA8
But the output I want is:
Date Shift heat_number
0 01-01-2020 A HA1
4 01-01-2020 A HA5
2 01-01-2020 A HA8
1 01-01-2020 A HA10
5 01-01-2020 A HA18
3 01-01-2020 A HA18A
The sort on the heat_number column does not behave as expected. How can I fix this?
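Posted on 2020-07-17 15:26:24
You can add a temporary pseudo column to the DataFrame with DataFrame.assign (holding the number extracted from heat_number), apply sort_values on that pseudo column, and finally drop the pseudo column.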
(
    # Raw string avoids an invalid-escape warning; expand=False returns a Series
    df.assign(sort_by=df.heat_number.str.extract(r"(\d+)", expand=False).astype(int))
    .sort_values(by="sort_by")
    .drop(columns="sort_by")
)
Date Shift heat_number
0 01-01-2020 A HA1
4 01-01-2020 A HA5
2 01-01-2020 A HA8
1 01-01-2020 A HA10
3 01-01-2020 A HA18A
Posted on 2020-07-17 15:25:41
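Here is what I would do:
df['len_heat'] = df.heat_number.str.len()
# Shorter strings first; heat_number breaks ties between values of equal length
df = df.sort_values(['Date', 'Shift', 'len_heat', 'heat_number'])
del df['len_heat']
Basically, this adds a column holding the string length, sorts on it (using heat_number itself to break ties between strings of equal length), and then deletes the column.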
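A possible variation that combines both ideas, offered as a minimal sketch rather than a definitive solution. It assumes every heat_number has the shape prefix + digits + optional suffix (for example HA18A). The helper name sort_by_heat_number and the temporary columns _prefix, _num and _suffix are made up for illustration and are not part of pandas. Plain string sorting puts HA10 before HA5 because characters are compared one at a time, so here the digits are compared as integers, and the suffix serves as a tie-breaker so that HA18 comes before HA18A.

```python
import pandas as pd

# Six-row frame matching the table shown in the question.
df = pd.DataFrame({
    "Date": ["01-01-2020"] * 6,
    "Shift": ["A"] * 6,
    "heat_number": ["HA1", "HA10", "HA8", "HA18A", "HA5", "HA18"],
})


def sort_by_heat_number(frame: pd.DataFrame) -> pd.DataFrame:
    """Sort by Date, Shift, then heat_number in numeric-aware order.

    Hypothetical helper; assumes each heat_number contains at least one digit.
    """
    # Split each value into prefix, digits, suffix: "HA18A" -> "HA", "18", "A".
    parts = frame["heat_number"].str.extract(r"^(\D*)(\d+)(.*)$")
    helper_cols = ["_prefix", "_num", "_suffix"]
    keyed = frame.assign(
        _prefix=parts[0],
        _num=parts[1].astype(int),  # compare digits as integers, not as text
        _suffix=parts[2],           # empty suffix sorts before "A"
    )
    # Sort on the helper columns, then remove them again.
    return keyed.sort_values(["Date", "Shift", *helper_cols]).drop(columns=helper_cols)


result = sort_by_heat_number(df)
print(result["heat_number"].tolist())
# ['HA1', 'HA5', 'HA8', 'HA10', 'HA18', 'HA18A']
```

Keeping Date and Shift in the sort keys preserves the grouping the question asked for. Date is still compared as text here, exactly as in the original frame.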
https://stackoverflow.com/questions/62956922