Q: Using Python 3, how do I split a df whose column contains a set of ranges into a df with multiple sub-ranges?

Stack Overflow user

Asked 2020-12-12 14:50:51

Answers: 2, Views: 74, Followers: 0, Votes: 1

So, I have the following:

import pandas as pd

# this is the data we have
df = pd.DataFrame(data=(['A','1-50', 10],['B','25-200', 15],['C','25-300', 5]), columns=['Category','Range', 'Qty'])

#these are the different range categories we need to have.
list_of_ranges = ['1-10', '10-25', '25-50', '50-100', '100-200', '200-300', '300-400']

# insert magic spells here

# this is what the result needs to look like
results = pd.DataFrame(
    data=(
        ['A', '1-10', 10], ['A', '10-25', 10], ['A', '25-50', 10],
        ['B', '25-50', 15], ['B', '50-100', 15], ['B', '100-200', 15],
        ['C', '25-50', 5], ['C', '50-100', 5], ['C', '100-200', 5], ['C', '200-300', 5],
    ),
    columns=['Category', 'Range', 'Qty'],
)

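As the example above shows:

I have a df with a range column that needs to be broken down into sub-ranges, and every other column needs to be repeated for each new range. How can I do this?

Edit 1: an example of the logic

Zone "A" has temperatures in the range 1-50 degrees Celsius for 10 days every year. This is a single row that reads:

1: A,1-50,10

This row can be interpreted as: in zone "A", the temperature can fall in the range 1-10, 10-25, or 25-50 for 10 days every year. So I want three rows: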

1: A,1-10,10
2: A,10-25,10
3: A,25-50,10

Tags: python, pandas, dataframe

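2 Answers

Stack Overflow user

Accepted answer

Answered 2020-12-12 17:13:08

We need a couple of functions to deal with "ranges" as you have defined them. Beyond that, the problem comes down to building, for each 'Range' in df, a list of the "small ranges" that fit inside it, and then explode-ing df on that list.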

def split_range(r):
    """
    split range into a tuple. range is a string 'xx-yy'
    """
    tokens =  r.split('-')
    return (int(tokens[0]), int(tokens[1]))

def is_inside(r1,r2):
    """
    True if range r1 is inside r2. Range is a string 'xx-yy'
    """
    t1, t2 = split_range(r1), split_range(r2)
    return (t1[0]>=t2[0]) and (t1[1] <= t2[1])

df['small_ranges'] = df.apply(lambda row: [rng for rng in list_of_ranges if is_inside(rng, row['Range']) ], axis=1)

This produces:

    Category    Range      Qty  small_ranges
--  ----------  -------  -----  -----------------------------------------
 0  A           1-50        10  ['1-10', '10-25', '25-50']
 1  B           25-200      15  ['25-50', '50-100', '100-200']
 2  C           25-300       5  ['25-50', '50-100', '100-200', '200-300']

Now we explode:

df.explode('small_ranges')

Output:

    Category    Range      Qty  small_ranges
--  ----------  -------  -----  --------------
 0  A           1-50        10  1-10
 0  A           1-50        10  10-25
 0  A           1-50        10  25-50
 1  B           25-200      15  25-50
 1  B           25-200      15  50-100
 1  B           25-200      15  100-200
 2  C           25-300       5  25-50
 2  C           25-300       5  50-100
 2  C           25-300       5  100-200
 2  C           25-300       5  200-300
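
The exploded frame still carries the original Range column next to small_ranges, whereas the question asks for just Category, Range and Qty. One possible finishing step, as a minimal sketch: drop the old column and rename the new one. The example is rebuilt from scratch so it runs on its own; the name `result` is introduced here for illustration and is not part of the original answer, and `DataFrame.explode` assumes pandas 0.25 or later.

```python
import pandas as pd

df = pd.DataFrame(
    data=(['A', '1-50', 10], ['B', '25-200', 15], ['C', '25-300', 5]),
    columns=['Category', 'Range', 'Qty'],
)
list_of_ranges = ['1-10', '10-25', '25-50', '50-100', '100-200', '200-300', '300-400']


def split_range(r):
    """Turn a string 'xx-yy' into a tuple of ints (xx, yy)."""
    lo, hi = r.split('-')
    return int(lo), int(hi)


def is_inside(r1, r2):
    """True if range r1 lies entirely within range r2."""
    (lo1, hi1), (lo2, hi2) = split_range(r1), split_range(r2)
    return lo1 >= lo2 and hi1 <= hi2


# For every row, collect the sub-ranges contained in its 'Range'.
df['small_ranges'] = df['Range'].apply(
    lambda rng: [r for r in list_of_ranges if is_inside(r, rng)]
)

# Explode, then replace the original 'Range' column with the sub-range.
result = (
    df.explode('small_ranges')
      .drop(columns='Range')
      .rename(columns={'small_ranges': 'Range'})
      [['Category', 'Range', 'Qty']]
      .reset_index(drop=True)
)
print(result)
```

Note that a row whose range contains none of the sub-ranges ends up with an empty list, which explode turns into a single row with a missing value; whether to keep or drop such rows depends on the data.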

Votes: 1

Stack Overflow user

Answered 2020-12-12 17:47:13

Here is a solution using pandas.Interval, which seems very useful in this situation. First, we convert the strings to pd.Interval:

list_of_ranges = [pd.Interval(*tuple(map(int, r.split('-')))) for r in list_of_ranges]

df['Range'] = df['Range'].apply(lambda r: pd.Interval(*tuple(map(int, r.split('-')))))
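
It may help to see how these intervals behave before they are compared. pd.Interval is closed on the right by default, so '1-50' becomes (1, 50], and two intervals that only touch at an endpoint do not count as overlapping. The sketch below illustrates this; the variable names are made up for the illustration. It also shows that a partial overlap counts as an overlap, so on other data this test can select sub-ranges that the containment check in the accepted answer would reject.

```python
import pandas as pd

wide = pd.Interval(1, 50)        # closed='right' by default, i.e. (1, 50]
touching = pd.Interval(50, 100)  # (50, 100], shares only the endpoint 50
nested = pd.Interval(25, 50)     # (25, 50], fully inside `wide`

print(wide.closed)               # right
print(wide.overlaps(touching))   # False: 50 is not a point of (50, 100]
print(wide.overlaps(nested))     # True

# A partial overlap is still an overlap, unlike a containment test.
partial = pd.Interval(1, 30).overlaps(pd.Interval(25, 50))
print(partial)                   # True
```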

We then build a new DataFrame that holds, for each original range, all of the ranges it needs:

my_temps = []

for idx, row in df.iterrows():
    _df = pd.DataFrame(columns=df.columns)
    _df['Range'] = [r for r in list_of_ranges if r.overlaps(row['Range'])]
    _df['Category'], _df['Qty'] = row['Category'], row['Qty']

    my_temps.append(_df)

final_df = pd.concat(my_temps).reset_index(drop=True)

Finally, we convert the ranges back to their original string format:

final_df['Range'] = final_df['Range'].apply(lambda r: '{}-{}'.format(r.left, r.right))

This produces the following data:

  Category    Range  Qty
0        A     1-10   10
1        A    10-25   10
2        A    25-50   10
3        B    25-50   15
4        B   50-100   15
5        B  100-200   15
6        C    25-50    5
7        C   50-100    5
8        C  100-200    5
9        C  200-300    5
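
For larger frames, the row-by-row loop could in principle be swapped for a cross join followed by a filter. The following is a sketch of that alternative, not the method used in either answer: it assumes pandas 1.2 or later for `merge(how='cross')`, it uses a containment test on integer bounds rather than pd.Interval, and the helper `to_bounds` and the column names `lo`, `hi`, `sub_lo`, `sub_hi` and `SubRange` are invented for this example.

```python
import pandas as pd

df = pd.DataFrame(
    data=(['A', '1-50', 10], ['B', '25-200', 15], ['C', '25-300', 5]),
    columns=['Category', 'Range', 'Qty'],
)
list_of_ranges = ['1-10', '10-25', '25-50', '50-100', '100-200', '200-300', '300-400']


def to_bounds(ranges):
    """Split a Series of 'lo-hi' strings into integer columns 'lo' and 'hi'."""
    parts = ranges.str.split('-', expand=True).astype(int)
    parts.columns = ['lo', 'hi']
    return parts


# Attach numeric bounds to the data rows and to the candidate sub-ranges.
rows = df.join(to_bounds(df['Range']))
subs = pd.DataFrame({'SubRange': list_of_ranges})
subs = subs.join(to_bounds(subs['SubRange']).add_prefix('sub_'))

# Pair every row with every sub-range, then keep the contained ones.
pairs = rows.merge(subs, how='cross')
inside = (pairs['sub_lo'] >= pairs['lo']) & (pairs['sub_hi'] <= pairs['hi'])

result = (
    pairs.loc[inside, ['Category', 'SubRange', 'Qty']]
         .rename(columns={'SubRange': 'Range'})
         .reset_index(drop=True)
)
print(result)
```

The cross join materialises len(df) * len(list_of_ranges) rows before filtering, so it trades memory for the absence of a Python-level loop.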

If you have any further questions, let us know!

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/65266290
