
Split rows based on different rows and columns

Stack Overflow user
Asked on 2020-12-03 16:43:20
Answers: 2, views: 54, followers: 0, score: 2

I'd really appreciate your help with this.

I have a table containing products, dates and amounts. This is what the initial table looks like:

Product ID     goliveyear     endyear     Revenue   
    1            2020-10       2022-02       90
    1            2020-10       2022-02       140
    1            2020-10       2022-02       60

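The goal is to split each row into one row per month. In the first (go-live) year, the split runs from the go-live month to the end of that year; in the end year, it runs from January to the end month; every year in between is split into 12 rows, one per month. As the expected output shows, each input row stands for one of these calendar years, and its revenue refers to that whole period, so it has to be divided by the number of monthly rows generated for it: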
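The splitting rule can be checked without pandas. Below is a minimal plain-Python sketch, assuming (as the expected output suggests) that the i-th input row for a product covers calendar year "go-live year + i"; `monthly_split` is a hypothetical helper name, not something from the question.

```python
def monthly_split(revenues, golive, end):
    """Split per-year revenues into (year, month, revenue) rows.

    golive and end are (year, month) tuples.
    Assumption: revenues[i] covers calendar year golive[0] + i.
    """
    start_year, start_month = golive
    end_year, end_month = end
    rows = []
    for i, revenue in enumerate(revenues):
        year = start_year + i
        # the go-live year starts at the go-live month, later years in January
        first = start_month if year == start_year else 1
        # the end year stops at the end month, earlier years in December
        last = end_month if year == end_year else 12
        months = range(first, last + 1)
        for month in months:
            # spread the row's revenue evenly over the months it covers
            rows.append((year, month, revenue / len(months)))
    return rows


for row in monthly_split([90, 140, 60], (2020, 10), (2022, 2)):
    print(row)
```

With the question's numbers this gives 3 rows of 30 for 2020, 12 rows of 140/12 for 2021 and 2 rows of 30 for 2022.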

  Product ID     goliveyear     endyear     Year    Month   Revenue
    1            2020-10       2022-02       2020     10      90/3=30
    1            2020-10       2022-02       2020     11       30
    1            2020-10       2022-02       2020     12       30
    1            2020-10       2022-02       2021     01     140/12 =11.67
    1            2020-10       2022-02       2021     02       11.67
    1            2020-10       2022-02       2021     03       11.67
    1            2020-10       2022-02       2021     04       11.67
   ...            ...            ...         ...      ...      ... 
    1            2020-10       2022-02       2022     01      60/2 = 30
    1            2020-10       2022-02       2022     02       30

Many thanks, everyone.


2 Answers

Stack Overflow user

Posted on 2020-12-03 18:51:20

Quite a few steps.

Start by setting up the df:

from io import StringIO
import pandas as pd
from datetime import timedelta


df = pd.read_csv(StringIO(
"""
Product_ID     goliveyear     endyear     Revenue
    1            2020-10       2022-02       90
    1            2020-10       2022-02       140
    1            2020-10       2022-02       60
"""), sep=r'\s+')  # same as delim_whitespace=True, which is deprecated in recent pandas
df['goliveyear'] = pd.to_datetime(df['goliveyear'])
df['endyear'] = pd.to_datetime(df['endyear'])
df

Then add year_start, year_end, period_start and period_end columns (ys, ye, ps, pe):

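# ys/ye: calendar year each row stands for (go-live year + row position within the product)
df['ys'] = df['goliveyear'].dt.year + df.groupby('Product_ID').cumcount()
df['ye'] = df['ys'] + 1
df['ys'] = pd.to_datetime(df['ys'], format='%Y')                       # Jan 1 of that year
df['ye'] = pd.to_datetime(df['ye'], format='%Y') + timedelta(days=-1)  # Dec 31 of that year
# ps/pe: the part of [goliveyear, endyear] that falls inside that calendar year
df['ps'] = df[['goliveyear', 'ys']].max(axis=1)
df['pe'] = df[['endyear', 'ye']].min(axis=1)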

which produces:

      Product_ID  goliveyear           endyear                Revenue  ys                   ye                   ps                   pe
--  ------------  -------------------  -------------------  ---------  -------------------  -------------------  -------------------  -------------------
 0             1  2020-10-01 00:00:00  2022-02-01 00:00:00         90  2020-01-01 00:00:00  2020-12-31 00:00:00  2020-10-01 00:00:00  2020-12-31 00:00:00
 1             1  2020-10-01 00:00:00  2022-02-01 00:00:00        140  2021-01-01 00:00:00  2021-12-31 00:00:00  2021-01-01 00:00:00  2021-12-31 00:00:00
 2             1  2020-10-01 00:00:00  2022-02-01 00:00:00         60  2022-01-01 00:00:00  2022-12-31 00:00:00  2022-01-01 00:00:00  2022-02-01 00:00:00
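The ys column relies on `groupby(...).cumcount()`, which numbers the rows 0, 1, 2, ... within each product, so adding it to the go-live year yields the calendar year each row stands for. A small illustration; the second product (Product_ID 2) is invented purely to show that the counter restarts per group:

```python
import pandas as pd

# Hypothetical data: product 2 is made up to show the per-group restart
df = pd.DataFrame({
    "Product_ID": [1, 1, 1, 2, 2],
    "golive_year": [2020, 2020, 2020, 2021, 2021],
})

# cumcount() numbers rows 0, 1, 2, ... within each Product_ID, in current row order
df["offset"] = df.groupby("Product_ID").cumcount()
df["year"] = df["golive_year"] + df["offset"]
print(df)
```

Because cumcount follows the existing row order, this only gives the right years if each product's rows are already sorted chronologically, which the answer implicitly assumes.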

Then add the months as a list:

# month numbers from period start (ps) to period end (pe); the old closed=None argument was removed in pandas 2.0
df['months'] = df.apply(lambda r: [d.month for d in pd.date_range(r['ps'], r['pe'], freq='MS')], axis=1)

Output:

      Product_ID  goliveyear           endyear                Revenue  ys                   ye                   ps                   pe                   months
--  ------------  -------------------  -------------------  ---------  -------------------  -------------------  -------------------  -------------------  ---------------------------------------
 0             1  2020-10-01 00:00:00  2022-02-01 00:00:00         90  2020-01-01 00:00:00  2020-12-31 00:00:00  2020-10-01 00:00:00  2020-12-31 00:00:00  [10, 11, 12]
 1             1  2020-10-01 00:00:00  2022-02-01 00:00:00        140  2021-01-01 00:00:00  2021-12-31 00:00:00  2021-01-01 00:00:00  2021-12-31 00:00:00  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
 2             1  2020-10-01 00:00:00  2022-02-01 00:00:00         60  2022-01-01 00:00:00  2022-12-31 00:00:00  2022-01-01 00:00:00  2022-02-01 00:00:00  [1, 2]
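One caveat with freq='MS': `pd.date_range` only emits month-start dates, so a period starting mid-month would lose its first month. That cannot happen with the question's data (every date is the 1st), but if it could, `pd.period_range` with freq='M' works on whole months instead. The mid-month date below is made up for illustration:

```python
import pandas as pd

start = pd.Timestamp("2020-10-15")  # hypothetical mid-month start
end = pd.Timestamp("2020-12-31")

# date_range rolls the start forward to the next month start (2020-11-01)
ms_months = pd.date_range(start, end, freq="MS").month.tolist()

# period_range works with whole months, so October is kept
period_months = pd.period_range(start, end, freq="M").month.tolist()

print(ms_months)      # [11, 12]
print(period_months)  # [10, 11, 12]
```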

Then we explode months, do the required revenue calculation, and drop the columns we no longer need

df = df.explode('months')
# divide each original row's revenue by the number of months it now spans
df['Revenue'] = df['Revenue'] / df.groupby(['Product_ID', 'ys'])['months'].transform('count')
df = df.drop(columns=['goliveyear', 'endyear', 'ye', 'ps', 'pe'])
df['ys'] = df['ys'].dt.year

to get

      Product_ID    Revenue    ys    months
--  ------------  ---------  ----  --------
 0             1    30       2020        10
 0             1    30       2020        11
 0             1    30       2020        12
 1             1    11.6667  2021         1
 1             1    11.6667  2021         2
 1             1    11.6667  2021         3
 1             1    11.6667  2021         4
 1             1    11.6667  2021         5
 1             1    11.6667  2021         6
 1             1    11.6667  2021         7
 1             1    11.6667  2021         8
 1             1    11.6667  2021         9
 1             1    11.6667  2021        10
 1             1    11.6667  2021        11
 1             1    11.6667  2021        12
 2             1    30       2022         1
 2             1    30       2022         2
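Since every row explodes into exactly its own list of months, the divisor can also be taken as the length of that list before exploding, which avoids the groupby. A minimal sketch of this variation (not the answerer's code), starting from a months column like the one built above:

```python
import pandas as pd

df = pd.DataFrame({
    "Product_ID": [1, 1, 1],
    "Revenue": [90, 140, 60],
    "ys": [2020, 2021, 2022],
    "months": [[10, 11, 12], list(range(1, 13)), [1, 2]],
})

# .str.len() on a column of lists returns each list's length: 3, 12, 2
df["Revenue"] = df["Revenue"] / df["months"].str.len()

# one row per month; the already-divided revenue is repeated on each
out = df.explode("months").reset_index(drop=True)
print(out)
```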
Score: 2

Stack Overflow user

Posted on 2020-12-03 18:27:43

Try this:

import pandas as pd
from io import StringIO

s = """
Product ID,goliveyear,endyear,Revenue
1,2020-10,2022-02,90
1,2020-10,2022-02,140
1,2020-10,2022-02,60"""

df = pd.read_csv(StringIO(s))

# generate the list of months between these two months
df['rng'] = df.apply(lambda x: pd.date_range(x['goliveyear'], x['endyear'],
                                             freq='MS'), axis=1)

# explode the dataframe by the months list
df_exploded = df.explode('rng')
# make sure the exploded column has a datetime dtype before using .dt
df_exploded['rng'] = pd.to_datetime(df_exploded['rng'])

df_exploded['Year'] = df_exploded['rng'].dt.year
df_exploded['Month'] = df_exploded['rng'].dt.month

# (index, year) pairs used to filter rows; this assumes row i belongs to the
# i-th distinct year, i.e. a single product with rows sorted by year
filter_year = list(zip(df.index, df_exploded.Year.unique()))

# columns to keep
use_cols = ['Product ID', 'goliveyear', 'endyear', 'Revenue', 'Month']

# keep only the (row, year) combinations listed in filter_year
df_filter = (df_exploded.set_index([df_exploded.index, df_exploded.Year])
             .loc[filter_year, use_cols]
             .reset_index()
             .drop(columns='level_0'))

# average Revenue: revenue per (Year, Month) divided by the number of months in that Year
result = df_filter.set_index(['Year', 'Month']).assign(
    Revenue=(df_filter.groupby(['Year', 'Month'])['Revenue'].sum() /
             df_filter.groupby('Year')['Month'].count())
).reset_index()

result

Output:

    Year    Month   Product ID  goliveyear  endyear Revenue
0   2020    10  1   2020-10 2022-02 30.000000
1   2020    11  1   2020-10 2022-02 30.000000
2   2020    12  1   2020-10 2022-02 30.000000
3   2021    1   1   2020-10 2022-02 11.666667
4   2021    2   1   2020-10 2022-02 11.666667
5   2021    3   1   2020-10 2022-02 11.666667
6   2021    4   1   2020-10 2022-02 11.666667
7   2021    5   1   2020-10 2022-02 11.666667
8   2021    6   1   2020-10 2022-02 11.666667
9   2021    7   1   2020-10 2022-02 11.666667
10  2021    8   1   2020-10 2022-02 11.666667
11  2021    9   1   2020-10 2022-02 11.666667
12  2021    10  1   2020-10 2022-02 11.666667
13  2021    11  1   2020-10 2022-02 11.666667
14  2021    12  1   2020-10 2022-02 11.666667
15  2022    1   1   2020-10 2022-02 30.000000
16  2022    2   1   2020-10 2022-02 30.000000
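The zip(df.index, df_exploded.Year.unique()) step pairs row 0 with the first year seen, row 1 with the second and so on, so it silently goes wrong if the table holds more than one product or the rows are not sorted by year. A sketch of a more explicit variant, assuming (as both answers do) that the i-th row of each product covers "go-live year + i"; the helper columns row_year and period are hypothetical names introduced here:

```python
import pandas as pd
from io import StringIO

s = """Product ID,goliveyear,endyear,Revenue
1,2020-10,2022-02,90
1,2020-10,2022-02,140
1,2020-10,2022-02,60"""
df = pd.read_csv(StringIO(s))

# Assumption: row i of each product covers calendar year (go-live year + i)
df["row_year"] = df["goliveyear"].str[:4].astype(int) + df.groupby("Product ID").cumcount()

# one Period per month between go-live and end, inclusive
df["period"] = [list(pd.period_range(a, b, freq="M"))
                for a, b in zip(df["goliveyear"], df["endyear"])]

out = df.explode("period")
out["Year"] = [p.year for p in out["period"]]
out["Month"] = [p.month for p in out["period"]]

# keep only the months that fall in the year this row stands for
out = out[out["Year"] == out["row_year"]].copy()

# spread each original row's revenue evenly over its remaining months
out["Revenue"] = out["Revenue"] / out.groupby(level=0)["Month"].transform("count")

result = out[["Product ID", "goliveyear", "endyear", "Year", "Month", "Revenue"]].reset_index(drop=True)
print(result)
```

Matching each month against row_year, rather than against the order of unique years, keeps the filter correct per product regardless of how many products the table contains.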
Score: 1
Page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/65122462
