首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >如何在熊猫数据栏中记录以前的日期记录?

如何在熊猫数据栏中记录以前的日期记录?
EN

Stack Overflow用户
提问于 2017-09-13 11:16:04
回答 1查看 251关注 0票数 1

此问题来自this SO Question.

我想对熊猫数据进行一些数据分析。我有一份数据,如下所示:

代码语言:javascript
复制
                   derived_symbol  sport_name person_name      city  \
0       football.RAM.mumbai.ram_count    football         RAM    mumbai   
1       football.RAM.mumbai.mum_count    football         RAM    mumbai   
2        football.RAM.delhi.mum_count    football         RAM     delhi   
3        football.RAM.delhi.ram_count    football         RAM     delhi   
4       football.RAM.mumbai.ram_count    football         RAM    mumbai   
5       football.RAM.mumbai.mum_count    football         RAM    mumbai   
6        football.RAM.delhi.mum_count    football         RAM     delhi   
7        football.RAM.delhi.ram_count    football         RAM     delhi   
8       basketball.MAH.pune.mah_count  basketball         MAH      pune   
9     basketball.MAH.nagpur.mah_count  basketball         MAH    nagpur   
10     basketball.MAH.TOTAL.mah_count  basketball         MAH  No Entry   
11  basketball.MAH.TOTAL.nagpur_count  basketball         MAH    nagpur   
12    basketball.MAH.TOTAL.pune_count  basketball         MAH      pune   
13     football.RAM.TOTAL.delhi_count    football         RAM     delhi   
14     football.RAM.TOTAL.delhi_count    football         RAM     delhi   
15       football.RAM.TOTAL.mum_count    football         RAM  No Entry   
16       football.RAM.TOTAL.mum_count    football         RAM  No Entry   
17    football.RAM.TOTAL.mumbai_count    football         RAM    mumbai   
18    football.RAM.TOTAL.mumbai_count    football         RAM    mumbai   
19       football.RAM.TOTAL.ram_count    football         RAM  No Entry   
20       football.RAM.TOTAL.ram_count    football         RAM  No Entry   

   person_symbol       month sir  person_count  
0            ram  2017-01-23   a            10  
1            mum  2017-01-23   a            14  
2            mum  2017-01-23   a            25  
3            ram  2017-01-23   a            20  
4            ram  2017-02-22   b            34  
5            mum  2017-02-22   b            23  
6            mum  2017-02-22   b            43  
7            ram  2017-02-22   b            34  
8            mah  2017-03-03   c            10  
9            mah  2017-03-03   c            20  
10           mah  2017-03-03   c            30  
11      No Entry  2017-03-03   c            20  
12      No Entry  2017-03-03   c            10  
13      No Entry  2017-01-23   a            45  
14      No Entry  2017-02-22   b            77  
15           mum  2017-01-23   a            39  
16           mum  2017-02-22   b            66  
17      No Entry  2017-01-23   a            24  
18      No Entry  2017-02-22   b            57  
19           ram  2017-01-23   a            30  
20           ram  2017-02-22   b            68

我想将previous_person_count列添加到这个Dataframe中。此dataframe的“月份”列包含“yyyy dd”格式的日期。因此,我们需要查看月份,即"mm“字段,以确定它是哪个月。

通过查看这个月,我们需要将"person_count“值放入下一个月的"previous_person_count”值。

输出:

代码语言:javascript
复制
                derived_symbol  sport_name person_name      city  \
0       football.RAM.mumbai.ram_count    football         RAM    mumbai   
1       football.RAM.mumbai.mum_count    football         RAM    mumbai   
2        football.RAM.delhi.mum_count    football         RAM     delhi   
3        football.RAM.delhi.ram_count    football         RAM     delhi   
4       football.RAM.mumbai.ram_count    football         RAM    mumbai   
5       football.RAM.mumbai.mum_count    football         RAM    mumbai   
6        football.RAM.delhi.mum_count    football         RAM     delhi   
7        football.RAM.delhi.ram_count    football         RAM     delhi   
8       basketball.MAH.pune.mah_count  basketball         MAH      pune   
9     basketball.MAH.nagpur.mah_count  basketball         MAH    nagpur   
10     basketball.MAH.TOTAL.mah_count  basketball         MAH  No Entry   
11  basketball.MAH.TOTAL.nagpur_count  basketball         MAH    nagpur   
12    basketball.MAH.TOTAL.pune_count  basketball         MAH      pune   
13     football.RAM.TOTAL.delhi_count    football         RAM     delhi   
14     football.RAM.TOTAL.delhi_count    football         RAM     delhi   
15       football.RAM.TOTAL.mum_count    football         RAM  No Entry   
16       football.RAM.TOTAL.mum_count    football         RAM  No Entry   
17    football.RAM.TOTAL.mumbai_count    football         RAM    mumbai   
18    football.RAM.TOTAL.mumbai_count    football         RAM    mumbai   
19       football.RAM.TOTAL.ram_count    football         RAM  No Entry   
20       football.RAM.TOTAL.ram_count    football         RAM  No Entry   

   person_symbol       month sir  person_count      previous_person_count
0            ram  2017-01-23   a            10      0
1            mum  2017-01-23   a            14      0
2            mum  2017-01-23   a            25      0
3            ram  2017-01-23   a            20      0
4            ram  2017-02-22   b            34      10
5            mum  2017-02-22   b            23      14
6            mum  2017-02-22   b            43      25
7            ram  2017-02-22   b            34      20
8            mah  2017-03-03   c            10      0
9            mah  2017-03-03   c            20      0
10           mah  2017-03-03   c            30      0
11      No Entry  2017-03-03   c            20      0
12      No Entry  2017-03-03   c            10      0
13      No Entry  2017-01-23   a            45      0
14      No Entry  2017-02-22   b            77      45
15           mum  2017-01-23   a            39      0
16           mum  2017-02-22   b            66      39
17      No Entry  2017-01-23   a            24      0
18      No Entry  2017-02-22   b            57      24
19           ram  2017-01-23   a            30      0
20           ram  2017-02-22   b            68      30

编辑参考代码:

代码语言:javascript
复制
df = pd.DataFrame({'sport_name': ['football','football','football','football','football','football','football','football','basketball','basketball'],
           'person_name': ['ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','mahesh','mahesh'],
               'city': ['mumbai', 'mumbai','delhi','delhi','mumbai', 'mumbai','delhi','delhi','pune','nagpur'],
        'person_symbol': ['ram','mum','mum','ram','ram','mum','mum','ram','mah','mah'],
        'person_count': ['10','14','25','20','34','23','43','34','10','20'],
        'month': ['2017-01-23','2017-01-23','2017-01-23','2017-01-23','2017-02-22','2017-02-22','2017-02-22','2017-02-22','2017-03-03','2017-03-03'],
        'sir': ['a','a','a','a','b','b','b','b','c','c']})

df = df[['sport_name','person_name','city','person_symbol','person_count','month','sir']]

df['person_name'] = df['person_name'].apply(symbology)

df['person_count'] = df['person_count'].astype(int)

print df
df1=df.set_index(['sport_name','person_name','person_count','month','sir']).stack().reset_index(name='val')

df1['derived_symbol'] = df1['sport_name'] + '.' + df1['person_name'] + '.TOTAL.' + df1['val'] + '_count'

df2 = df1.groupby(['derived_symbol','month','sir','sport_name','person_name','level_5','val'])['person_count'].sum().reset_index(name='person_count')

df3 = df2.set_index(['derived_symbol','month','sir','sport_name','person_name','person_count','level_5'])['val'].unstack().fillna('No Entry').rename_axis(None, 1).reset_index()

df['derived_symbol'] = df['sport_name'] + '.' + df['person_name'] + '.' + df['city'] + "."+ df['person_symbol'] + '_count'
df4 = pd.concat([df, df3]).reset_index(None)
print df3
del df4['index']
df4 = df4[['derived_symbol','sport_name','person_name','city','person_symbol','month','sir','person_count']]
print df4

方便:

代码语言:javascript
复制
d = {'city': {0: 'mumbai',
  1: 'mumbai',
  2: 'delhi',
  3: 'delhi',
  4: 'mumbai',
  5: 'mumbai',
  6: 'delhi',
  7: 'delhi',
  8: 'pune',
  9: 'nagpur',
  10: 'No Entry',
  11: 'nagpur',
  12: 'pune',
  13: 'delhi',
  14: 'delhi',
  15: 'No Entry',
  16: 'No Entry',
  17: 'mumbai',
  18: 'mumbai',
  19: 'No Entry',
  20: 'No Entry'},
 'derived_symbol': {0: 'football.RAM.mumbai.ram_count',
  1: 'football.RAM.mumbai.mum_count',
  2: 'football.RAM.delhi.mum_count',
  3: 'football.RAM.delhi.ram_count',
  4: 'football.RAM.mumbai.ram_count',
  5: 'football.RAM.mumbai.mum_count',
  6: 'football.RAM.delhi.mum_count',
  7: 'football.RAM.delhi.ram_count',
  8: 'basketball.MAH.pune.mah_count',
  9: 'basketball.MAH.nagpur.mah_count',
  10: 'basketball.MAH.TOTAL.mah_count',
  11: 'basketball.MAH.TOTAL.nagpur_count',
  12: 'basketball.MAH.TOTAL.pune_count',
  13: 'football.RAM.TOTAL.delhi_count',
  14: 'football.RAM.TOTAL.delhi_count',
  15: 'football.RAM.TOTAL.mum_count',
  16: 'football.RAM.TOTAL.mum_count',
  17: 'football.RAM.TOTAL.mumbai_count',
  18: 'football.RAM.TOTAL.mumbai_count',
  19: 'football.RAM.TOTAL.ram_count',
  20: 'football.RAM.TOTAL.ram_count'},
 'month': {0: '2017-01-23',
  1: '2017-01-23',
  2: '2017-01-23',
  3: '2017-01-23',
  4: '2017-02-22',
  5: '2017-02-22',
  6: '2017-02-22',
  7: '2017-02-22',
  8: '2017-03-03',
  9: '2017-03-03',
  10: '2017-03-03',
  11: '2017-03-03',
  12: '2017-03-03',
  13: '2017-01-23',
  14: '2017-02-22',
  15: '2017-01-23',
  16: '2017-02-22',
  17: '2017-01-23',
  18: '2017-02-22',
  19: '2017-01-23',
  20: '2017-02-22'},
 'person_count': {0: 10,
  1: 14,
  2: 25,
  3: 20,
  4: 34,
  5: 23,
  6: 43,
  7: 34,
  8: 10,
  9: 20,
  10: 30,
  11: 20,
  12: 10,
  13: 45,
  14: 77,
  15: 39,
  16: 66,
  17: 24,
  18: 57,
  19: 30,
  20: 68},
 'person_name': {0: 'RAM',
  1: 'RAM',
  2: 'RAM',
  3: 'RAM',
  4: 'RAM',
  5: 'RAM',
  6: 'RAM',
  7: 'RAM',
  8: 'MAH',
  9: 'MAH',
  10: 'MAH',
  11: 'MAH',
  12: 'MAH',
  13: 'RAM',
  14: 'RAM',
  15: 'RAM',
  16: 'RAM',
  17: 'RAM',
  18: 'RAM',
  19: 'RAM',
  20: 'RAM'},
 'person_symbol': {0: 'ram',
  1: 'mum',
  2: 'mum',
  3: 'ram',
  4: 'ram',
  5: 'mum',
  6: 'mum',
  7: 'ram',
  8: 'mah',
  9: 'mah',
  10: 'mah',
  11: 'No Entry',
  12: 'No Entry',
  13: 'No Entry',
  14: 'No Entry',
  15: 'mum',
  16: 'mum',
  17: 'No Entry',
  18: 'No Entry',
  19: 'ram',
  20: 'ram'},
 'sir': {0: 'a',
  1: 'a',
  2: 'a',
  3: 'a',
  4: 'b',
  5: 'b',
  6: 'b',
  7: 'b',
  8: 'c',
  9: 'c',
  10: 'c',
  11: 'c',
  12: 'c',
  13: 'a',
  14: 'b',
  15: 'a',
  16: 'b',
  17: 'a',
  18: 'b',
  19: 'a',
  20: 'b'},
 'sport_name': {0: 'football',
  1: 'football',
  2: 'football',
  3: 'football',
  4: 'football',
  5: 'football',
  6: 'football',
  7: 'football',
  8: 'basketball',
  9: 'basketball',
  10: 'basketball',
  11: 'basketball',
  12: 'basketball',
  13: 'football',
  14: 'football',
  15: 'football',
  16: 'football',
  17: 'football',
  18: 'football',
  19: 'football',
  20: 'football'}}
EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2017-09-13 14:55:59

您可以做的是,在计算月份号(从日期开始)之后,以及在计算前一个月号之后,将数据放入自身。

让我们从计算这两个值开始。为了方便起见,我首先将原始的month字符串值转换为datetime,这允许我使用relativedelta计算前一个月。这确保了行为是正确的,即使在一年的变化之后。

代码语言:javascript
复制
In [7]: df['month'] = pd.to_datetime(df['month'])

In [8]: df['month_num'] = df['month'].apply(lambda x: x.strftime('%Y-%m'))

In [9]: from dateutil.relativedelta import relativedelta

In [10]: df['previous_month_num'] = df['month'].apply(lambda x: (x + relativedelta(months=-1)).strftime('%Y-%m'))

In [11]: df
Out[11]:
     city      month person_count person_name person_symbol sir  sport_name  \
0  mumbai 2017-01-23           10      ramesh           ram   a    football
1  mumbai 2017-01-23           14      ramesh           mum   a    football
2   delhi 2017-01-23           25      ramesh           mum   a    football
3   delhi 2017-01-23           20      ramesh           ram   a    football
4  mumbai 2017-02-22           34      ramesh           ram   b    football
5  mumbai 2017-02-22           23      ramesh           mum   b    football
6   delhi 2017-02-22           43      ramesh           mum   b    football
7   delhi 2017-02-22           34      ramesh           ram   b    football
8    pune 2017-03-03           10      mahesh           mah   c  basketball
9  nagpur 2017-03-03           20      mahesh           mah   c  basketball

  month_num previous_month_num
0   2017-01            2016-12
1   2017-01            2016-12
2   2017-01            2016-12
3   2017-01            2016-12
4   2017-02            2017-01
5   2017-02            2017-01
6   2017-02            2017-01
7   2017-02            2017-01
8   2017-03            2017-02
9   2017-03            2017-02

然后,我们可以使用计算出的月份值作为合并键,将dataframe合并到自身中:

代码语言:javascript
复制
In [12]: relevant_columns = ['city', 'person_symbol', 'sport_name']

In [13]: pd.merge(df, df, left_on=relevant_columns + ['previous_month_num'], right_on=rele
    ...: vant_columns + ['month_num'], how='left', suffixes=('', '_previous'))[list(df.col
    ...: umns) + ['person_count_previous']].fillna(0).drop(['month_num', 'previous_month_n
    ...: um'], axis=1)
Out[13]:
     city      month person_count person_name person_symbol sir  sport_name  \
0  mumbai 2017-01-23           10      ramesh           ram   a    football
1  mumbai 2017-01-23           14      ramesh           mum   a    football
2   delhi 2017-01-23           25      ramesh           mum   a    football
3   delhi 2017-01-23           20      ramesh           ram   a    football
4  mumbai 2017-02-22           34      ramesh           ram   b    football
5  mumbai 2017-02-22           23      ramesh           mum   b    football
6   delhi 2017-02-22           43      ramesh           mum   b    football
7   delhi 2017-02-22           34      ramesh           ram   b    football
8    pune 2017-03-03           10      mahesh           mah   c  basketball
9  nagpur 2017-03-03           20      mahesh           mah   c  basketball

  person_count_previous
0                     0
1                     0
2                     0
3                     0
4                    10
5                    14
6                    25
7                    20
8                     0
9                     0

一些评论:

  • 我使用['city', 'person_symbol', 'sport_name']作为参考列,但是可以随意添加更多,这取决于您到底想要实现什么。
  • 新列名为person_count_previous,但您可以rename它,如果它对您最好的话。
  • 默认情况下,当上一次计数没有匹配时,列将为NaN。多亏了fillna,我用0替换了值。
  • 我使用drop删除了“临时”列,但可以随时保留它们。
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/46196090

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档