我必须跟踪数据

。我成功地删除了跨越年份2月29日的所有数据,因为我打算在“年份之日”专栏(使用.dt.dayofyear创建)上进行分组,我决定忽略跨越年份的额外一天。现在,为了按“一年中的一天”专栏进行分组,如果日期是3月1日或以后,我必须从跨越年份的日子中减去1。否则,跳跃年将有366天,而不是355天(即使在删除跳跃日之后)。
这里是我的代码:
clim_rec = pd.read_csv("daily_climate_records.csv")
clim_rec['Date'] = pd.to_datetime(clim_rec['Date']) # converting "Date" column from string into datetime format
# Let's drop all leaping days by masking all Feb 29 days
feb_29_mask = ~((clim_rec.Date.dt.month == 2) & (clim_rec.Date.dt.day == 29))
clim_rec = clim_rec.where(feb_29_mask).dropna()
# Let's add new column with the "day of year" in order to group by this column
clim_rec['Day of year'] = clim_rec['Date'].dt.dayofyear
print(clim_rec.head())
#print('---------------------------------------------------')
# Now, if the year is a leap year and the dayofyear is greater than the dayofyear of Feb-29
# we subtract 1 from dayofyear. After doing that we will get values 1-365 for dayofyear
leap_year_mask = (clim_rec.Date.dt.year % 4 == 0) & ((clim_rec.Date.dt.year % 100 != 0)
|(clim_rec.Date.dt.year % 400 == 0)) & (clim_rec.Date.dt.month >=3)
clim_rec['Day of year'] = clim_rec['Day of year'].apply(lambda x: x-1) # this line is not correct,我的问题是:如何修改我附加代码的最后一行,以便只将子字符串应用于特定的行,而这些行是真正符合布尔掩码条件的行。
发布于 2021-01-14 11:23:53
通过掩码选择行时使用DataFrame.loc,更好/更快的方法是用1减去apply以避免循环(因为在隐藏循环下应用):
clim_rec.loc[leap_year_mask, 'Day of year'] -= 1 工作方式如下:
clim_rec.loc[leap_year_mask, 'Day of year'] = clim_rec.loc[leap_year_mask, 'Day of year']-1发布于 2021-01-14 11:32:40
这对你有用吗?Kr.
clim_rec['mask'] = leaf_year_mask
clim_rec['Day of year'] = clim_rec.apply(lambda x: x['Day of year']-1 if x['mask'] else x['Day of year'])https://stackoverflow.com/questions/65718041
复制相似问题