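I am trying to encode cyclical features for an ML algorithm, where timestamp features are very important.
I want to convert the day of the month (the 'day_in_month' column of cyclic_df) into a cyclical variable, so that the 1st of a month comes right after the last day of the previous month. So 1 February (01.02) should be close to 31 January (31.01): looking at the day column alone, the difference between the two days should be 1, not 30!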
# Transform the cyclical features
cyclic_df['min_sin'] = np.sin(cyclic_df.minute*(2.*np.pi/59)) # Sine component of minute
cyclic_df['min_cos'] = np.cos(cyclic_df.minute*(2.*np.pi/59)) # Cosine component of minute
cyclic_df['hr_sin'] = np.sin(cyclic_df.hour*(2.*np.pi/23)) # Sine component of hour
cyclic_df['hr_cos'] = np.cos(cyclic_df.hour*(2.*np.pi/23)) # Cosine component of hour
cyclic_df['d_sin'] = np.sin(cyclic_df.day*(2.*np.pi/30)) # Sine component of day (help needed here)
cyclic_df['d_cos'] = np.cos(cyclic_df.day*(2.*np.pi/30)) # Cosine component of day (help needed here)
cyclic_df['mnth_sin'] = np.sin((cyclic_df.month-1)*(2.*np.pi/12)) # Sine component of month
cyclic_df['mnth_cos'] = np.cos((cyclic_df.month-1)*(2.*np.pi/12)) # Cosine component of month

The problem is the 30 I divide by. Not every month has 30 days; a month can have 30, 31, 28, or 29 days. In every row of cyclic_df I have a 'month' column, a 'year' column, and a 'day' column, so in theory there should be a way to look up the correct number of days for a given month. How can I replace the 30 (in lines 5 and 6 of the code above) with the correct variable, so that it reads the year and month from the other columns and substitutes the right value instead of always 30?
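PS: It would also be great if someone could tell me whether I handled the minute, hour, and month correctly in the code above.
(After the comments): Yes, I do have a 'year' column. I changed the two lines to: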
cyclic_ext_df['d_cos'] = np.cos(cyclic_ext_df.day*(2.*np.pi/monthrange(cyclic_df.year, cyclic_ext_df.month)[1]))
cyclic_ext_df['d_cos'] = np.cos(cyclic_ext_df.day*(2.*np.pi/monthrange(cyclic_df.year, cyclic_ext_df.month)[1]))

and I get the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-575-532a308075e2> in <module>()
11 #cyclic_ext_df['d_cos'] = np.cos(cyclic_ext_df.day*(2.*np.pi/30)) # Cosinus component of day
12
---> 13 cyclic_ext_df['d_cos'] = np.cos(cyclic_ext_df.day*(2.*np.pi/monthrange(cyclic_df.year, cyclic_ext_df.month)[1]))
14 cyclic_ext_df['d_cos'] = np.cos(cyclic_ext_df.day*(2.*np.pi/monthrange(cyclic_df.year, cyclic_ext_df.month)[1]))
15
~/anaconda/lib/python3.6/calendar.py in monthrange(year, month)
120 """Return weekday (0-6 ~ Mon-Sun) and number of days (28-31) for
121 year, month."""
--> 122 if not 1 <= month <= 12:
123 raise IllegalMonthError(month)
124 day1 = weekday(year, month, 1)
~/anaconda/lib/python3.6/site-packages/pandas/core/generic.py in __nonzero__(self)
1574 raise ValueError("The truth value of a {0} is ambiguous. "
1575 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1576 .format(self.__class__.__name__))
1577
1578 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Posted on 2018-12-07 04:37:08
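The traceback above comes from passing whole pandas Series to calendar.monthrange, which expects a single integer year and a single integer month; its internal check `1 <= month <= 12` cannot be evaluated on a Series. One possible way around this, sketched here under the assumption of a small made-up frame with 'year', 'month', and 'day' columns (the sample values and the 'days_in_month' column name are illustrative and not from the original post), is to call monthrange once per row:

```python
from calendar import monthrange

import numpy as np
import pandas as pd

# Hypothetical sample data; the real cyclic_df is not shown in the post.
cyclic_df = pd.DataFrame({
    "year":  [2018, 2018, 2016, 2018],
    "month": [1, 2, 2, 4],
    "day":   [31, 1, 29, 30],
})

# monthrange only accepts scalars, so evaluate it row by row.
cyclic_df["days_in_month"] = [
    monthrange(int(y), int(m))[1]
    for y, m in zip(cyclic_df["year"], cyclic_df["month"])
]

# Use the per-row month length as the period of the day encoding.
angle = 2.0 * np.pi * cyclic_df["day"] / cyclic_df["days_in_month"]
cyclic_df["d_sin"] = np.sin(angle)
cyclic_df["d_cos"] = np.cos(angle)
```

The row-by-row loop is simple but may be slow on large frames; it is meant only to show where the scalar call has to happen.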
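If your data contains the year and month, you can use calendar.monthrange:
from calendar import monthrange
import numpy as np

month = 2
year = 2014
# monthrange returns (weekday of the first day, number of days in the month)
_, mr = monthrange(year, month)
cyclic_df['d_cos'] = np.cos(cyclic_df.day*(2.*np.pi/mr))

Posted on 2018-12-07 04:38:01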
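Note that the monthrange snippet uses one fixed month and year, so every row would get the same period. A possible vectorized variant, sketched under the assumption that the frame has integer 'year', 'month', 'day', 'hour', and 'minute' columns (the sample values and the `encode` helper are made up for illustration), builds dates with pd.to_datetime and reads the month length from `.dt.days_in_month`. It also uses periods of 60 and 24 for minutes and hours rather than 59 and 23, because with a period of 59, minute 0 and minute 59 would land on the same point of the circle:

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; column names follow the question.
cyclic_df = pd.DataFrame({
    "year":   [2018, 2018, 2016],
    "month":  [1, 2, 2],
    "day":    [31, 1, 29],
    "hour":   [23, 0, 12],
    "minute": [59, 0, 30],
})

# pd.to_datetime can assemble dates from year/month/day columns.
dates = pd.to_datetime(cyclic_df[["year", "month", "day"]])
days_in_month = dates.dt.days_in_month  # 28, 29, 30 or 31 per row


def encode(values, period):
    """Return the sine and cosine of values placed on a circle of the given period."""
    angle = 2.0 * np.pi * values / period
    return np.sin(angle), np.cos(angle)


cyclic_df["min_sin"], cyclic_df["min_cos"] = encode(cyclic_df["minute"], 60)
cyclic_df["hr_sin"], cyclic_df["hr_cos"] = encode(cyclic_df["hour"], 24)
cyclic_df["d_sin"], cyclic_df["d_cos"] = encode(cyclic_df["day"], days_in_month)
cyclic_df["mnth_sin"], cyclic_df["mnth_cos"] = encode(cyclic_df["month"] - 1, 12)
```

With a per-row period, the last day of a month maps to a full turn of the circle, so 31 January and 1 February end up one step apart, which is the behavior the question asks for.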
https://stackoverflow.com/questions/53659075