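I have a Series of datetime values with a timezone.
If I use np.select on the timezone-aware datetime values, it works. However, if I use np.select after removing the timezone, I get the following error: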
TypeError: Choicelists and default value do not have a common dtype: The DType <class 'numpy.dtype[datetime64]'> could not be promoted by <class 'numpy.dtype[float64]'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtype[datetime64]'>, <class 'numpy.dtype[float64]'>)
Here is my code:
import pandas as pd
import numpy as np
from datetime import timedelta
import datetime
datetime_series = pd.Series(['2022-09-24 22:00:00+02:00','2022-09-04 11:30:00+02:00', '2022-11-11 02:20:30+02:00', '2022-11-12 03:20:30+02:00'])
#make datetime
datetime_series = pd.to_datetime(datetime_series, errors='coerce')
#remove timezone
datetime_series_no_timezone = datetime_series.dt.tz_localize(None)
print ('datetime_series dtype: ', datetime_series.dtype)
print ('datetime_series_no_timezone dtype: ', datetime_series_no_timezone.dtype)
# with timezone it works
conditions = [
datetime_series.dt.hour > 12,
datetime_series.dt.hour < 11]
choiches = [
(datetime_series + datetime.timedelta(days=1)),
datetime_series ]
print (np.select(conditions, choiches, default=np.nan))
# without timezone it doesn't
conditions = [
datetime_series_no_timezone.dt.hour > 12,
datetime_series_no_timezone.dt.hour < 11]
choiches = [
(datetime_series_no_timezone + datetime.timedelta(days=1)),
datetime_series_no_timezone ]
print (np.select(conditions, choiches, default=np.nan))
Output:
datetime_series dtype: datetime64[ns, pytz.FixedOffset(120)]
datetime_series_no_timezone dtype: datetime64[ns]
[Timestamp('2022-09-25 22:00:00+0200', tz='pytz.FixedOffset(120)') nan
Timestamp('2022-11-11 02:20:30+0200', tz='pytz.FixedOffset(120)')
Timestamp('2022-11-12 03:20:30+0200', tz='pytz.FixedOffset(120)')]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-63a5d209c9ba> in <module>
30 (datetime_series_no_timezone + datetime.timedelta(days=1)),
31 datetime_series_no_timezone ]
---> 32 print (np.select(conditions, choiches, default=np.nan))
<__array_function__ internals> in select(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/numpy/lib/function_base.py in select(condlist, choicelist, default)
687 except TypeError as e:
688 msg = f'Choicelists and default value do not have a common dtype: {e}'
--> 689 raise TypeError(msg) from None
690
691 # Convert conditions to arrays and broadcast conditions and choices
Posted on 2022-11-05 08:23:47
Use an appropriate default value to avoid the error; here, you can use pandas' not-a-time value, pd.NaT:
import pandas as pd
import numpy as np
datetime_series = pd.Series(
[
"2022-09-24 22:00:00+02:00",
"2022-09-04 11:30:00+02:00",
"2022-11-11 02:20:30+02:00",
"2022-11-12 03:20:30+02:00",
]
)
datetime_series = pd.to_datetime(datetime_series, errors="coerce")
datetime_series_no_timezone = datetime_series.dt.tz_localize(None)
conditions = [
datetime_series_no_timezone.dt.hour > 12,
datetime_series_no_timezone.dt.hour < 11,
]
choiches = [
(datetime_series_no_timezone + pd.Timedelta(days=1)),
datetime_series_no_timezone,
]
s = np.select(conditions, choiches, default=pd.NaT)
print(s)
# [1664143200000000000 NaT 1668133230000000000 1668223230000000000]
Note that the datetime values are converted to integer nanoseconds since the Unix epoch in the process.
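If the integer conversion is unwanted, one possible variation is to pass NumPy's own not-a-time value, np.datetime64("NaT"), as the default, so that the choices and the default are all datetime64. This is a minimal sketch and not part of the original answer: it assumes timezone-naive input built directly from strings, and the names naive, choices, result and result_series are made up for illustration.

```python
import numpy as np
import pandas as pd

# Timezone-naive datetimes, mirroring the values used above.
naive = pd.to_datetime(
    pd.Series(
        [
            "2022-09-24 22:00:00",
            "2022-09-04 11:30:00",
            "2022-11-11 02:20:30",
            "2022-11-12 03:20:30",
        ]
    )
)

conditions = [naive.dt.hour > 12, naive.dt.hour < 11]
choices = [naive + pd.Timedelta(days=1), naive]

# np.datetime64("NaT") is itself a datetime64 value, so NumPy can promote it
# together with the datetime64 choices instead of falling back to object.
result = np.select(conditions, choices, default=np.datetime64("NaT"))
print(result.dtype)

# Wrap the array in a Series to continue working in pandas.
result_series = pd.Series(result, index=naive.index)
print(result_series)
```

Rows that match neither condition (here the 11:30 entry) come out as NaT, and the rest stay as datetimes.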
As for why this happens, take a look at the numpy representation of each Series:
datetime_series.to_numpy()
Out[26]:
array([Timestamp('2022-09-24 22:00:00+0200', tz='pytz.FixedOffset(120)'),
Timestamp('2022-09-04 11:30:00+0200', tz='pytz.FixedOffset(120)'),
Timestamp('2022-11-11 02:20:30+0200', tz='pytz.FixedOffset(120)'),
Timestamp('2022-11-12 03:20:30+0200', tz='pytz.FixedOffset(120)')],
dtype=object)
datetime_series_no_timezone.to_numpy()
Out[27]:
array(['2022-09-24T22:00:00.000000000', '2022-09-04T11:30:00.000000000',
'2022-11-11T02:20:30.000000000', '2022-11-12T03:20:30.000000000'],
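dtype='datetime64[ns]')
In the first case, because of the pytz fixed offset, numpy cannot find a suitable dtype and falls back to `object`. In the second case, numpy identifies datetime64 as the dtype. I assume that when these arrays are passed to np.select, no attempt is made to coerce to a common dtype in the first case, since the dtype is object (anything goes!). In the second case, such an attempt fails: np.nan is a float, while the choices are datetime64 (or integers, if converted to nanoseconds), and numpy has no common dtype for the two.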
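The promotion behaviour described above can be checked directly with np.result_type, which, judging by the error message, is what fails inside np.select. This is a rough sketch, and the variable names are made up for illustration.

```python
import numpy as np

naive_dtype = np.dtype("datetime64[ns]")
float_dtype = np.dtype("float64")

# datetime64 and float64 have no common dtype, so promotion raises TypeError.
try:
    np.result_type(naive_dtype, float_dtype)
    promotion_failed = False
except TypeError:
    promotion_failed = True
print("datetime64 with float64 fails:", promotion_failed)

# object can hold anything, so promoting against it succeeds.
object_result = np.result_type(np.dtype(object), float_dtype)
print("object with float64:", object_result)

# A unit-less datetime64 (the dtype of np.datetime64("NaT")) promotes cleanly.
nat_result = np.result_type(naive_dtype, np.dtype("datetime64"))
print("datetime64[ns] with datetime64:", nat_result)
```

This matches the observation that the object-typed, timezone-aware array accepts np.nan as a default while the datetime64 array does not.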
https://stackoverflow.com/questions/74323074