
np.select does not work with datetime64[ns]

Stack Overflow user
Asked 2022-11-04 21:19:45
1 answer · 49 views · 0 followers · Score 1

I have a Series of datetime values with a timezone.

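I use np.select to do the following:

  • if the hour is after 12 -> return the value one day later
  • if the hour is before 11 -> return the value unchanged
  • otherwise -> return np.nan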

This works for datetime values that have a timezone. However, if I use np.select after removing the timezone, I get the following error:

TypeError: Choicelists and default value do not have a common dtype: The DType <class 'numpy.dtype[datetime64]'> could not be promoted by <class 'numpy.dtype[float64]'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtype[datetime64]'>, <class 'numpy.dtype[float64]'>)

Here is my code:

import pandas as pd 
import numpy as np
from datetime import timedelta
import datetime

datetime_series = pd.Series(['2022-09-24 22:00:00+02:00','2022-09-04 11:30:00+02:00', '2022-11-11 02:20:30+02:00',  '2022-11-12 03:20:30+02:00'])
# parse the strings as datetimes
datetime_series = pd.to_datetime(datetime_series, errors='coerce')
# remove the timezone
datetime_series_no_timezone = datetime_series.dt.tz_localize(None)

print ('datetime_series dtype: ', datetime_series.dtype)
print ('datetime_series_no_timezone dtype: ', datetime_series_no_timezone.dtype)

# with timezone it works
conditions = [
        datetime_series.dt.hour > 12,
        datetime_series.dt.hour < 11]
choiches = [
        (datetime_series + datetime.timedelta(days=1)),
        datetime_series ]

print (np.select(conditions, choiches, default=np.nan))

# without timezone it doesn't
conditions = [
        datetime_series_no_timezone.dt.hour > 12,
        datetime_series_no_timezone.dt.hour < 11]
choiches = [
        (datetime_series_no_timezone + datetime.timedelta(days=1)),
        datetime_series_no_timezone ]
print (np.select(conditions, choiches, default=np.nan))

Output:

datetime_series dtype:  datetime64[ns, pytz.FixedOffset(120)]
datetime_series_no_timezone dtype:  datetime64[ns]
[Timestamp('2022-09-25 22:00:00+0200', tz='pytz.FixedOffset(120)') nan
 Timestamp('2022-11-11 02:20:30+0200', tz='pytz.FixedOffset(120)')
 Timestamp('2022-11-12 03:20:30+0200', tz='pytz.FixedOffset(120)')]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-63a5d209c9ba> in <module>
     30         (datetime_series_no_timezone + datetime.timedelta(days=1)),
     31         datetime_series_no_timezone ]
---> 32 print (np.select(conditions, choiches, default=np.nan))

<__array_function__ internals> in select(*args, **kwargs)

/usr/local/lib/python3.7/dist-packages/numpy/lib/function_base.py in select(condlist, choicelist, default)
    687     except TypeError as e:
    688         msg = f'Choicelists and default value do not have a common dtype: {e}'
--> 689         raise TypeError(msg) from None
    690 
    691     # Convert conditions to arrays and broadcast conditions and choices

1 Answer

Stack Overflow user

Accepted answer

Answered 2022-11-05 08:23:47

Use an appropriate default value to avoid the error; here you can use pandas' "not a time" value, pd.NaT:

import pandas as pd
import numpy as np

datetime_series = pd.Series(
    [
        "2022-09-24 22:00:00+02:00",
        "2022-09-04 11:30:00+02:00",
        "2022-11-11 02:20:30+02:00",
        "2022-11-12 03:20:30+02:00",
    ]
)
datetime_series = pd.to_datetime(datetime_series, errors="coerce")
datetime_series_no_timezone = datetime_series.dt.tz_localize(None)

conditions = [
    datetime_series_no_timezone.dt.hour > 12,
    datetime_series_no_timezone.dt.hour < 11,
]
choiches = [
    (datetime_series_no_timezone + pd.Timedelta(days=1)),
    datetime_series_no_timezone,
]

s = np.select(conditions, choiches, default=pd.NaT)
print(s)
# [1664143200000000000 NaT 1668133230000000000 1668223230000000000]

Note that in the process the datetime values are converted to integer nanoseconds since the Unix epoch.
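If you would rather get datetimes back than integers, two options that are not part of the original answer may help. This is a sketch assuming a reasonably recent NumPy and pandas, reusing the question's sample data. Option 1 passes np.datetime64("NaT") as the default, which NumPy can promote to the datetime64 unit of the choices. Option 2 skips np.select and uses pandas' Series.where and Series.mask instead.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question, with the timezone removed.
s = pd.to_datetime(pd.Series([
    "2022-09-24 22:00:00+02:00",
    "2022-09-04 11:30:00+02:00",
    "2022-11-11 02:20:30+02:00",
    "2022-11-12 03:20:30+02:00",
])).dt.tz_localize(None)

hour = s.dt.hour
conditions = [hour > 12, hour < 11]
choices = [s + pd.Timedelta(days=1), s]

# Option 1: a NumPy NaT (generic datetime64) as the default.
# It promotes to the choices' datetime64 unit, so the result keeps that dtype.
out_np = np.select(conditions, choices, default=np.datetime64("NaT"))
print(out_np.dtype)

# Option 2: stay in pandas.
# where() keeps rows with hour < 11 and fills the rest with NaT;
# mask() then replaces rows with hour > 12 by the value one day later.
out_pd = s.where(hour < 11).mask(hour > 12, s + pd.Timedelta(days=1))
print(out_pd)
```

Because hour > 12 and hour < 11 can never both be true, the order of where() and mask() does not matter here; if the conditions overlapped, np.select would pick the first matching one.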

As for why this happens, look at the NumPy representation of each series:

datetime_series.to_numpy()
Out[26]: 
array([Timestamp('2022-09-24 22:00:00+0200', tz='pytz.FixedOffset(120)'),
       Timestamp('2022-09-04 11:30:00+0200', tz='pytz.FixedOffset(120)'),
       Timestamp('2022-11-11 02:20:30+0200', tz='pytz.FixedOffset(120)'),
       Timestamp('2022-11-12 03:20:30+0200', tz='pytz.FixedOffset(120)')],
      dtype=object)

datetime_series_no_timezone.to_numpy()
Out[27]: 
array(['2022-09-24T22:00:00.000000000', '2022-09-04T11:30:00.000000000',
       '2022-11-11T02:20:30.000000000', '2022-11-12T03:20:30.000000000'],
      dtype='datetime64[ns]')

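In the first case, NumPy has no timezone-aware datetime dtype, so because of the pytz fixed offset the values are returned as Timestamp objects with dtype 'object'. In the second case, NumPy uses datetime64[ns] as the dtype. I assume that when these arrays are passed to np.select, the first case causes no conflict: 'object' can hold anything, including a float NaN, so it serves as the common dtype. In the second case, np.select has to find a common dtype for the datetime64[ns] choices and the default np.nan, which is float64, and no such dtype exists (a datetime64[ns] value is, at most, an integer count of nanoseconds). That is exactly what the TypeError above reports.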
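The explanation above can be checked without pandas. The traceback shows np.select raising the error while it looks for a common dtype for the choices and the default; this sketch assumes that step is equivalent to calling np.result_type on the choice and default arrays.

```python
import numpy as np

dt_ns = np.dtype("datetime64[ns]")
nan_dtype = np.asarray(np.nan).dtype                # float64
nat_dtype = np.asarray(np.datetime64("NaT")).dtype  # generic datetime64

# Case 1 (tz-aware): the choices are object arrays, and object absorbs float64.
with_tz = np.result_type(np.dtype(object), nan_dtype)
print(with_tz)  # object

# Case 2 (tz-naive): datetime64[ns] and float64 have no common dtype.
try:
    np.result_type(dt_ns, nan_dtype)
    naive_fails = False
except TypeError as exc:
    naive_fails = True
    print("TypeError:", exc)

# A datetime64 NaT default, by contrast, promotes cleanly.
print(np.result_type(dt_ns, nat_dtype))  # datetime64[ns]
```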

Score 2
Page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/74323074
