I wrote this function:
from datetime import timedelta

import pandas as pd

def time_to_unix(df, dateToday):
    '''This function creates the timestamp column for the dataframe. It gets today's date
    (e.g. 2022-8-8 0:0:0) and then adds the seconds that were originally in the timestamp column.
    input: dataframe, dateToday (type: pandas.core.series.Series)
    output: list of times
    '''
    dateTime = dateToday[0]
    times = []
    for i in range(len(df['timestamp'])):
        dateAndTime = dateTime + timedelta(seconds=float(df['timestamp'][i]))
        unix = pd.to_datetime([dateAndTime]).astype(int) / 10**9
        times.append(unix[0])
    return times

So it takes a dataframe, gets today's date, then takes the timestamp values in the dataframe (in seconds, e.g. 10, 20, ...), applies the function, and returns the times in Unix time.
However, since my dataframe has about 2 million rows, running this code takes a very long time.
How can I use a lambda function or something else to speed up my code?
Something along the lines of:
df['unix'] = df.apply(lambda row: something in here, axis=1)

Posted 2022-08-10 11:38:23
I think you'll find that most of the time is spent creating and manipulating the datetime/timestamp objects in the dataframe (see here for more). I also try to avoid lambdas like this on large data files, since they go row by row, which should be avoided. What I've done in the past when dealing with datetime/timestamp/timezone changes is to build a dictionary of the possible datetime combinations and then apply them with map. Like this:
import datetime as dt
import pandas as pd
#Make a time key column out of your date and timestamp fields
df['time_key'] = df['date'].astype(str) + '@' + df['timestamp'].astype(str)
#Build a dictionary from the unique time keys in the dataframe
time_dict = dict()
for time_key in df['time_key'].unique():
    time_split = time_key.split('@')
    # Create the Unix timestamp based on the values in the key; store it in the dictionary so it can be mapped later
    # (Timestamp.value is nanoseconds since the epoch)
    time_dict[time_key] = (pd.to_datetime(time_split[0]) + dt.timedelta(seconds=float(time_split[1]))).value / 10**9
#Now map the time_key to the unix column in the dataframe from the dictionary
df['unix'] = df['time_key'].map(time_dict)

Note that this probably won't help if all the datetime combinations in your dataframe are unique.
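A minimal, self-contained sketch of the dictionary-and-map approach above (the toy dataframe and its column values are assumptions for illustration):

```python
import datetime as dt
import pandas as pd

# Toy data: repeated date/timestamp combinations, as in a large log file
df = pd.DataFrame({
    "date": ["2022-08-08", "2022-08-08", "2022-08-08"],
    "timestamp": ["10", "20", "10"],
})

# Make a time key column out of the date and timestamp fields
df["time_key"] = df["date"] + "@" + df["timestamp"]

# Build a dictionary from the unique time keys only (3 rows here, but
# only 2 unique keys, so the expensive conversion runs twice, not 3 times)
time_dict = {}
for time_key in df["time_key"].unique():
    date_part, seconds_part = time_key.split("@")
    ts = pd.to_datetime(date_part) + dt.timedelta(seconds=float(seconds_part))
    # Timestamp.value is nanoseconds since the epoch; divide to get seconds
    time_dict[time_key] = ts.value / 10**9

# Map the precomputed values back onto every row
df["unix"] = df["time_key"].map(time_dict)
```

The win comes entirely from how much repetition the data has: the conversion cost is paid once per unique key rather than once per row.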
Posted 2022-08-10 13:05:08
I'm not quite sure what type dateTime[0] is, but you could try a more vectorized approach:
import pandas as pd
df["unix"] = (
    (pd.Timestamp(dateTime[0]) + pd.to_timedelta(df["timestamp"], unit="seconds"))
    .astype("int64")
    .div(10**9)
)

or
df["unix"] = (
    (dateTime[0] + pd.to_timedelta(df["timestamp"], unit="seconds"))
    .astype("int64")
    .div(10**9)
)

https://stackoverflow.com/questions/73305253
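For illustration, a minimal runnable sketch of the first vectorized variant, with a stand-in base date and sample timestamp values (both are assumptions, not from the original question's data):

```python
import pandas as pd

# Sample seconds-offset column, standing in for the real 2-million-row data
df = pd.DataFrame({"timestamp": [10.0, 20.0, 30.0]})

# Stand-in for dateTime[0]: midnight of the day in question
base = pd.Timestamp("2022-08-08")

# Vectorized: one timedelta conversion and one addition over the whole
# column, then datetime64[ns] -> int64 nanoseconds -> float seconds
df["unix"] = (
    (base + pd.to_timedelta(df["timestamp"], unit="seconds"))
    .astype("int64")
    .div(10**9)
)
```

Because every step operates on the whole column at once, this avoids the per-row Python overhead of apply/lambda entirely, which is where the original function lost its time.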