I wrote this function:
from datetime import timedelta

import pandas as pd

def time_to_unix(df, dateToday):
    '''This function creates the timestamp column for the dataframe. It gets today's date
    (e.g. 2022-8-8 0:0:0) and then adds the seconds that were originally in the timestamp column.
    input: dataframe, dateToday (type: pandas.core.series.Series)
    output: list of times
    '''
    dateTime = dateToday[0]
    times = []
    for i in range(len(df['timestamp'])):
        dateAndTime = dateTime + timedelta(seconds=float(df['timestamp'][i]))
        unix = pd.to_datetime([dateAndTime]).astype(int) / 10**9
        times.append(unix[0])
    return times

So it takes a dataframe, gets today's date, then takes the timestamp values in the dataframe (in seconds, e.g. 10, 20, ...), applies the function, and returns the times in Unix time.
However, since my dataframe has about 2 million rows, running this code takes a very long time.
How can I use a lambda function or something else to speed up my code?
Something along the lines of:
df['unix'] = df.apply(lambda row: something in here, axis=1)

Posted 2022-08-10 11:38:23
I think you'll find that most of the time is spent creating and manipulating the datetime/timestamp objects in the dataframe (see here for more). I also try to avoid lambdas like this on large data files, since they go row by row, which should be avoided. What I've done in the past when dealing with datetime/timestamp/timezone changes is to build a dictionary of the possible datetime combinations and then apply them with map. Like this:
import datetime as dt
import pandas as pd
#Make a time key column out of your date and timestamp fields
df['time_key'] = df['date'].astype(str) + '@' + df['timestamp'].astype(str)
#Build a dictionary from the unique time keys in the dataframe
time_dict = dict()
for time_key in df['time_key'].unique():
    time_split = time_key.split('@')
    # Create the Unix timestamp based on the values in the key; store it in the dictionary so it can be mapped later
    # (Timestamp.value is nanoseconds since the epoch)
    time_dict[time_key] = (pd.to_datetime(time_split[0]) + dt.timedelta(seconds=float(time_split[1]))).value / 10**9
#Now map the time_key to the unix column in the dataframe from the dictionary
df['unix'] = df['time_key'].map(time_dict)

Note that this probably won't help if all the datetime combinations in your dataframe are unique.
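A minimal, self-contained sketch of the dictionary-and-map approach above (the toy dataframe and its column values are assumptions for illustration):

```python
import datetime as dt
import pandas as pd

# Toy data: repeated date/timestamp combinations, as in a large log file
df = pd.DataFrame({
    "date": ["2022-08-08", "2022-08-08", "2022-08-08"],
    "timestamp": ["10", "20", "10"],
})

# Make a time key column out of the date and timestamp fields
df["time_key"] = df["date"] + "@" + df["timestamp"]

# Build a dictionary from the unique time keys only (3 rows here, but
# only 2 unique keys, so the expensive conversion runs twice, not 3 times)
time_dict = {}
for time_key in df["time_key"].unique():
    date_part, seconds_part = time_key.split("@")
    ts = pd.to_datetime(date_part) + dt.timedelta(seconds=float(seconds_part))
    # Timestamp.value is nanoseconds since the epoch; divide to get seconds
    time_dict[time_key] = ts.value / 10**9

# Map the precomputed values back onto every row
df["unix"] = df["time_key"].map(time_dict)
```

The win comes entirely from how much repetition the data has: the conversion cost is paid once per unique key rather than once per row.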
Posted 2022-08-10 13:05:08
I'm not quite sure what type dateTime[0] is, but you could try a more vectorized approach:
import pandas as pd
df["unix"] = (
    (pd.Timestamp(dateTime[0]) + pd.to_timedelta(df["timestamp"], unit="seconds"))
    .astype("int64")
    .div(10**9)
)

or
df["unix"] = (
    (dateTime[0] + pd.to_timedelta(df["timestamp"], unit="seconds"))
    .astype("int64")
    .div(10**9)
)

https://stackoverflow.com/questions/73305253
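For illustration, a minimal runnable sketch of the first vectorized variant, with a stand-in base date and sample timestamp values (both are assumptions, not from the original question's data):

```python
import pandas as pd

# Sample seconds-offset column, standing in for the real 2-million-row data
df = pd.DataFrame({"timestamp": [10.0, 20.0, 30.0]})

# Stand-in for dateTime[0]: midnight of the day in question
base = pd.Timestamp("2022-08-08")

# Vectorized: one timedelta conversion and one addition over the whole
# column, then datetime64[ns] -> int64 nanoseconds -> float seconds
df["unix"] = (
    (base + pd.to_timedelta(df["timestamp"], unit="seconds"))
    .astype("int64")
    .div(10**9)
)
```

Because every step operates on the whole column at once, this avoids the per-row Python overhead of apply/lambda entirely, which is where the original function lost its time.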