I have a large dataset like this one, with several million rows, and I want to apply a transformation to it quickly.
df value
10
-1
20
...
-3
-4
-50
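12

I want to know the most efficient way to apply this rule: if a value is greater than 0, multiply it by 2; if it is less than 0, multiply it by 3. The expected result is: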
df value
20
-3
40
...
-9
-12
-150
24

My script is:
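dff = df.value
# Element-by-element loop; dff[i] is positional only because df has the default RangeIndex
for i in range(len(dff)):
    if dff[i] > 0:
        dff[i] = dff[i] * 2
    elif dff[i] < 0:
        dff[i] = dff[i] * 3

Answer posted on 2022-02-16 00:36:43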
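A quick way to confirm that a vectorized rule reproduces the expected output from the question is to run it on the sample values shown there. This is a minimal sketch, assuming a DataFrame `df` with a single `value` column; the rows elided with `...` are simply left out.

```python
import numpy as np
import pandas as pd

# Sample values from the question (the rows elided with "..." are left out).
df = pd.DataFrame({"value": [10, -1, 20, -3, -4, -50, 12]})

# Vectorized rule: positive values are doubled, everything else is tripled.
# Zero stays zero either way, which matches the loop in the question.
df["value"] = np.where(df["value"] > 0, df["value"] * 2, df["value"] * 3)

print(df["value"].tolist())  # [20, -3, 40, -9, -12, -150, 24]
```

The printed list matches the expected result given in the question.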
Let s be:

import numpy as np
import pandas as pd

s = pd.Series(np.random.randint(-10, 11, 10**6))

Best solution:

y = np.where(s > 0, s * 2, s * 3)

Timing:
CPU times: user 11.3 ms, sys: 2.21 ms, total: 13.5 ms
Wall time: 11.8 ms

Your solution:
%%time
for i in range(len(s)):
    if s[i] > 0:
        s[i] = s[i] * 2
    elif s[i] < 0:
        s[i] = s[i] * 3

Timing:
CPU times: user 17.7 s, sys: 51.3 ms, total: 17.8 s
Wall time: 17.9 s

Another option:
%%time
y = s.map(lambda x: x * 2 if x > 0 else x * 3)

Timing:
CPU times: user 308 ms, sys: 37.5 ms, total: 345 ms
Wall time: 371 ms

Another option:
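%%time
mask = s > 0
# Series.where keeps entries where the condition is True and replaces the rest:
# first double the positive values, then triple the non-positive ones.
y = s.where(~mask, s * 2).where(mask, s * 3)

Timing: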
CPU times: user 31 ms, sys: 7.43 ms, total: 38.4 ms
Wall time: 37.2 ms

https://stackoverflow.com/questions/71135043
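`%%time` is an IPython/Jupyter cell magic, so it does not work in a plain Python script. A rough equivalent of the comparison above can be run with the standard `timeit` module. This is a sketch, not the answer's benchmark: the helper names (`with_np_where`, `with_map`, `with_series_where`) are hypothetical, the data comes from NumPy's `default_rng` with a fixed seed instead of `np.random.randint`, and the single `(s * 2).where(s > 0, s * 3)` call is a simplified variant of the chained `where` shown above. Absolute timings will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

# Same setup as the answer: one million random integers in [-10, 10].
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(-10, 11, 10**6))


def with_np_where(s):
    # Hypothetical helper: vectorized choice between the two products.
    return pd.Series(np.where(s > 0, s * 2, s * 3), index=s.index)


def with_map(s):
    # Hypothetical helper: calls a Python lambda once per element (slow).
    return s.map(lambda x: x * 2 if x > 0 else x * 3)


def with_series_where(s):
    # Hypothetical helper: keep s * 2 where s > 0, otherwise take s * 3.
    return (s * 2).where(s > 0, s * 3)


candidates = {
    "np.where": with_np_where,
    "Series.map": with_map,
    "Series.where": with_series_where,
}

reference = with_np_where(s)
for name, fn in candidates.items():
    # Check that every approach gives the same values before timing it.
    assert fn(s).equals(reference), name
    seconds = timeit.timeit(lambda: fn(s), number=3) / 3
    print(f"{name:>12}: {seconds * 1000:.1f} ms per call")
```

Checking each result against the `np.where` output before timing it guards against comparing approaches that silently compute different things.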