I am trying to create a new DataFrame based on permutations of another DataFrame. This is the original DataFrame. Price is the index.
df1
Price Bid Ask
1 .01 .05
2 .04 .08
3 .1 .15
. . .
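130 2.50 3.00
The second DataFrame takes the index from df1 and creates a DataFrame (df2) containing the permutations of the df1 index taken 4 prices at a time, as shown in the sample output below.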
df2
# price1 price2 price3 price4
1 1 2 3 4
2 1 2 3 5
3 1 2 3 6
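.. .. .. .. ..
To do this, I have been using itertools.permutations, but I run into memory problems and cannot generate the large number of permutations. This is the code I use to build the permutations.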
price_combos = list(x for x in itertools.permutations(df1.index, 4))
df2 = pd.DataFrame(price_combos, columns=('price1', 'price2', 'price3', 'price4'))
Answered 2020-06-01 04:35:59
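The dtype can inflate memory allocation. For your scenario, the best option I've found is to convert df1.index into an array with an int16 dtype. int8 ranges in value from -128 to 127. Since your index goes up to 130, int8 will not suffice.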
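The integer ranges mentioned above can be checked with `np.iinfo`. The following is a minimal sketch. The helper name `smallest_int_dtype` is hypothetical (it is not part of NumPy or pandas) and only illustrates how the narrowest signed dtype for a given maximum index value might be picked.

```python
import numpy as np

def smallest_int_dtype(max_value):
    """Return the narrowest signed integer dtype that can hold max_value.

    Hypothetical helper for illustration; not a NumPy or pandas function.
    """
    for dtype in (np.int8, np.int16, np.int32, np.int64):
        if max_value <= np.iinfo(dtype).max:
            return dtype
    raise ValueError("value does not fit in int64")

print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)    # -128 127
print(np.iinfo(np.int16).min, np.iinfo(np.int16).max)  # -32768 32767
print(smallest_int_dtype(130).__name__)                # int16
```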
- Creating a `price_combos` variable and then a DataFrame keeps two copies of the data in memory at once, so create `df2` without the intermediate variable.
- If you create the DataFrame without specifying the `dtype`, as you're doing, the `dtype` will be `int64`.
- With the following implementation, there will be one object, `df2`, that will be 2,180,905,112 bytes (about 2.0 GiB).
- With the original implementation, there would be two objects: an `int64` DataFrame of about 8.1 GiB (272,613,120 rows × 4 columns × 8 bytes) plus the `price_combos` list of tuples, which is at least as large, for a total of 16 GB or more.
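The sizes in the list above follow from simple arithmetic on the number of 4-permutations of 130 values. A short sketch, assuming Python 3.8+ for `math.perm`, that counts only the raw data bytes; the slightly larger 2,180,905,112 figure quoted above presumably also includes a small fixed overhead for the DataFrame's index.

```python
import math

n_rows = math.perm(130, 4)          # 130 * 129 * 128 * 127 ordered selections
n_cols = 4

int16_bytes = n_rows * n_cols * 2   # 2 bytes per int16 value
int64_bytes = n_rows * n_cols * 8   # 8 bytes per int64 value

print(f"{n_rows:,}")       # 272,613,120
print(f"{int16_bytes:,}")  # 2,180,904,960
print(f"{int64_bytes:,}")  # 8,723,619,840
```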
import numpy as np
import pandas as pd
from itertools import permutations
# synthetic data set and create dataframe
np.random.seed(365)
data = {'Price': list(range(1, 131)),
'Bid': [np.random.randint(1, 10)*0.1 for _ in range(130)]}
df1 = pd.DataFrame(data)
df1['Ask'] = df1.Bid + 0.15
df1.set_index('Price', inplace=True)
# convert the index to an int16 array
values = df1.index.to_numpy(dtype='int16')
%%time
# create df2 (the %%time cell magic has to be the first line of its own Jupyter cell)
df2 = pd.DataFrame(np.array(list(permutations(values, 4))), columns=('price1', 'price2', 'price3', 'price4'))
>>> Wall time: 2min 45s
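Note that `np.array(list(permutations(values, 4)))` still builds a temporary Python list of tuples before the array exists. One possible variation, which is not the method used in this answer and has not been timed against it, is to stream the permutations straight into a preallocated array with `np.fromiter`, which works because the number of rows is known in advance. The helper name `permutations_frame` and its parameters are hypothetical, and the demonstration uses 6 prices rather than 130 to keep it small.

```python
from itertools import chain, permutations

import numpy as np
import pandas as pd

def permutations_frame(values, r=4, dtype=np.int16):
    """Build a DataFrame of r-permutations without an intermediate list.

    Hypothetical helper for illustration only.
    """
    n = len(values)
    n_rows = 1
    for k in range(n, n - r, -1):   # n * (n-1) * ... * (n-r+1)
        n_rows *= k
    # Flatten the tuples lazily and fill a 1-D array of known length.
    flat = np.fromiter(chain.from_iterable(permutations(values, r)),
                       dtype=dtype, count=n_rows * r)
    columns = [f"price{i}" for i in range(1, r + 1)]
    return pd.DataFrame(flat.reshape(n_rows, r), columns=columns)

# Small demonstration: 6 prices instead of 130.
small = permutations_frame(range(1, 7))
print(small.shape)  # (360, 4)
```

For the full data set this would be called as `permutations_frame(df1.index.to_numpy(dtype='int16'))`. The resulting DataFrame is the same size as before; only the temporary list is avoided.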
print(df2.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 272613120 entries, 0 to 272613119
Data columns (total 4 columns):
# Column Dtype
--- ------ -----
0 price1 int16
1 price2 int16
2 price3 int16
3 price4 int16
dtypes: int16(4)
memory usage: 2.0 GB
df2.head()
price1 price2 price3 price4
0 1 2 3 4
1 1 2 3 5
2 1 2 3 6
3 1 2 3 7
4 1 2 3 8
df2.tail()
price1 price2 price3 price4
272613115 130 129 128 123
272613116 130 129 128 124
272613117 130 129 128 125
272613118 130 129 128 126
272613119 130 129 128 127
https://stackoverflow.com/questions/62119757