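Below is some dummy code for what I am trying to do; my question is at the end. I want to shuffle blocks of a DataFrame (blocks of different sizes) that are stored in a list, using Python. Thanks.
Set up a dummy dictionary: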

import pandas as pd

dummy = {"ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
         "Alphabet": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
         "Fruit": ["apple", "banana", "coconut", "date", "elephant apple", "feijoa", "guava", "honeydew", "ita palm", "jack fruit"]}

Convert the dictionary to a DataFrame:

dummy_df = pd.DataFrame(dummy)

Create DataFrame blocks of the desired sizes:

blocksize = [1, 2, 3, 4]
blocks = []
i = 0
for a in blocksize:
    # take the next `a` rows as one block
    blocks.append(dummy_df[i:i+a])
    i += a
blocks

Here is the output of "blocks". It is a list of 4 DataFrame blocks, 1 to 4 rows each:

[   ID Alphabet  Fruit
0   1        A  apple,
    ID Alphabet    Fruit
1   2        B   banana
2   3        C  coconut,
    ID Alphabet           Fruit
3   4        D            date
4   5        E  elephant apple
5   6        F          feijoa,
    ID Alphabet       Fruit
6   7        G       guava
7   8        H    honeydew
8   9        I    ita palm
9  10        J  jack fruit]

I got stuck after this step.
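I have tried many different approaches but keep getting errors. I want to shuffle these DataFrame blocks within the list and then combine them back into a single DataFrame. Below is an example of a shuffled result. How can I do this?
Example of the desired output: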

   ID Alphabet           Fruit
1   2        B          banana
2   3        C         coconut
0   1        A           apple
6   7        G           guava
7   8        H        honeydew
8   9        I        ita palm
9  10        J      jack fruit
3   4        D            date
4   5        E  elephant apple
5   6        F          feijoa

Posted on 2020-08-26 02:46:24
Once you have the list, you can shuffle the blocks with random.shuffle. After that, join the shuffled blocks back into one DataFrame with pd.concat. (DataFrame.append, which older pandas versions offered for this, was removed in pandas 2.0.)
Try the following code:

import random

import pandas as pd

dummy = {"ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
         "Alphabet": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
         "Fruit": ["apple", "banana", "coconut", "date", "elephant apple", "feijoa", "guava", "honeydew", "ita palm", "jack fruit"]}
dummy_df = pd.DataFrame(dummy)

blocksize = [1, 2, 3, 4]
blocks = []
i = 0
for a in blocksize:
    blocks.append(dummy_df[i:i+a])
    i += a

random.shuffle(blocks)   # shuffle the blocks in the list, in place
dfs = pd.concat(blocks)  # join the shuffled blocks into one DataFrame
print(dfs)

Output:

   ID Alphabet           Fruit
3   4        D            date
4   5        E  elephant apple
5   6        F          feijoa
1   2        B          banana
2   3        C         coconut
6   7        G           guava
7   8        H        honeydew
8   9        I        ita palm
9  10        J      jack fruit
0   1        A           apple

Posted on 2020-08-26 03:12:49
You can use .sample(frac=1) to shuffle the rows of a DataFrame directly. Applied to each slice, it shuffles the rows within that block:

blocks.append(df[start:end].sample(frac=1))

Later you can join all the DataFrames in one call with pd.concat(list_of_df). (Older pandas versions also allowed df.append(list_of_df), which was removed in pandas 2.0.)

df = pd.concat(blocks)

Full example:

import pandas as pd

dummy = {
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Alphabet": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    "Fruit": ["apple", "banana", "coconut", "date", "elephant apple", "feijoa", "guava", "honeydew", "ita palm", "jack fruit"]
}
df = pd.DataFrame(dummy)

blocksize = [1, 2, 3, 4]
blocks = []
start = 0
for size in blocksize:
    end = start + size
    # shuffle the rows inside this block
    blocks.append(df[start:end].sample(frac=1))
    start = end

#for item in blocks:
#    print(item)

df = pd.concat(blocks)  # add .reset_index(drop=True) to renumber the rows
print(df)

Other ways to shuffle: Shuffle DataFrame rows
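
The listing above shuffles rows inside each block but leaves the blocks in their original order. The following is a minimal sketch, not a definitive implementation, that shuffles the block order as the question asks and can optionally shuffle the rows within each block too. The helper name shuffle_blocks and its seed and shuffle_rows parameters are assumptions made for illustration; they are not part of pandas.

```python
import random

import pandas as pd


def shuffle_blocks(df, sizes, seed=None, shuffle_rows=False):
    """Hypothetical helper: cut df into consecutive blocks of the given
    sizes, shuffle the block order, and join the blocks back together."""
    rng = random.Random(seed)  # private generator, so a seed makes runs repeatable
    blocks = []
    start = 0
    for size in sizes:
        block = df.iloc[start:start + size]
        if shuffle_rows:
            # optionally permute the rows inside this block as well
            block = block.sample(frac=1, random_state=rng.randrange(2**32))
        blocks.append(block)
        start += size
    rng.shuffle(blocks)       # reorder the blocks themselves
    return pd.concat(blocks)  # original index labels are kept


example_df = pd.DataFrame({
    "ID": range(1, 11),
    "Alphabet": list("ABCDEFGHIJ"),
})
print(shuffle_blocks(example_df, [1, 2, 3, 4], seed=42))
```

Passing the same seed gives the same order on every run, which can help when comparing results; leave seed as None for a different order each time.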
Another idea is to use .sample(frac=1) only to get the index labels in random order:

blocks += df[start:end].sample(frac=1).index.tolist()

or to use random.shuffle():

indexes = df[start:end].index.tolist()
random.shuffle(indexes)
blocks += indexes

Then use those index labels to build the new DataFrame:

df = df.loc[blocks]

Full example:

import random

import pandas as pd

dummy = {
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Alphabet": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    "Fruit": ["apple", "banana", "coconut", "date", "elephant apple", "feijoa", "guava", "honeydew", "ita palm", "jack fruit"]
}
df = pd.DataFrame(dummy)

blocksize = [1, 2, 3, 4]
blocks = []  # flat list of index labels, shuffled block by block
start = 0
for size in blocksize:
    end = start + size
    #blocks += df[start:end].sample(frac=1).index.tolist()
    indexes = df[start:end].index.tolist()
    random.shuffle(indexes)
    blocks += indexes
    start = end

#for item in blocks:
#    print(item)

# The list holds index labels, so select with .loc; .iloc gives the same
# result here only because the default index matches the row positions.
df = df.loc[blocks]
print(df)

https://stackoverflow.com/questions/63584271
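
The index-based idea can also be applied to whole blocks, which is what the question asks for. As a rough sketch under stated assumptions: build the row positions of each block, shuffle the order of the blocks, and select the rows once with .iloc. The function name block_shuffled_positions and its seed parameter are hypothetical, introduced only for this example.

```python
import numpy as np
import pandas as pd


def block_shuffled_positions(sizes, seed=None):
    """Hypothetical helper: return row positions 0..sum(sizes)-1 with the
    consecutive blocks of the given sizes placed in random order."""
    rng = np.random.default_rng(seed)
    # Split points between blocks, e.g. sizes [1, 2, 3, 4] -> [1, 3, 6]
    boundaries = np.cumsum(sizes)[:-1]
    position_blocks = np.split(np.arange(sum(sizes)), boundaries)
    order = rng.permutation(len(position_blocks))  # random block order
    return np.concatenate([position_blocks[k] for k in order])


example_df = pd.DataFrame({
    "ID": range(1, 11),
    "Alphabet": list("ABCDEFGHIJ"),
})
positions = block_shuffled_positions([1, 2, 3, 4], seed=7)
# These are positions rather than index labels, so .iloc is the right accessor.
print(example_df.iloc[positions])
```

Because only integer positions are shuffled, no intermediate DataFrames are created, and the same positions can be reused to reorder other objects of the same length.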