
How to shuffle DataFrame blocks (of different sizes) in a list in Python?

Stack Overflow user
Asked 2020-08-26 01:35:41

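Below is some dummy code for what I want to achieve; my question is at the end. I want to shuffle blocks of DataFrames (of different sizes) held in a list in Python. Thanks.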

Set up a dummy dictionary:

dummy = {"ID":[1,2,3,4,5,6,7,8,9,10],
         "Alphabet":["A","B","C","D","E","F","G","H","I","J"],
         "Fruit":["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]}

Convert the dictionary to a DataFrame:

import pandas as pd

dummy_df = pd.DataFrame(dummy)

Create DataFrame blocks of the required sizes:

blocksize = [1,2,3,4]
blocks = []
i = 0
for j in range(len(blocksize)):
    a = blocksize[j]
    blocks.append(dummy_df[i:i+a])
    i += a
blocks

Below is the output of `blocks`. It is a list of four DataFrame blocks, with 1 to 4 rows each:

[   ID Alphabet  Fruit
 0   1        A  apple,
    ID Alphabet    Fruit
 1   2        B   banana
 2   3        C  coconut,
    ID Alphabet           Fruit
 3   4        D            date
 4   5        E  elephant apple
 5   6        F          feijoa,
    ID Alphabet       Fruit
 6   7        G       guava
 7   8        H    honeydew
 8   9        I    ita palm
 9  10        J  jack fruit]

This is the point where I am stuck.

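I have tried many different approaches but keep getting errors. I want to shuffle these DataFrame blocks within the list and then combine them back into a single DataFrame. Below is an example of a shuffled output. How can I do this?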

Example of the desired output:

    ID  Alphabet    Fruit
1   2   B   banana
2   3   C   coconut
0   1   A   apple
6   7   G   guava
7   8   H   honeydew
8   9   I   ita palm
9   10  J   jack fruit
3   4   D   date
4   5   E   elephant apple
5   6   F   feijoa

2 Answers

Stack Overflow user

Answered 2020-08-26 02:46:24

Once you have the list, you can shuffle the blocks with random.shuffle. After that, combine the blocks of the shuffled list back into a single DataFrame.

Try the following code:

import pandas as pd
import random

dummy = {"ID":[1,2,3,4,5,6,7,8,9,10],
         "Alphabet":["A","B","C","D","E","F","G","H","I","J"],
         "Fruit":["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]}

dummy_df = pd.DataFrame(dummy)

blocksize = [1,2,3,4]
blocks = []
i = 0
for j in range(len(blocksize)):
    a = blocksize[j]
    blocks.append(dummy_df[i:i+a])
    i += a

random.shuffle(blocks)  # shuffle blocks in list

# DataFrame.append was removed in pandas 2.0;
# pd.concat joins all blocks in one call and keeps the original row labels
dfs = pd.concat(blocks)

print(dfs)

Output (one possible result; the order is random):

   ID Alphabet           Fruit
3   4        D            date
4   5        E  elephant apple
5   6        F          feijoa
1   2        B          banana
2   3        C         coconut
6   7        G           guava
7   8        H        honeydew
8   9        I        ita palm
9  10        J      jack fruit
0   1        A           apple
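
If the shuffled order needs to be repeatable (for tests, say), one option is a seeded `random.Random` instance. The following is only a sketch under assumptions: the seed value `42`, the shortened two-column frame, and the names `starts`, `ends`, and `shuffled` are illustrative and not part of the original answer.

```python
import random
from itertools import accumulate

import pandas as pd

# Shortened stand-in for the dummy data in the question
df = pd.DataFrame({
    "ID": list(range(1, 11)),
    "Alphabet": list("ABCDEFGHIJ"),
})

blocksize = [1, 2, 3, 4]

# Cumulative sums give the end position of each block: [1, 3, 6, 10]
ends = list(accumulate(blocksize))
starts = [0] + ends[:-1]
blocks = [df.iloc[s:e] for s, e in zip(starts, ends)]

# A seeded Random instance makes the shuffle repeatable (the seed is arbitrary)
rng = random.Random(42)
rng.shuffle(blocks)

# Join the shuffled blocks; rows inside each block keep their original order
shuffled = pd.concat(blocks)
print(shuffled)
```

Running the script twice with the same seed should give the same block order; dropping the seed restores a different order on each run.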

Stack Overflow user

Answered 2020-08-26 03:12:49

You can use .sample(frac=1) to shuffle the rows of each block directly when you slice it from the DataFrame:

blocks.append( df[start:end].sample(frac=1) )

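Afterwards, you can join all the DataFrames in one call with pd.concat(list_of_df). (The original answer used df.append(list_of_df), but DataFrame.append was removed in pandas 2.0.)

df = pd.concat(blocks)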

Complete example:

import pandas as pd

dummy = {
    "ID": [1,2,3,4,5,6,7,8,9,10],
    "Alphabet": ["A","B","C","D","E","F","G","H","I","J"],
    "Fruit": ["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]
}

df = pd.DataFrame(dummy)

blocksize = [1,2,3,4]
blocks = []

start = 0
for size in blocksize:
    end = start + size
    blocks.append(df[start:end].sample(frac=1))
    start = end

#for item in blocks:
#    print(item)

df = pd.concat(blocks)  # add ignore_index=True to renumber the rows
print(df)
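
Note that `.sample(frac=1)` reorders the rows inside each block, while the blocks themselves stay in their original sequence. A minimal sketch that combines both steps is shown below; the `random_state=0` and `random.Random(0)` seeds, the shortened frame, and the names `result` and `renumbered` are assumptions for illustration only.

```python
import random

import pandas as pd

# Shortened stand-in for the dummy data in the question
df = pd.DataFrame({
    "ID": list(range(1, 11)),
    "Alphabet": list("ABCDEFGHIJ"),
})

blocksize = [1, 2, 3, 4]
blocks = []

start = 0
for size in blocksize:
    end = start + size
    # random_state fixes the row order inside the block (the seed is arbitrary)
    blocks.append(df.iloc[start:end].sample(frac=1, random_state=0))
    start = end

# Optional second step: also shuffle the order of the blocks themselves
random.Random(0).shuffle(blocks)

# Keeps the original row labels, so the shuffle is visible in the index
result = pd.concat(blocks)

# ignore_index=True renumbers the rows 0..9 instead
renumbered = pd.concat(blocks, ignore_index=True)

print(result)
```

Skipping the `random.Random(0).shuffle(blocks)` line reproduces the behaviour of the answer above, where only the rows within each block move.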

Other ways to shuffle: Shuffle DataFrame rows

Documentation: pandas.DataFrame.sample

Another idea is to use .sample(frac=1) only to get the indexes in random order:

blocks += df[start:end].sample(frac=1).index.tolist()

or to shuffle the list of indexes with random.shuffle():

indexes = df[start:end].index.tolist()
random.shuffle(indexes)
blocks += indexes

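Then use these indexes to create the new DataFrame (iloc works here because the labels of the default RangeIndex coincide with the row positions):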

df = df.iloc[blocks]

Complete example:

import pandas as pd
import random

dummy = {
    "ID": [1,2,3,4,5,6,7,8,9,10],
    "Alphabet": ["A","B","C","D","E","F","G","H","I","J"],
    "Fruit": ["apple","banana","coconut","date","elephant apple","feijoa","guava","honeydew","ita palm","jack fruit"]
}

df = pd.DataFrame(dummy)

blocksize = [1,2,3,4]
blocks = []

start = 0
for size in blocksize:
    end = start + size

    #blocks += df[start:end].sample(frac=1).index.tolist()
   
    indexes = df[start:end].index.tolist()
    random.shuffle(indexes)
    blocks += indexes
    
    start = end

#for item in blocks:
#    print(item)

df = df.iloc[blocks]

print(df)
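
The same index-based idea can be adapted to shuffle whole blocks, which is what the question asks for. The sketch below assumes a shortened frame and uses illustrative names (`index_blocks`, `order`, `result`); it keeps one list of labels per block, shuffles those lists, and flattens them afterwards.

```python
import random
from itertools import chain

import pandas as pd

# Shortened stand-in for the dummy data in the question
df = pd.DataFrame({
    "ID": list(range(1, 11)),
    "Alphabet": list("ABCDEFGHIJ"),
})

blocksize = [1, 2, 3, 4]
index_blocks = []

start = 0
for size in blocksize:
    end = start + size
    # Store the row labels of this block as one list
    index_blocks.append(df.index[start:end].tolist())
    start = end

# Shuffle the order of the blocks; labels inside each block stay in order
random.shuffle(index_blocks)

# Flatten the list of lists into a single list of row labels
order = list(chain.from_iterable(index_blocks))

# .loc selects by label, so this also works when the index is not 0..n-1
result = df.loc[order]
print(result)
```

Because only lists of labels are shuffled, no intermediate DataFrame is created per block, and a single `.loc` lookup builds the final result.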
The original content of this page is provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/63584271
