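I have a dataset of a few thousand samples (X and y). I want to split it into n equal parts and then split each part into train/test sets. As far as I can tell, stratified k-fold in sklearn is almost what I want, but it does not divide each chunk into train/test.
Is there another function that can do this for me?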
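One way to stay within scikit-learn, offered as a sketch rather than a built-in feature: the test folds of StratifiedKFold are n disjoint, stratified chunks that together cover every sample, so each fold's test indices can serve as one chunk and be split again with train_test_split(..., stratify=...). The helper name stratified_chunks_with_splits and its parameters are hypothetical, and X and y are assumed to be NumPy arrays (use .iloc to index a DataFrame instead).

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split


def stratified_chunks_with_splits(X, y, n_chunks, test_size=0.25, seed=0):
    """Hypothetical helper: cut (X, y) into n_chunks disjoint stratified parts,
    then split each part into stratified train/test sets.

    Returns a list with one (X_train, X_test, y_train, y_test) tuple per chunk.
    """
    skf = StratifiedKFold(n_splits=n_chunks, shuffle=True, random_state=seed)
    results = []
    # Only the test indices of each fold are used: they are disjoint across
    # folds, cover every sample once, and keep the class ratio of y.
    for _, chunk_idx in skf.split(X, y):
        X_chunk, y_chunk = X[chunk_idx], y[chunk_idx]
        # Second, stratified split inside the chunk
        results.append(tuple(train_test_split(
            X_chunk, y_chunk, test_size=test_size,
            stratify=y_chunk, random_state=seed)))
    return results
```

Each class needs at least n_chunks members for StratifiedKFold, and every chunk needs enough samples per class for the inner stratified split; with very small classes the inner split will raise an error.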

Posted 2019-08-29 07:56:12
This worked for me:
import numpy as np
from random import shuffle

n_splits = 10
n_classes = 2

# Put the sample IDs of each class into their own shuffled list.
# Assumes data is a DataFrame with a label column 'normal' (values 0..n_classes-1)
# and a 'sample_id' column.
class_split_list = {}
for i in range(n_classes):
    # .groups[i] holds index labels, so select the rows with .loc, not .iloc
    class_list = list(set(data.loc[data.groupby('normal').groups[i]].sample_id.tolist()))
    shuffle(class_list)
    class_split_list[i] = np.array_split(class_list, n_splits)  # dict of per-class chunks

# Chunk i = the i-th piece of every class, so each chunk keeps the class ratio
stratified_sample_chunks = []
for i in range(n_splits):
    class_chunks = []
    for j in range(n_classes):
        class_chunks.extend(class_split_list[j][i])  # i-th piece of class j
    stratified_sample_chunks.append(class_chunks)
print(stratified_sample_chunks[0][:20])

If your DataFrame has no sample_id column, you can use the row index instead by changing the class_list line to class_list = list(set(data.loc[data.groupby('Column_with_y_values').groups[i]].index.tolist())), where Column_with_y_values is the column holding the labels.
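The same per-class chunking can be done on a plain label array without pandas, and each chunk can then be divided into train and test parts class by class, a step the snippet above stops short of. This is a minimal NumPy sketch under stated assumptions: y is a 1-D array of labels, the function names stratified_chunks and split_chunk are hypothetical, and rounding each class's test count to the nearest integer is a choice made here.

```python
import numpy as np


def stratified_chunks(y, n_splits, seed=0):
    """Hypothetical helper: return n_splits arrays of row positions in which
    every class is spread as evenly as possible over the chunks."""
    rng = np.random.default_rng(seed)
    per_class = []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)  # positions of this class
        rng.shuffle(idx)
        per_class.append(np.array_split(idx, n_splits))
    # Chunk i = the i-th piece of every class, concatenated
    return [np.concatenate([pieces[i] for pieces in per_class]) for i in range(n_splits)]


def split_chunk(chunk, y, test_frac=0.25, seed=0):
    """Hypothetical helper: split one chunk into (train, test) positions,
    class by class, so both parts keep the chunk's class ratio."""
    rng = np.random.default_rng(seed)
    train, test = [], []
    for cls in np.unique(y[chunk]):
        idx = chunk[y[chunk] == cls]  # boolean indexing returns a copy
        rng.shuffle(idx)
        n_test = int(round(len(idx) * test_frac))
        test.append(idx[:n_test])
        train.append(idx[n_test:])
    return np.concatenate(train), np.concatenate(test)
```

The returned positions can be used directly with X[train] and y[train] for arrays, or with df.iloc[train] for a DataFrame.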
Posted 2019-08-29 02:18:59
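import numpy as np
from sklearn.model_selection import train_test_split

n = 10
splits = []  # one (X_train, X_test, y_train, y_test) tuple per chunk
# Split the row positions into n nearly equal chunks (sizes differ by at most 1);
# int(df.shape[0] / n) + 1 can leave a much smaller, even one-row, last chunk.
# The chunks are contiguous blocks of df, so shuffle df first if its rows are ordered.
for idx in np.array_split(np.arange(df.shape[0]), n):
    data = df.iloc[idx]
    X_data = data.drop(['target'], axis=1)
    y_data = data['target']
    X_train, X_test, y_train, y_test = train_test_split(X_data, y_data)
    splits.append((X_train, X_test, y_train, y_test))

https://stackoverflow.com/questions/57698032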
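Slicing with chunk_size = int(df.shape[0] / n) + 1 does not always give n equal parts. The small sketch below compares the chunk sizes that slicing produces with those from np.array_split; the helper names slice_sizes and array_split_sizes are hypothetical and only count rows.

```python
import numpy as np


def slice_sizes(n_rows, n):
    """Chunk sizes from slicing with chunk_size = int(n_rows / n) + 1."""
    chunk_size = int(n_rows / n) + 1
    return [len(range(n_rows)[chunk_size * i: chunk_size * (i + 1)]) for i in range(n)]


def array_split_sizes(n_rows, n):
    """Chunk sizes from np.array_split on the row positions."""
    return [len(part) for part in np.array_split(np.arange(n_rows), n)]


print(slice_sizes(100, 10))        # [11, 11, 11, 11, 11, 11, 11, 11, 11, 1]
print(array_split_sizes(100, 10))  # [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
```

With 100 rows the slicing approach leaves a single row in the last chunk, which train_test_split rejects because the training part would be empty; np.array_split keeps all sizes within one row of each other.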