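I'm working with files in a folder and need a better way to loop through them and add a column from each one to build a master file. With two files, I read both into data frames and appended the series. Now, however, I have more than 100 files. File 1 looks like this: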
Num Department Product Salesman Location rating1
1 Electronics TV 3 Bigmart, Delhi 5
2 Electronics TV 1 Bigmart, Mumbai 4
3 Electronics TV 2 Bigmart, Bihar 3
4 Electronics TV 2 Bigmart, Chandigarh 5
5 Electronics Camera 2 Bigmart, Jharkhand 5
Similarly, file 2:
Num Department Product Salesman Location rating2
1 Electronics TV 3 Bigmart, Delhi 2
2 Electronics TV 1 Bigmart, Mumbai 4
3 Electronics TV 2 Bigmart, Bihar 4
4 Electronics TV 2 Bigmart, Chandigarh 5
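5 Electronics Camera 2 Bigmart, Jharkhand 3
What I'm trying to do is read the rating column from every other file and append it to the first file as an extra column. Expected output: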
Num Department Product Salesman Location rating1 rating2
1 Electronics TV 3 Bigmart, Delhi 5 2
2 Electronics TV 1 Bigmart, Mumbai 4 4
3 Electronics TV 2 Bigmart, Bihar 3 4
4 Electronics TV 2 Bigmart, Chandigarh 5 5
5 Electronics Camera 2 Bigmart, Jharkhand 5 3
I adapted some code that was posted here. The following code works:
import os
import pandas as pd

def read_folder(folder):
    # collect the Excel files in the folder
    files = [i for i in os.listdir(folder) if 'xlsx' in i]
    df = pd.read_excel(folder+'/{}'.format(files[0]))
    for f in files[1:]:
        df2 = pd.read_excel(folder+'/{}'.format(f))
        # take the sixth column (the rating) and join it on the row index
        df = df.merge(df2.iloc[:,5],left_index=True,right_index=True)
    return df
Posted on 2020-07-23 08:01:12
This function reads every file in a folder and returns the contents as pandas data frames.
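import pandas as pd
import os

def read_folder(csv_folder):
    files = os.listdir(csv_folder)
    df = []  # one data frame per file
    for f in files:
        print(f)
        csv_file = csv_folder + "/" + f
        df.append(pd.read_csv(csv_file))
    # also stack all files row-wise into a single frame
    df_full = pd.concat(df, ignore_index=True)
    return df, df_full
As I understand from your last comment, you need to add the rating columns and create a single file. Once all the files have been read, you can do the following.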
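# df is the list of data frames built by read_folder()
final_df = df[0]
i = 1
for d in df[1:]:
    # str(i) is required: a str and an int cannot be concatenated with +
    # assumes each file has a column named "rating" (the question's files use rating1, rating2, ...)
    final_df["rating_" + str(i)] = d["rating"]
    i = i+1
Posted on 2020-07-23 08:18:11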
This version of read_folder() returns a list of data frames. It also adds a helper column that records which file each rating came from.
import pandas as pd
from pathlib import Path

def read_folder(csv_folder):
    ''' Input is a folder with csv files; return list of data frames.'''
    csv_folder = Path(csv_folder).absolute()
    csv_files = [f for f in csv_folder.iterdir() if f.name.endswith('csv')]
    # the assign() method adds a helper column
    dfs = [
        pd.read_csv(csv_file).assign(rating_src = f'rating-{idx}')
        for idx, csv_file in enumerate(csv_files, 1)
    ]
    return dfs
Now assemble the data frames into the desired shape:
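dfs = read_folder(csv_folder)
# squeeze() yields a Series only if exactly one column is left after set_index(),
# i.e. the rating column must have the same name in every file
dfs = (pd.concat((d for d in dfs))
       .set_index(['Num', 'Department', 'Product', 'Salesman', 'Location', 'rating_src'])
       .squeeze()
       .unstack(level='rating_src')
       .reset_index()
      )
dfs.columns.name = ''
https://stackoverflow.com/questions/63044756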
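To make the reshaping step in the second answer easier to follow, here is a minimal sketch that runs it on two small in-memory frames instead of files. It assumes every file stores its rating under the same column name (`rating` here, which is an assumption; the files in the question use rating1, rating2, ...). The helper `make_frame` and the `KEYS` constant are hypothetical names introduced only for this illustration, and the sample rows are the first three rows of the question's two files.

```python
import pandas as pd

# Key columns shared by every file (taken from the question's layout).
KEYS = ['Num', 'Department', 'Product', 'Salesman', 'Location']

def make_frame(ratings, src):
    """Hypothetical stand-in for reading one file; `src` labels the source."""
    return pd.DataFrame({
        'Num': [1, 2, 3],
        'Department': ['Electronics'] * 3,
        'Product': ['TV', 'TV', 'TV'],
        'Salesman': [3, 1, 2],
        'Location': ['Bigmart, Delhi', 'Bigmart, Mumbai', 'Bigmart, Bihar'],
        'rating': ratings,  # assumed common column name
    }).assign(rating_src=src)

frames = [
    make_frame([5, 4, 3], 'rating-1'),
    make_frame([2, 4, 4], 'rating-2'),
]

wide = (
    pd.concat(frames)                           # stack the files row-wise
    .set_index(KEYS + ['rating_src'])['rating'] # pick the Series explicitly rather than squeeze()
    .unstack(level='rating_src')                # one column per source label
    .reset_index()                              # turn the key columns back into regular columns
)
wide.columns.name = None  # drop the leftover 'rating_src' label on the column axis

print(wide)
```

If the files name their rating columns differently, one option is to rename each to a common name right after reading, before the concat step.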