Q: Merging Excel files with Pandas

Stack Overflow user

Asked 2018-01-07 04:07:00

2 answers · 2.5K views · 0 followers · 0 votes

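I am merging multiple Excel files with pandas and getting the traceback below. I don't fully understand it and hope someone can help me make sense of it. The job still completes, but the error is printed to the console. All of the files are xlsx files, and I have opened and re-saved each of them as xlsx to rule out a format problem.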

Traceback (most recent call last):
  File "/Users/Documents/Python Scripts/Merge Scan xlsx_copy.py", line 13, in <module>
    df = pd.read_excel(f)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/util/_decorators.py", line 118, in wrapper
    return func(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/excel.py", line 230, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/excel.py", line 294, in __init__
    self.book = xlrd.open_workbook(self._io)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/xlrd/__init__.py", line 162, in open_workbook
    ragged_rows=ragged_rows,
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/xlrd/book.py", line 91, in open_workbook_xls
    biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/xlrd/book.py", line 1271, in getbof
    bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/xlrd/book.py", line 1265, in bof_error
    raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\x15Microso'

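As I said, the job completes, but the columns in the merged file are not aligned. Some files start in column A, another in column B, and another in column E. Can anyone tell me why the data is not all added starting from column A? My script is below: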

import pandas as pd
import numpy as np
import glob
from sys import argv
script, file_location, outpath = argv

files = glob.glob(file_location +  "*.xlsx")
all_data = pd.DataFrame()

for f in files:
    df = pd.read_excel(f)
    all_data = all_data.append(df)
    all_data.to_excel(outpath + ".xlsx")

excel

python-3.x

pandas

2 Answers

Stack Overflow user

Answered 2018-01-07 06:12:49

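For the first problem: I was reading from a directory that contained a mix of file formats, so the script tried to read a file that was not really xlsx, and that is where the traceback came from. Once I moved the xlsx files into a directory of their own, the error went away.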
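
A minimal sketch of guarding against this inside the script instead of moving files around. The helper name `find_workbooks` is hypothetical. It keeps only `.xlsx` paths and also skips names beginning with `~$`, on the assumption that the offending file may have been one of the lock files Excel leaves next to an open workbook. The bytes `b'\x15Microso'` in the traceback would be consistent with that, but the answer does not confirm it.

```python
import glob
import os


def find_workbooks(folder):
    """Return the .xlsx paths in folder, skipping Excel's '~$' lock files."""
    # os.path.join supplies the path separator, so folder may or may not end in one.
    pattern = os.path.join(folder, "*.xlsx")
    return sorted(
        path
        for path in glob.glob(pattern)
        # Assumption: names starting with '~$' are Excel lock files, not workbooks.
        if not os.path.basename(path).startswith("~$")
    )
```

In the question's script this would stand in for the `glob.glob(file_location + "*.xlsx")` line.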

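For the second problem: in some of the files the header was not on row 1. In those files, rows 1-5 held other data and the header was on row 6. Once I deleted rows 1-5, the files merged correctly.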
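
As an alternative to deleting rows by hand, the `header` argument of `pd.read_excel` can point at the header row directly. The sketch below assumes every file passed in has its header on the same row (row 6, as in the files described above); the function name `merge_workbooks` and its `header_row` parameter are hypothetical. It also collects the frames and calls `pd.concat` once, since `DataFrame.append` was removed in pandas 2.0.

```python
import pandas as pd


def merge_workbooks(paths, header_row=6):
    """Stack the first sheet of each workbook into one DataFrame.

    header_row is the 1-based spreadsheet row that holds the column names;
    read_excel expects a 0-based index, and rows above the header are skipped.
    """
    frames = [pd.read_excel(path, header=header_row - 1) for path in paths]
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.DataFrame()
    # Concatenate once; columns are aligned by name across files.
    return pd.concat(frames, ignore_index=True)
```

Files whose header is already on row 1 would need `header_row=1`, so a mixed folder has to be split into groups or handled per file. Writing the result with a single `to_excel` call after merging avoids rewriting the output on every loop iteration.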

0 votes

Stack Overflow user

Answered 2018-01-20 09:51:43

This should do what you want.

import pandas as pd

# filenames
excel_names = ["C:/Users/Excel/Desktop/Test/Book1.xlsx", "C:/Users/Excel/Desktop/Test/Book2.xlsx", "C:/Users/Excel/Desktop/Test/Book3.xlsx"]

# read them in
excels = [pd.ExcelFile(name) for name in excel_names]

# turn them into dataframes
frames = [x.parse(x.sheet_names[0], header=None,index_col=None) for x in excels]

# delete the first row for all frames except the first
# i.e. remove the header row -- assumes it's the first
frames[1:] = [df[1:] for df in frames[1:]]

# concatenate them..
combined = pd.concat(frames)

# write it out
combined.to_excel("c.xlsx", header=False, index=False)


# Results go to the default directory if not assigned somewhere else.
# C:\Users\Excel\.spyder-py3

By the way, you could also consider using the add-in at the link below.

https://www.rondebruin.nl/win/addins/rdbmerge.htm

0 votes

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/48131408
