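I have a directory containing many Excel files. My goal is to read all of them and extract some information from each one. I use the script below to walk the directory, but I still get an error. The files are found, yet the code reports that they do not exist. This is strange, because one line of the code prints each file's name. However, when pandas tries to read the file, it fails.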
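The symptom can be reproduced in isolation with the standard library alone. The sketch below is illustrative: the temporary directory and the `check_walk` helper are hypothetical, and only the file name comes from the output in the question. For each file that `os.walk` finds, it checks whether the bare name and the name joined with `root` actually exist. Unless the script happens to be run from inside that directory, the bare name is typically reported as missing while the joined path is found.

```python
import os
import tempfile


def check_walk(directory):
    """For every file os.walk finds, report whether the bare name and the
    name joined with its directory exist, as seen from the current working
    directory. (Hypothetical helper, for illustration only.)"""
    results = []
    for root, _subdirs, files in os.walk(directory):
        for name in files:
            # `name` is only the base name, e.g. "corpus_or_AB_FMC.xlsx".
            bare_exists = os.path.exists(name)  # resolved against os.getcwd()
            full_exists = os.path.exists(os.path.join(root, name))
            results.append((name, bare_exists, full_exists))
    return results


# Rebuild the situation in a throwaway directory (illustrative only).
with tempfile.TemporaryDirectory() as corpus:
    open(os.path.join(corpus, "corpus_or_AB_FMC.xlsx"), "wb").close()
    for name, bare, full in check_walk(corpus):
        print(name, "| bare name exists:", bare, "| joined path exists:", full)
```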
```
/home/geta/kelo/eXP/Test/corpus
-----File in processed : corpus_or_AB_FMC.xlsx
Traceback (most recent call last):
  File "test_vec.py", line 111, in <module>
    sentences = pd.read_excel(file, sheet_name= 0)
  File "/home/getalp/kelodjoe/anaconda3/lib/python3.7/site-packages/pandas/util/_decorators.py", line 208, in wrapper
    return func(*args, **kwargs)
  File "/home/geta/kelo/anaconda3/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 310, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/home/geta/kelo/anaconda3/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 819, in __init__
    self._reader = self._engines[engine](self._io)
  File "/home/geta/kelo/anaconda3/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py", line 21, in __init__
    super().__init__(filepath_or_buffer)
  File "/home/geta/kelo/anaconda3/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 359, in __init__
    self.book = self.load_workbook(filepath_or_buffer)
  File "/home/geta/kelo/anaconda3/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py", line 36, in load_workbook
    return open_workbook(filepath_or_buffer)
  File "/home/geta/kelo/anaconda3/lib/python3.7/site-packages/xlrd/__init__.py", line 111, in open_workbook
    with open(filename, "rb") as f:
```

The code is as follows:
```python
import os

import pandas as pd

dir = "/home/geta/kelo/eXP/Test/corpus"
for root, subdirs, files in os.walk(dir):
    print(root)
    for file in files:
        #print(files)
        print("-----File in processed :", file)
        # -----File in processed : corpus_or_AB_FMC.xlsx  # this file is located in the corpus directory
        sentences = pd.read_excel(file, sheet_name=0)
        data_id = sentences.identifiant
        print("Total phrases: ", len(data_id))
        data = sentences.verbatim
        data_label = sentences.etiquette
        #print(type(data_id))
        #print(type(data))
        #number = LabelEncoder()
        # 0 = C; 1= F; 2= M
        #data_label = number.fit_transform(sentences.etiquette.astype('str'))
        #print(data_label)
        print("etiquette :", sentences['etiquette'].unique())
        classes = sentences['etiquette'].unique()
        len_classes = len(classes)
```

Posted on 2021-03-30 23:36:22
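`file` is only the name of the file; it does not include the path to that file. A bare file name is resolved relative to the current working directory, which is why the name prints correctly but opening it fails. Use `os.path.join` to join the file name with its absolute directory path (`root`):

```python
sentences = pd.read_excel(os.path.join(root, file), sheet_name=0)
```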
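Putting that fix into the question's loop, a minimal sketch could look like the following. It assumes the workbooks end in `.xlsx` or `.xls` and that the `~$` lock files Excel leaves next to open workbooks should be skipped. The helper names `iter_excel_paths` and `summarize_corpus` are hypothetical; the column names `identifiant` and `etiquette` come from the question.

```python
import os

# Extensions treated as Excel workbooks (an assumption; adjust as needed).
EXCEL_EXTENSIONS = (".xlsx", ".xls")


def iter_excel_paths(directory):
    """Yield the full path of every Excel file under `directory`, recursively."""
    for root, _subdirs, files in os.walk(directory):
        for name in sorted(files):
            # Skip non-Excel files and the "~$" lock files Excel leaves behind.
            if name.lower().endswith(EXCEL_EXTENSIONS) and not name.startswith("~$"):
                # os.walk only gives base names; join them with `root`.
                yield os.path.join(root, name)


def summarize_corpus(directory):
    """Read each workbook with pandas and print the counts from the question."""
    import pandas as pd  # imported here so iter_excel_paths works without pandas

    for path in iter_excel_paths(directory):
        print("-----File in processed :", path)
        sentences = pd.read_excel(path, sheet_name=0)
        print("Total phrases: ", len(sentences["identifiant"]))
        print("etiquette :", sentences["etiquette"].unique())
```

To run it on the corpus, call `summarize_corpus("/home/geta/kelo/eXP/Test/corpus")`. As an alternative, `pathlib.Path(directory).rglob("*.xlsx")` also yields paths that already include the directory, so no manual join is needed.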
https://stackoverflow.com/questions/66873968