我目前正在使用openpyxl来使用一个excel文件,该文件具有多个索引(两个级别的头),并且我试图根据标题的子头来执行操作。我有一些经验。在熊猫身上这样做,但在这个项目中,我必须使用openpyxl,这是我以前经常使用的。我唯一能想到的就是手动方式:
我的代码如下:
#reading the excel file
path = r'path to file'
wb = load_workbook(path) #loading the excel table
ws = wb.active #grab the active worksheet
#Setting the doc Header
for h in ws.iter_rows(max_row = 1, values_only = True): #getting the first row (Headers) in the table
header = list(h)
for sh in ws.iter_rows(min_row = 1 ,max_row = 2, values_only = True):
sub_header = list(sh)
#removing all of the none Values
header = list(filter(None, header))
sub_header = list(filter(None, sub_header))
print(header)
print(sub_header)
#creating a list of all the Columns in the excel file
col_list = []
for col in ws.iter_cols(min_row=3,min_col = 1): #Iteration over every single row starting from the third row since first two are the headers
col = [cell.value for cell in col] #Creating a list from each row
col = list(filter(None, col)) #removing the none values from each row
col_list.append(col) #creating a list of all rows (starting from the 3d one)
#print (col_list)但我确信肯定有一个更好的方法,我无法找到在文档或通过检查这个网站。
提前感谢!
最后,我的目标是通过迭代头来自动化代码的这一部分,并在每次代码时使用该头的子头及其值:
#bulding the templates using yattag "yattag.org"
doc , tag , text = Doc().tagtext()
#building the tags of the xml file
with tag("Data"): #root tag
for row in row_list :
with tag("Row"):
with tag("Input"):
with tag(header[0].replace(' ','_').replace('\n','_')):
text("In " + dic[row[0]]+" the precentage of Students " + " regarding the " + header[0] + " the Precentage of Students with "+ sub_header[0] + " is "+ str(row[1]) + " whereas the " + sub_header[1] + " are " + str(row[2]) )
with tag("Row_Data"):
text(dic[row[0]] + " | " + header[0] + " | " + sub_header[0]+ " | " + str(row[1]) + " | " + sub_header[1] + " | " + str(row[2]))
with tag(header[1].replace(' ','_').replace('\n','_')):
text("In " + dic[row[0]]+" the precentage of Students " + " regarding the " + header[1] + " the Precentage of Students with "+ sub_header[2] + " is "+ str(row[3]) + " whereas the " + sub_header[3] + " are " + str(row[4]) )
with tag("Row_Data"):
text(dic[row[0]] + " | " + header[1] + " | " + sub_header[2]+ " | " + str(row[3]) + " | " + sub_header[3] + " | " + str(row[4]))
with tag(header[2].replace(' ','_').replace('\n','_')):
text("In " + dic[row[0]]+" the precentage of Students " + " regarding the " + header[2] + " the Precentage of Students with "+ sub_header[4] + " is "+ str(row[5]) + " whereas the " + sub_header[5] + " are " + str(row[6]) )
with tag("Row_Data"):
text(dic[row[0]] + " | " + header[2] + " | " + sub_header[4]+ " | " + str(row[5]) + " | " + sub_header[5] + " | " + str(row[6]))
with tag(header[3].replace(' ','_').replace('\n','_')):
text("In " + dic[row[0]]+" the precentage of Students " + " regarding the " + header[3] + " the Precentage of Students with "+ sub_header[6] + " is "+ str(row[7]) + " whereas the " + sub_header[7] + " are " + str(row[8]) +" and for " + sub_header[8] + str(row[9]) )
with tag("Row_Data"):
text(dic[row[0]] + " | " + header[3] + " | " + sub_header[6]+ " | " + str(row[7]) + " | " + sub_header[7] + " | " + str(row[8]) + " | " + sub_header[8] + " | " + str(row[9]))
with tag(header[4].replace(' ','_').replace('\n','_')):
text("In " + dic[row[0]]+" the precentage of Students " + " regarding the " + header[4] + " the Precentage of Students with "+ sub_header[9] + " is "+ str(row[10]) + " whereas the " + sub_header[10] + " are " + str(row[11]) )
with tag("Row_Data"):
text(dic[row[0]] + " | " + header[4] + " | " + sub_header[9]+ " | " + str(row[10]) + " | " + sub_header[10] + " | " + str(row[11]))
with tag(header[5].replace(' ','_').replace('\n','_')):
text("In " + dic[row[0]]+" the precentage of Students " + " regarding the " + header[5] + " the Precentage of Students with "+ sub_header[11] + " is "+ str(row[12]) + " whereas the " + sub_header[12] + " are " + str(row[13]) )
with tag("Row_Data"):
text(dic[row[0]] + " | " + header[5] + " | " + sub_header[11]+ " | " + str(row[12]) + " | " + sub_header[12] + " | " + str(row[13]))
with tag(header[6].replace(' ','_').replace('\n','_')):
text("In " + dic[row[0]]+" the precentage of Students " + " regarding the " + header[6] + " the Precentage of Students with "+ sub_header[13] + " is "+ str(row[14]) + " whereas the " + sub_header[14] + " are " + str(row[15]) )
with tag("Row_Data"):
text(dic[row[0]] + " | " + header[6] + " | " + sub_header[13]+ " | " + str(row[14]) + " | " + sub_header[14] + " | " + str(row[15]))
#print(doc.getvalue())
result = indent(
doc.getvalue(),
indentation=' ',
indent_text=True
)
#saving the xml file
with open("output.xml", "w") as f:
f.write(result)发布于 2021-11-22 08:40:35
除非潘达完全不在餐桌上,否则我想你也许能做些熊猫和开百叶草的事情。文档提到将openpyxl中的数据读取到熊猫的dataframe:与Pandas和Numpy合作中。你能用:
data = ws.values
df = DataFrame(data[2:,:], index=data[0], columns=data[1])可能需要对None值进行一些必要的筛选。
https://stackoverflow.com/questions/70055488
复制相似问题