我有以下名为reviews.csv的csv数据
Movie,Reviewer,Sentence,Tag,Sentiment,Text,
Jaws,John,s1,Plot,Positive,The plot was great,
Jaws,Mary,s1,Plot,Positive,The plot was great,
Jaws,John,s2,Acting,Positive,The acting was OK,
Jaws,Mary,s2,Acting,Neutral,The acting was OK,
Jaws,John,s3,Scene,Positive,The visuals blew me away,
Jaws,Mary,s3,Effects,Positive,The visuals blew me away,
Vertigo,John,s1,Scene,Negative,The scenes were terrible,
Vertigo,Mary,s1,Acting,Negative,The scenes were terrible,
Vertigo,John,s2,Plot,Negative,The actors couldn’t make the story believable,
Vertigo,Mary,s2,Acting,Positive,The actors couldn’t make the story believable,
Vertigo,John,s3,Effects,Negative,The effects were awful,
Vertigo,Mary,s3,Effects,Negative,The effects were awful,我的目标是将这个csv文件转换成带有条件格式的Excel电子表格。具体而言,我要适用以下规则:
因此,我想创建一个Excel电子表格(.xlsx),如下所示:

我一直在查看Pandas的样式文档,以及关于XlsxWriter的条件格式教程,但我似乎不能把它们都放在一起。到目前为止我的情况是这样的。我可以将csv读入Pandas,对其进行排序(虽然我不确定这是否必要),并将其写回Excel电子表格。如何进行条件格式设置,以及在代码中的位置?
def csv_to_xls(source_path, dest_path):
"""
Convert a csv file to a formatted xlsx spreadsheet
Input: path to hospital review csv file
Output: formatted xlsx spreadsheet
"""
#Read the source file and convert to Pandas dataframe
df = pd.read_csv(source_path)
#Sort by Filename, then by sentence number
df.sort_values(['File', 'Sent'], ascending=[True, True], inplace = True)
#Create the xlsx file that we'll be writing to
orig = pd.ExcelWriter(dest_path, engine='xlsxwriter')
#Convert the dataframe to Excel, create the sheet
df.to_excel(orig, index=False, sheet_name='report')
#Variables for the workbook and worksheet
workbook = orig.book
worksheet = orig.sheets['report']
#Formatting for exact, partial, mismatch, gold
exact = workbook.add_format({'bg_color':'#B7F985'}) #green
partial = workbook.add_format({'bg_color':'#D3F6F4'}) #blue
mismatch = workbook.add_format({'bg_color':'#F6D9D3'}) #red
#Do the conditional formatting somehow
orig.save()发布于 2017-07-01 21:29:46
免责声明:我是图书馆的作者之一,我将建议
使用StyleFrame和DataFrame.duplicated可以很容易地实现这一点。
from styleframe import StyleFrame, Styler
sf = StyleFrame(df)
green = Styler(bg_color='#B7F985')
blue = Styler(bg_color='#D3F6F4')
red = Styler(bg_color='#F6D9D3')
sf.apply_style_by_indexes(sf[df.duplicated(subset=['Movie', 'Sentence'], keep=False)],
styler_obj=red)
sf.apply_style_by_indexes(sf[df.duplicated(subset=['Movie', 'Sentence', 'Tag'], keep=False)],
styler_obj=blue)
sf.apply_style_by_indexes(sf[df.duplicated(subset=['Movie', 'Sentence', 'Tag', 'Sentiment'],
keep=False)],
styler_obj=green)
sf.to_excel('test.xlsx').save()这将产生以下结果:

https://stackoverflow.com/questions/44814798
复制相似问题