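I'm using Python 3.7.5.
I have a CSV file exported from our Jira instance, which I use to see which issue was completed in which sprint. Jira tracks every sprint an issue has been in, so when you export a CSV you get multiple Sprint headers, and the data looks like this: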
Issue key,Issue Type,Status,Sprint,Sprint,Sprint,Sprint
OLS-526,Story,Done,Sprint #16,Sprint #17,Sprint #18,Sprint #19
OLS-871,Story,Done,Sprint #18,Sprint #28,,
OLS-165,Story,Done,Sprint 1,Sprint 3,Sprint #18,Sprint #19
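OLS-868,Story,Done,Sprint #28,,,

What I need is to identify the last sprint each issue was in, i.e. the right-most non-empty Sprint column, so that I can calculate how many issues were actually finished in each sprint.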
I tried using Python's built-in 'csv' module with DictReader, like this:
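import csv

with open('../OLS-tix2.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['Sprint'])

However, this only gives me the last Sprint column, and a blank if there is nothing in that column, because the repeated 'Sprint' key is overwritten by each later column. For the data above, the output looks like this: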
Sprint #19
Sprint #19

I could use the plain csv reader and roll my own version, but I figure there must be a better way to do this in Python.
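For reference, the "roll my own" approach with the plain csv reader might look roughly like the sketch below. It assumes every sprint column is named exactly "Sprint" and that the key column is "Issue key", as in the sample above. The helper name last_sprints and the inline SAMPLE string are made up for illustration; a real script would pass the opened export file instead.

```python
import csv
import io
from collections import Counter

# Sample rows copied from the question; stands in for the exported file.
SAMPLE = """\
Issue key,Issue Type,Status,Sprint,Sprint,Sprint,Sprint
OLS-526,Story,Done,Sprint #16,Sprint #17,Sprint #18,Sprint #19
OLS-871,Story,Done,Sprint #18,Sprint #28,,
OLS-165,Story,Done,Sprint 1,Sprint 3,Sprint #18,Sprint #19
OLS-868,Story,Done,Sprint #28,,,
"""


def last_sprints(csvfile):
    """Map each issue key to the right-most non-empty Sprint cell of its row."""
    reader = csv.reader(csvfile)
    header = next(reader)
    key_idx = header.index("Issue key")
    # Positions of every column named "Sprint" (the header repeats).
    sprint_idxs = [i for i, name in enumerate(header) if name == "Sprint"]
    result = {}
    for row in reader:
        if not row:  # csv.reader yields [] for blank lines
            continue
        filled = [row[i] for i in sprint_idxs if i < len(row) and row[i].strip()]
        result[row[key_idx]] = filled[-1] if filled else None
    return result


finished_in = last_sprints(io.StringIO(SAMPLE))
# Count issues per finishing sprint, skipping issues that were never in a sprint.
per_sprint = Counter(s for s in finished_in.values() if s is not None)
print(finished_in)
print(per_sprint)
```

Unlike DictReader, this keeps every Sprint cell by position, so the duplicate header names never collide.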
Answered 2020-02-07 07:27:20
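OK, so I did some more looking around and stumbled upon pandas, which looks like it could be a good tool for this job. There are plenty of examples out there, and as a bonus I can use DataFrames and pivot tables to get summary counts of the issue IDs.
This is what I ended up with, and it works for me: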
import pandas as pd

csv_file = "../OLS-tix.csv"  # where the file is at
ols_df = pd.read_csv(csv_file)
finish_sprint_col = 'finish_sprint'  # the column to hold the sprint the issue was actually finished in
ols_df[finish_sprint_col] = ""  # add the new blank column
# get all the headers that contain the word "Sprint"; pandas names them Sprint, Sprint.1 ... Sprint.N
sprints = ols_df.columns[ols_df.columns.str.contains('Sprint')]
for i, row in ols_df.iterrows():
    if not ols_df.at[i, "Status"] == "Done":  # we only want to do this for "Done" issues
        continue
    finish_sprint = False
    for header in sprints:  # go through all the sprint cells for this row and keep the last non-empty one
        if not pd.isnull(ols_df.loc[i, header]):
            finish_sprint = ols_df.loc[i, header]
    if finish_sprint:
        ols_df.at[i, finish_sprint_col] = finish_sprint
# get the number of issues finished per sprint
dones = ols_df[(ols_df.Status == "Done") & (ols_df['Issue Type'] == "Story")].pivot_table(index=["finish_sprint"], values=["Issue key"], aggfunc=[pd.Series.nunique])

There may be a simpler way to do this, but it seems to work for now.
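One possibly simpler variant, offered only as a sketch and not what I actually ran: forward-fill across the sprint columns of each row and take the right-most column, which avoids the explicit iterrows loop. It relies on pandas renaming the duplicate headers to Sprint, Sprint.1, ... Sprint.N when reading the file. The inline SAMPLE string is just the rows from the question, used so the snippet runs on its own.

```python
import io

import pandas as pd

# Sample rows copied from the question; stands in for the exported file.
SAMPLE = """\
Issue key,Issue Type,Status,Sprint,Sprint,Sprint,Sprint
OLS-526,Story,Done,Sprint #16,Sprint #17,Sprint #18,Sprint #19
OLS-871,Story,Done,Sprint #18,Sprint #28,,
OLS-165,Story,Done,Sprint 1,Sprint 3,Sprint #18,Sprint #19
OLS-868,Story,Done,Sprint #28,,,
"""

df = pd.read_csv(io.StringIO(SAMPLE))

# Duplicate headers arrive as Sprint, Sprint.1, ... Sprint.N, in file order.
sprint_cols = [c for c in df.columns if c == "Sprint" or c.startswith("Sprint.")]

# Forward-fill left to right within each row, then take the right-most column:
# that cell now holds the last non-empty sprint for the row.
df["finish_sprint"] = df[sprint_cols].ffill(axis=1).iloc[:, -1]

# Count distinct finished stories per finishing sprint.
done = df[(df["Status"] == "Done") & (df["Issue Type"] == "Story")]
per_sprint = done.groupby("finish_sprint")["Issue key"].nunique()
print(per_sprint)
```

Issues with no sprint at all end up with a missing finish_sprint and are left out by groupby, which by default drops missing keys.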
https://stackoverflow.com/questions/60107760