Python CSV from JIRA with duplicate Sprint headers
I'm using Python 3.7.5. I have a CSV file exported from a Jira instance, and I want to see which sprint each issue was finished in. Jira tracks every sprint an issue has been part of, so if you export a CSV you get multiple Sprint headers, with data like this:
Issue key,Issue Type,Status,Sprint,Sprint,Sprint,Sprint
OLS-526,Story,Done,Sprint #16,Sprint #17,Sprint #18,Sprint #19
OLS-871,Story,Done,Sprint #18,Sprint #28,,
OLS-165,Story,Done,Sprint 1,Sprint 3,Sprint #18,Sprint #19
OLS-868,Story,Done,Sprint #28,,,
What I need is the sprint an issue was actually delivered in, i.e. the rightmost non-empty Sprint column, so that I can count how many issues were really finished in each sprint.

I tried the stdlib csv module with DictReader, like this:

import csv
with open('../OLS-tix2.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['Sprint'])
But DictReader keeps only the last value for a duplicated field name, so row['Sprint'] is always the last Sprint column, and I get a blank whenever that last column is empty. The output from the code above looks like this (with empty lines for OLS-871 and OLS-868, whose last Sprint column is empty):

Sprint #19

Sprint #19

I could just use the plain csv reader and roll my own, but I figured there had to be a better way to do this in Python.

OK, so I did some more research and found that pandas could be a good tool for this job. There are plenty of examples around, and as a bonus I can filter and pivot the data. This is what I ended up with that works for me:
import pandas as pd

csv_file = "../OLS-tix.csv"  # where the file is at
ols_df = pd.read_csv(csv_file)

finish_sprint_col = 'finish_sprint'  # the column for the sprint this issue was actually finished in
ols_df[finish_sprint_col] = ""  # add the new blank column

# get all the headers that contain the word "Sprint"; pandas renames the
# duplicates to Sprint, Sprint.1, ..., Sprint.N
sprints = ols_df.columns[ols_df.columns.str.contains('Sprint')]

for i, row in ols_df.iterrows():
    if not ols_df.at[i, "Status"] == "Done":  # we only want to do this for "Done" issues
        continue
    finish_sprint = False
    for header in sprints:  # go through all the Sprint cells for this row and keep the last non-empty one
        if not pd.isnull(ols_df.loc[i, header]):
            finish_sprint = ols_df.loc[i, header]
    if finish_sprint:
        ols_df.at[i, finish_sprint_col] = finish_sprint

# get the number of issues finished per sprint
dones = ols_df[(ols_df.Status == "Done") & (ols_df['Issue Type'] == "Story")].pivot_table(
    index=["finish_sprint"], values=["Issue key"], aggfunc=[pd.Series.nunique])
There is probably a simpler way to do this, but for now it seems to work.
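In case it helps someone: the per-row loop above can be collapsed into a vectorized sketch (untested against my full export, and assuming the same column layout, where pd.read_csv renames the duplicate headers to Sprint, Sprint.1, ...). Forward-filling across the Sprint columns makes the last column hold the rightmost non-empty value for each row:

```python
import io

import pandas as pd

# sample rows from the question (in practice, pd.read_csv on the export file)
csv_data = """Issue key,Issue Type,Status,Sprint,Sprint,Sprint,Sprint
OLS-526,Story,Done,Sprint #16,Sprint #17,Sprint #18,Sprint #19
OLS-871,Story,Done,Sprint #18,Sprint #28,,
OLS-165,Story,Done,Sprint 1,Sprint 3,Sprint #18,Sprint #19
OLS-868,Story,Done,Sprint #28,,,
"""
ols_df = pd.read_csv(io.StringIO(csv_data))

# duplicate headers come back as Sprint, Sprint.1, Sprint.2, Sprint.3
sprint_cols = ols_df.columns[ols_df.columns.str.contains('Sprint')]

# forward-fill across the Sprint columns, row-wise; the last column then
# holds the rightmost non-empty sprint for each row
ols_df['finish_sprint'] = ols_df[sprint_cols].ffill(axis=1).iloc[:, -1]

# issues finished per sprint, counting only "Done" issues
dones = ols_df[ols_df.Status == "Done"].groupby('finish_sprint')['Issue key'].nunique()
print(dones)
```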