Python CSV from JIRA with duplicate Sprint headers
I'm using Python 3.7.5. I have a CSV file exported from a Jira instance, and I want to see which sprint each issue was finished in. Jira tracks every sprint an issue has been part of, so if you export a CSV you get multiple Sprint headers, with data like this:
Issue key,Issue Type,Status,Sprint,Sprint,Sprint,Sprint
OLS-526,Story,Done,Sprint #16,Sprint #17,Sprint #18,Sprint #19
OLS-871,Story,Done,Sprint #18,Sprint #28,,
OLS-165,Story,Done,Sprint 1,Sprint 3,Sprint #18,Sprint #19
OLS-868,Story,Done,Sprint #28,,,
What I need is the sprint an issue was actually delivered in, i.e. the rightmost non-empty Sprint column, so that I can count how many issues were really finished in each sprint.

I tried the stdlib csv module with DictReader, like this:

import csv
with open('../OLS-tix2.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['Sprint'])
But DictReader keeps only the last value for a duplicated field name, so row['Sprint'] is always the last Sprint column, and I get a blank whenever that last column is empty. The output from the code above looks like this (with empty lines for OLS-871 and OLS-868, whose last Sprint column is empty):

Sprint #19

Sprint #19

I could just use the plain csv reader and roll my own, but I figured there had to be a better way to do this in Python.

OK, so I did some more research and found that pandas could be a good tool for this job. There are plenty of examples around, and as a bonus I can filter and pivot the data. This is what I ended up with that works for me:
import pandas as pd

csv_file = "../OLS-tix.csv"  # where the file is at
ols_df = pd.read_csv(csv_file)

finish_sprint_col = 'finish_sprint'  # the column for the sprint this issue was actually finished in
ols_df[finish_sprint_col] = ""  # add the new blank column

# get all the headers that contain the word "Sprint"; pandas renames the
# duplicates to Sprint, Sprint.1, ..., Sprint.N
sprints = ols_df.columns[ols_df.columns.str.contains('Sprint')]

for i, row in ols_df.iterrows():
    if not ols_df.at[i, "Status"] == "Done":  # we only want to do this for "Done" issues
        continue
    finish_sprint = False
    for header in sprints:  # go through all the Sprint cells for this row and keep the last non-empty one
        if not pd.isnull(ols_df.loc[i, header]):
            finish_sprint = ols_df.loc[i, header]
    if finish_sprint:
        ols_df.at[i, finish_sprint_col] = finish_sprint

# get the number of issues finished per sprint
dones = ols_df[(ols_df.Status == "Done") & (ols_df['Issue Type'] == "Story")].pivot_table(
    index=["finish_sprint"], values=["Issue key"], aggfunc=[pd.Series.nunique])
There is probably a simpler way to do this, but for now it seems to work.
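In case it helps someone: the per-row loop above can be collapsed into a vectorized sketch (untested against my full export, and assuming the same column layout, where pd.read_csv renames the duplicate headers to Sprint, Sprint.1, ...). Forward-filling across the Sprint columns makes the last column hold the rightmost non-empty value for each row:

```python
import io

import pandas as pd

# sample rows from the question (in practice, pd.read_csv on the export file)
csv_data = """Issue key,Issue Type,Status,Sprint,Sprint,Sprint,Sprint
OLS-526,Story,Done,Sprint #16,Sprint #17,Sprint #18,Sprint #19
OLS-871,Story,Done,Sprint #18,Sprint #28,,
OLS-165,Story,Done,Sprint 1,Sprint 3,Sprint #18,Sprint #19
OLS-868,Story,Done,Sprint #28,,,
"""
ols_df = pd.read_csv(io.StringIO(csv_data))

# duplicate headers come back as Sprint, Sprint.1, Sprint.2, Sprint.3
sprint_cols = ols_df.columns[ols_df.columns.str.contains('Sprint')]

# forward-fill across the Sprint columns, row-wise; the last column then
# holds the rightmost non-empty sprint for each row
ols_df['finish_sprint'] = ols_df[sprint_cols].ffill(axis=1).iloc[:, -1]

# issues finished per sprint, counting only "Done" issues
dones = ols_df[ols_df.Status == "Done"].groupby('finish_sprint')['Issue key'].nunique()
print(dones)
```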