Python pandas: How do I remove duplicate indexes produced by concatenating DataFrames?

python pandas

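I have some code that retrieves data from internal databases and then concatenates the results into a single DataFrame. The problem I'm running into is that the resulting DataFrame contains duplicate index values. Here is the full code: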

from jira import JIRA
import numpy as np  # needed for np.where below
import pandas as pd

cert_path = 'C:\\cert.crt'

start_date = '2020-10-01'
end_date = '2020-10-31'

# three different instances (each with their own schema)
a_session = JIRA(server='https://jira.myinstance-A.com', options={'verify': cert_path}, kerberos=True)

b_session = JIRA(server='https://jira.myinstance-B.com', options={'verify': cert_path}, kerberos=True)

c_session = JIRA(server='https://jira.myinstance-C.com', options={'verify': cert_path}, kerberos=True)


# define Jira queries
query_1 = 'project = \"Test Project 1\" and issuetype = Incident and resolution = Resolved and updated >= {} and updated <= {}'.format(start_date, end_date)

query_2 = 'project = \"Test Project 2\" and issuetype = Incident and resolution = Resolved and updated >= {} and updated <= {}'.format(start_date, end_date)

query_3 = 'project = \"Test Project 3\" and issuetype = Defect and resolution = Resolved and releasedate >= {} and releasedate <= {}'.format(start_date, end_date)

query_4 = 'project = \"Test Project 4\" and issuetype = Enhancement and resolution = Done and completed >= {} and completed <= {}'.format(start_date, end_date)


# fetch all issues from a given session for a given query
def get_all_issues(session, query):
    block_size = 50

    # page through the results until an empty block comes back
    start = 0
    all_issues = []
    while True:
        issues = session.search_issues(query, start, block_size)
        if len(issues) == 0:
            # No more issues
            break
        start += len(issues)
        all_issues.extend(issues)

    # build one dict per issue, then create the dataframe in a single step
    # (DataFrame.append was removed in pandas 2.0)
    rows = []
    for issue in all_issues:
        d = {
            'jira_key': issue.key,
            'issue_type': issue.fields.type,
            'creator': issue.fields.creator,
            'resolution': issue.fields.resolution,
        }

        fields = issue.fields  # For brevity
        if hasattr(fields, "custom_field_123"):
            d['system_change'] = fields.custom_field_123
        if hasattr(fields, "custom_field_456"):
            d['system_resources'] = fields.custom_field_456
        if hasattr(fields, "custom_field_789"):
            d['system_backup'] = fields.custom_field_789

        rows.append(d)

    return pd.DataFrame(rows)


# list of queries, and the corresponding backend
queries = [
    (a_session, query_1),
    (a_session, query_2),
    (b_session, query_3),
    (c_session, query_4),
]


# loop over each pair of session and query, calling the get_all_issues function, and save the dataframe we get each time
dataframes = []

for session, query in queries:
    dataframe = get_all_issues(session, query)
    dataframes.append(dataframe)


# concatenate all data frames
df_concat = pd.concat(dataframes)

# derive the business units from the project codes
df_concat['business_unit'] = np.where(df_concat['jira_key'].str.contains('MER'), 'Mercedes',
                             np.where(df_concat['jira_key'].str.contains('HON'), 'Honda',
                             np.where(df_concat['jira_key'].str.contains('AST'), 'Aston Martin', '*ERROR*')))

What is the best way to "re-arrange" the index so that it is sequential? (Details below.)

Thanks in advance.

Just add the ignore_index parameter:

df_concat = pd.concat(dataframes, ignore_index=True)

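After the concat has already happened, you can get the same effect with reset_index(drop=True). Without drop=True, the old index is kept as an extra column.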
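As a minimal sketch of the two approaches above, the following uses two small made-up frames (df_a and df_b are hypothetical stand-ins for the per-query results, not the real Jira data) to show that both routes end up with the same sequential index:

```python
import pandas as pd

# Two small stand-in frames (hypothetical data), each with its own 0..n-1 index
df_a = pd.DataFrame({"jira_key": ["MER-361", "MER-362"]})
df_b = pd.DataFrame({"jira_key": ["AST-022", "AST-023"]})

# Default concat keeps each frame's original labels, so they repeat
dup = pd.concat([df_a, df_b])
print(dup.index.tolist())  # [0, 1, 0, 1]

# Option 1: renumber while concatenating
fixed_concat = pd.concat([df_a, df_b], ignore_index=True)

# Option 2: renumber afterwards; drop=True discards the old labels
fixed_reset = dup.reset_index(drop=True)

print(fixed_concat.index.tolist())       # [0, 1, 2, 3]
print(fixed_concat.equals(fixed_reset))  # True
```

Passing ignore_index=True is usually the simpler choice when the original labels carry no meaning, since it avoids the extra step.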

concat accepts the keyword argument ignore_index, which you can set to True; that gives you the desired result:

   jira_key   issue_type    creator   resolution   system_change   system_resources   system_backup   business_unit
0  MER-361    Incident      Smith, J  Resolved     Saturn          High               NaN             Mercedes
1  MER-362    Enhancement   Jones, T  In Progress  NaN             Medium             Not Applicable  Mercedes
2  MER-363    Incident      Ng, V     Resolved     Saturn          NaN                Not Applicable  Mercedes
3  MER-364    Incident      Jones, T  Resolved     NaN             NaN                Not Applicable  Mercedes
4  AST-022    Incident      Smith, J  Resolved     Saturn          High               NaN             Aston Martin
5  AST-023    Incident      Smith, J  Resolved     Saturn          High               NaN             Aston Martin
6  AST-024    Incident      Jones, T  Resolved     Saturn          High               NaN             Aston Martin
7  HON-124    Incident      Smith, J  In Progress  NaN             Low                NaN             Honda
8  HON-125    Incident      Smith, J  Resolved     Saturn          High               NaN             Honda
9  HON-126    Incident      Jones, T  In Progress  Saturn          Low                NaN             Honda

df_concat = pd.concat(dataframes, ignore_index=True)
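If it helps to confirm the fix, a quick check along these lines may be useful. It is only a sketch on made-up frames (the frames list here is a hypothetical stand-in for the dataframes list in the question) and shows why repeated labels matter: a label lookup matches more than one row.

```python
import pandas as pd

# Hypothetical stand-ins for the per-query results
frames = [
    pd.DataFrame({"jira_key": ["MER-361", "MER-362"]}),
    pd.DataFrame({"jira_key": ["HON-124"]}),
]

dup = pd.concat(frames)                       # labels 0, 1, 0
clean = pd.concat(frames, ignore_index=True)  # labels 0, 1, 2

print(dup.index.is_unique)    # False
print(len(dup.loc[[0]]))      # 2: label 0 matches one row from each frame
print(clean.index.is_unique)  # True
print(len(clean.loc[[0]]))    # 1
```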