How can I sort strings that mix letters and digits so I can group them into categories in Python?


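I am working with a dataset that contains diagnoses coded in ICD10. Because there are so many distinct codes, I want to group them into broader categories, and I found the category ranges online. The problem is that the codes look like "A04" or "Z01", and I cannot work out how to order them because they mix letters and digits. I tried the code below, but I know the variable "diag_icd10_ranges" is not correct. Can anyone help?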

df['code_diag_assoc_icd10'] = df['Assoc_Diagnose']

# Associated category names
diag_icd10_ranges = [(A00, B99), (C00, D49), (D50, D89), (E00, E89), (F01, F99), (G00, G99), 
       (H00, H59), (H60, H95), (I00, I99), (J00, J99), (K00, K95), (L00, L99),
       (M00, M99), (N00, N99), (O00, O9A), (P00, P96), (Q00, Q99), (R00, R99),
       (S00, T88), (V00, Y99), (Z00, Z99)]

diag_icd10_dict = {0: 'infectious_icd10d', 1: 'neoplasms_icd10d', 2: 'blood_icd10d', 3: 'endocrine_icd10d',
           4: 'mental_icd10d', 5: 'nervous_icd10d', 6: 'eye_icd10d', 7: 'ear_icd10d',
           8: 'circulatory_icd10d', 9: 'respiratory_icd10d', 10: 'digestive_icd10d', 11: 'skin_icd10d', 
          12: 'musculo_icd10d', 13: 'genitourinary_icd10d', 14: 'pregnancy_icd10d', 15: 'perinatalperiod_icd10d', 
          16: 'congenital_icd10d',
          17: 'abnormalfindings_icd10d', 18:'injury_icd10d', 19:'morbidity', 20:'healthstatus'}

# Re-code in terms of integer
for num, cat_range in enumerate(diag_icd10_ranges):
    df['code_diag_assoc_icd10'] = np.where(df['code_diag_assoc_icd10'].between(cat_range[0],cat_range[1]), 
                                       num, df['code_diag_assoc_icd10'])

# Convert integer to category name using diag_dict
df['cat_diag_assoc_icd10'] = df['code_diag_assoc_icd10'].replace(diag_icd10_dict)

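You should be able to do the "between" check the Pythonic way, with a chained comparison directly on the strings. See the code below.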

In [21]: diag_icd10_ranges = [{ 1 : ('A00', 'B99') }, 
    ...:                      { 2 : ('C00', 'D49') }, 
    ...:                      { 3 : ('D50', 'D89') }, 
    ...:                      { 4 : ('E00', 'E89') }, 
    ...:                      { 5 : ('F01', 'F99') }, 
    ...:                      { 6 : ('G00', 'G99') }, 
    ...:                      { 7 : ('H00', 'H59') }, 
    ...:                      { 8 : ('H60', 'H95') }, 
    ...:                      { 9 : ('I00', 'I99') }, 
    ...:                      { 10: ('J00', 'J99') }, 
    ...:                      { 11: ('K00', 'K95') }, 
    ...:                      { 12: ('L00', 'L99') }, 
    ...:                      { 13: ('M00', 'M99') }, 
    ...:                      { 14: ('N00', 'N99') }, 
    ...:                      { 15: ('O00', 'O9A') }, 
    ...:                      { 16: ('P00', 'P96') }, 
    ...:                      { 17: ('Q00', 'Q99') }, 
    ...:                      { 18: ('R00', 'R99') }, 
    ...:                      { 19: ('S00', 'T88') }, 
    ...:                      { 20: ('V00', 'Y99') }, 
    ...:                      { 21: ('Z00', 'Z99') }
    ...:                     ]
    ...: 
    ...: heart_failure_icd10_code = 'I50.9'
    ...: 
    ...: chapter_number = [key for rec in diag_icd10_ranges for key, value in rec.items() if value[0] <= heart_failure_icd10_code <= value[1] ]

In [22]: print(chapter_number)
[9]
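One thing to watch with a plain string comparison is that a subcode at the top of a range, such as 'B99.9', sorts after 'B99' and so matches no range. The sketch below is one possible way around that, comparing only the first three characters of the code. The name chapter_for and the tuple layout are assumptions made for illustration; the ranges and labels are the ones given in the question.

```python
# Hypothetical helper: (lower bound, upper bound, label) per range,
# using the ranges and labels listed in the question.
ICD10_CHAPTERS = [
    ("A00", "B99", "infectious_icd10d"),
    ("C00", "D49", "neoplasms_icd10d"),
    ("D50", "D89", "blood_icd10d"),
    ("E00", "E89", "endocrine_icd10d"),
    ("F01", "F99", "mental_icd10d"),
    ("G00", "G99", "nervous_icd10d"),
    ("H00", "H59", "eye_icd10d"),
    ("H60", "H95", "ear_icd10d"),
    ("I00", "I99", "circulatory_icd10d"),
    ("J00", "J99", "respiratory_icd10d"),
    ("K00", "K95", "digestive_icd10d"),
    ("L00", "L99", "skin_icd10d"),
    ("M00", "M99", "musculo_icd10d"),
    ("N00", "N99", "genitourinary_icd10d"),
    ("O00", "O9A", "pregnancy_icd10d"),
    ("P00", "P96", "perinatalperiod_icd10d"),
    ("Q00", "Q99", "congenital_icd10d"),
    ("R00", "R99", "abnormalfindings_icd10d"),
    ("S00", "T88", "injury_icd10d"),
    ("V00", "Y99", "morbidity"),
    ("Z00", "Z99", "healthstatus"),
]


def chapter_for(code):
    """Return the label of the range containing an ICD-10 code, or None."""
    if not isinstance(code, str):
        return None  # e.g. missing values
    # Compare only the three-character category, so 'B99.9' is treated as 'B99'.
    prefix = code.strip().upper()[:3]
    for low, high, label in ICD10_CHAPTERS:
        if low <= prefix <= high:
            return label
    return None  # code falls in a gap between the listed ranges


print(chapter_for("I50.9"))
print(chapter_for("B99.9"))
```

With pandas, something like df['Assoc_Diagnose'].map(chapter_for) should apply this to a whole column, assuming the column holds the codes as strings.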

You can use bisect_left, with each range represented only by its lower bound:

from bisect import bisect_left

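# Lower bounds only; the list must stay sorted for bisect to work.
ranges = ["C00","D50","E00","F00","G00","H00","H60","I00",
          "J00","K00","L00","M00","N00","O00","P00","Q00",
          "R00","S00","V00","Z00"]

# Caveat: a code exactly equal to a bound (e.g. "C00") is placed in the
# preceding group by bisect_left; bisect_right would place it in its own group.
def icdGroup(code): return bisect_left(ranges,code)

icdGroup("B20") # 0
icdGroup("H65") # 7

Codes that sort before "C00" (including an empty string) land at index 0, and codes that sort after "Z00" (such as "Z01") land at index 20.

bisect_left gives you O(log n) lookups over the 20 bounds, so if you have a large number of codes to classify, this will be more efficient than a sequential search.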
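As a possible variation on the same idea, the sketch below uses bisect_right over all 21 lower bounds, so a code exactly equal to a bound (for example "C00") falls into its own group, and the resulting index maps straight to a label. The names LOWER_BOUNDS, LABELS and icd_group are assumptions made for illustration; the bounds and labels come from the question.

```python
from bisect import bisect_right

# Lower bound of each range from the question, in sorted order.
LOWER_BOUNDS = ["A00", "C00", "D50", "E00", "F01", "G00", "H00", "H60",
                "I00", "J00", "K00", "L00", "M00", "N00", "O00", "P00",
                "Q00", "R00", "S00", "V00", "Z00"]

# Labels in the same order as LOWER_BOUNDS.
LABELS = ["infectious_icd10d", "neoplasms_icd10d", "blood_icd10d",
          "endocrine_icd10d", "mental_icd10d", "nervous_icd10d",
          "eye_icd10d", "ear_icd10d", "circulatory_icd10d",
          "respiratory_icd10d", "digestive_icd10d", "skin_icd10d",
          "musculo_icd10d", "genitourinary_icd10d", "pregnancy_icd10d",
          "perinatalperiod_icd10d", "congenital_icd10d",
          "abnormalfindings_icd10d", "injury_icd10d", "morbidity",
          "healthstatus"]


def icd_group(code):
    """Return the label of the last range whose lower bound is <= the code."""
    prefix = code.strip().upper()[:3]  # three-character category only
    idx = bisect_right(LOWER_BOUNDS, prefix) - 1
    # idx is -1 when the code sorts before "A00" (e.g. an empty string).
    return LABELS[idx] if idx >= 0 else None


for code in ["B20", "C00", "H65", "Z01"]:
    print(code, icd_group(code))
```

Because only lower bounds are stored, a code that falls in a gap between the published ranges (for example "U07") is assigned to the preceding group rather than rejected. If that matters, an explicit upper-bound check would be needed.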

Can you give an example of the ordering you want?