Python: how do I extract numbers from the strings in every column of a pandas DataFrame and take the median of those numbers?

I have a dataframe:

name,a,b,c,d,e,f,g,h,i,j
"Female, n (%)",1991 (38.26%),1018 (41.52%),438 (35.12%),771 (35.16%),244 (35.72%),343 (32.48%),316 (40.51%),177 (33.84%),133 (41.18%),792 (35.92%)
"Male, n (%)",3190 (61.30%),1426 (58.16%),803 (64.39%),1415 (64.52%),436 (63.84%),711 (67.33%),463 (59.36%),345 (65.97%),187 (57.89%),1403 (63.63%)
"Age, years",44.00 [38.00 - 50.00],43.00 [37.00 - 49.00],43.00 [37.00 - 49.00],44.00 [38.00 - 50.00],44.00 [39.00 - 50.00],44.00 [38.00 - 50.00],43.00 [37.00 - 49.00],45.00 [39.00 - 51.00],44.00 [37.00 - 50.00],45.00 [38.00 - 51.00]
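The table above can be loaded with `pandas.read_csv` from an in-memory `io.StringIO`, which makes it easy to try the approaches discussed here. This is a minimal sketch that assumes the data is exactly the three rows shown. The `CSV` variable name is illustrative. The quoted `name` fields keep their embedded commas.

```python
import io

import pandas as pd

# The question's data, verbatim; quoted fields keep their embedded commas
CSV = '''name,a,b,c,d,e,f,g,h,i,j
"Female, n (%)",1991 (38.26%),1018 (41.52%),438 (35.12%),771 (35.16%),244 (35.72%),343 (32.48%),316 (40.51%),177 (33.84%),133 (41.18%),792 (35.92%)
"Male, n (%)",3190 (61.30%),1426 (58.16%),803 (64.39%),1415 (64.52%),436 (63.84%),711 (67.33%),463 (59.36%),345 (65.97%),187 (57.89%),1403 (63.63%)
"Age, years",44.00 [38.00 - 50.00],43.00 [37.00 - 49.00],43.00 [37.00 - 49.00],44.00 [38.00 - 50.00],44.00 [39.00 - 50.00],44.00 [38.00 - 50.00],43.00 [37.00 - 49.00],45.00 [39.00 - 51.00],44.00 [37.00 - 50.00],45.00 [38.00 - 51.00]
'''

df = pd.read_csv(io.StringIO(CSV))
print(df.shape)  # (3, 11): the name column plus a..j
```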
What I want to do is take the median of these values, extracting them according to the following criteria:

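  • If the row contains (\d%), I want to extract that value (the percentage inside the parentheses)
  • If the row consists of [\d-\d], I want to extract the number before the square brackets
  • Note that within each row, every cell holds the same type of data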
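The two criteria can also be expressed as a plain per-cell function. This is not the approach used in the answer that follows. It is a sketch using Python's `re` module, and `extract_value` is a hypothetical helper name. Each column is mapped through the helper with `Series.map`, and the median is then taken across each row.

```python
import io
import re

import pandas as pd

CSV = '''name,a,b,c,d,e,f,g,h,i,j
"Female, n (%)",1991 (38.26%),1018 (41.52%),438 (35.12%),771 (35.16%),244 (35.72%),343 (32.48%),316 (40.51%),177 (33.84%),133 (41.18%),792 (35.92%)
"Male, n (%)",3190 (61.30%),1426 (58.16%),803 (64.39%),1415 (64.52%),436 (63.84%),711 (67.33%),463 (59.36%),345 (65.97%),187 (57.89%),1403 (63.63%)
"Age, years",44.00 [38.00 - 50.00],43.00 [37.00 - 49.00],43.00 [37.00 - 49.00],44.00 [38.00 - 50.00],44.00 [39.00 - 50.00],44.00 [38.00 - 50.00],43.00 [37.00 - 49.00],45.00 [39.00 - 51.00],44.00 [37.00 - 50.00],45.00 [38.00 - 51.00]
'''


def extract_value(cell: str) -> float:
    """Hypothetical helper: apply the question's two rules to a single cell."""
    # Rule 1: "1991 (38.26%)" -> the percentage inside the parentheses
    m = re.search(r'\(([\d.]+)%\)', cell)
    if m:
        return float(m.group(1))
    # Rule 2: "44.00 [38.00 - 50.00]" -> the number before the square brackets
    m = re.match(r'\s*([\d.]+)\s*\[', cell)
    if m:
        return float(m.group(1))
    raise ValueError(f'unrecognised cell: {cell!r}')


df = pd.read_csv(io.StringIO(CSV)).set_index('name')
# Map every cell of every column through the helper, then take the median per row
values = df.apply(lambda col: col.map(extract_value))
medians = values.median(axis=1)
print(medians)
```

Raising on an unrecognised cell is a deliberate choice in this sketch. A silent `NaN` would simply be skipped by `median` and could hide a format the rules do not cover.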

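    Expected result: a new median column holding, for each row, the median of the extracted values.

    Method 1: replace the extra characters in the dataframe so that each cell holds only the value in parentheses (without the %) or the value before the square brackets. Then change the dtype of the extracted values to float and take the median across the columns (axis=1), i.e. one median per row.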

    # Each regex keeps only the wanted number: the one inside "(...%)", or the one before "["
    d = {r'.*?\((.*)%\)': r'\1', r'^(\S+)\s\[.*': r'\1'}
    df['median'] = df.set_index('name').replace(d, regex=True).astype(float).median(axis=1).values
    
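Here is a self-contained, runnable version of Method 1. The data-loading step with `read_csv` and `io.StringIO` is an assumption added so the snippet runs as-is. The `replace`/`astype`/`median` chain is the one from the answer. With `regex=True`, a flat dict passed to `DataFrame.replace` is treated as regex-to-replacement pairs, and `\1` refers back to the captured number.

```python
import io

import pandas as pd

CSV = '''name,a,b,c,d,e,f,g,h,i,j
"Female, n (%)",1991 (38.26%),1018 (41.52%),438 (35.12%),771 (35.16%),244 (35.72%),343 (32.48%),316 (40.51%),177 (33.84%),133 (41.18%),792 (35.92%)
"Male, n (%)",3190 (61.30%),1426 (58.16%),803 (64.39%),1415 (64.52%),436 (63.84%),711 (67.33%),463 (59.36%),345 (65.97%),187 (57.89%),1403 (63.63%)
"Age, years",44.00 [38.00 - 50.00],43.00 [37.00 - 49.00],43.00 [37.00 - 49.00],44.00 [38.00 - 50.00],44.00 [39.00 - 50.00],44.00 [38.00 - 50.00],43.00 [37.00 - 49.00],45.00 [39.00 - 51.00],44.00 [37.00 - 50.00],45.00 [38.00 - 51.00]
'''

df = pd.read_csv(io.StringIO(CSV))

# Regex -> replacement: keep the "%" value inside parentheses, or the number before "["
d = {r'.*?\((.*)%\)': r'\1', r'^(\S+)\s\[.*': r'\1'}
cleaned = df.set_index('name').replace(d, regex=True).astype(float)

# One median per row; .values assigns positionally, and row order is unchanged
df['median'] = cleaned.median(axis=1).values
print(df[['name', 'median']])
```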

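    Method 2: stack the dataframe to reshape it into a single series, extract the numeric value from each stacked cell, change the dtype of the extracted values to float, and compute the median for each value of index level 0 (the name).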

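    s = df.set_index('name').stack()
    # Keep the number inside "(...%)" or the number right before " ["
    vals = s.str.extract(r'((?<=\()\S+(?=%\))|^\S+(?=\s\[))', expand=False).astype(float)
    # Series.median(level=0) was removed in pandas 2.0, so group by level 0 instead;
    # sort=False keeps the original row order, so .values lines up with df
    df['median'] = vals.groupby(level=0, sort=False).median().values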
    
                name                      a                      b                      c                      d                      e                      f                      g                      h                      i                      j  median
    0  Female, n (%)          1991 (38.26%)          1018 (41.52%)           438 (35.12%)           771 (35.16%)           244 (35.72%)           343 (32.48%)           316 (40.51%)           177 (33.84%)           133 (41.18%)           792 (35.92%)  35.820
    1    Male, n (%)          3190 (61.30%)          1426 (58.16%)           803 (64.39%)          1415 (64.52%)           436 (63.84%)           711 (67.33%)           463 (59.36%)           345 (65.97%)           187 (57.89%)          1403 (63.63%)  63.735
    2     Age, years  44.00 [38.00 - 50.00]  43.00 [37.00 - 49.00]  43.00 [37.00 - 49.00]  44.00 [38.00 - 50.00]  44.00 [39.00 - 50.00]  44.00 [38.00 - 50.00]  43.00 [37.00 - 49.00]  45.00 [39.00 - 51.00]  44.00 [37.00 - 50.00]  45.00 [38.00 - 51.00]  44.000
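Below is a self-contained, runnable version of Method 2. As with Method 1, the `read_csv` loading step is an assumption added for reproducibility. The `level` keyword of Series reductions such as `median(level=0)` was deprecated in pandas 1.3 and removed in 2.0, so this sketch uses `groupby(level=0, sort=False)` instead.

```python
import io

import pandas as pd

CSV = '''name,a,b,c,d,e,f,g,h,i,j
"Female, n (%)",1991 (38.26%),1018 (41.52%),438 (35.12%),771 (35.16%),244 (35.72%),343 (32.48%),316 (40.51%),177 (33.84%),133 (41.18%),792 (35.92%)
"Male, n (%)",3190 (61.30%),1426 (58.16%),803 (64.39%),1415 (64.52%),436 (63.84%),711 (67.33%),463 (59.36%),345 (65.97%),187 (57.89%),1403 (63.63%)
"Age, years",44.00 [38.00 - 50.00],43.00 [37.00 - 49.00],43.00 [37.00 - 49.00],44.00 [38.00 - 50.00],44.00 [39.00 - 50.00],44.00 [38.00 - 50.00],43.00 [37.00 - 49.00],45.00 [39.00 - 51.00],44.00 [37.00 - 50.00],45.00 [38.00 - 51.00]
'''

df = pd.read_csv(io.StringIO(CSV))

# Long format: one row per (name, column) pair
stacked = df.set_index('name').stack()

# Percentage inside "(...%)" (look-behind/look-ahead), or the number right before " ["
pattern = r'((?<=\()\S+(?=%\))|^\S+(?=\s\[))'
values = stacked.str.extract(pattern, expand=False).astype(float)

# sort=False keeps groups in first-appearance order (Female, Male, Age)
medians = values.groupby(level=0, sort=False).median()
df['median'] = medians.values
print(df[['name', 'median']])
```

The `sort=False` matters because the result is assigned with `.values`, which is positional. With the default `sort=True`, the groups would come back in alphabetical order (Age, Female, Male), and Age's median would land on the Female row.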