Python: extract a value from a specific position in a .csv file

python regex pandas

I have a DataFrame df with a column, df['path'], that holds the paths to several CSV files. Each CSV looks like this:

# Reaction: a +  94Mo Production of  94Ru Ground state
# Beam current:      0.00250 mA Energy range:   40.000 -->   39.000 MeV
# Irradiation time     :      0 years   0 days  5 hours  0 minutes  0 seconds 
# Cooling time         :      0 years   0 days  0 hours  0 minutes  0 seconds 
# Half life            :      0 years   0 days  0 hours 51 minutes 48 seconds 
# Maximum production at:      0 years   0 days 20 hours 50 minutes 10 seconds 
# Initial production rate:  1.87357E-14 [s^-1] Decay rate:  2.23020E-04 [s^-1]
# # time points =100
# Time [h] Activity [GBq] #isotopes [   ]  Yield [GBq/mAh]  Isotopic frac.
     0.1    9.06448E-05    4.06442E+08    3.62579E-01        0.00355
     0.2    1.74297E-04    7.81528E+08    3.34607E-01        0.00347
     0.3    2.51495E-04    1.12768E+09    3.08792E-01        0.00339

I want to extract the "Half life" value. In every file, it is always the value after the colon on the fifth line of the .csv file.
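Because the question states that the value always follows the colon on the fifth line, a regex is not strictly required. A minimal sketch, assuming every file really has this fixed header layout; value_after_colon_on_line is a hypothetical helper name, not part of any library:

```python
from itertools import islice
from typing import Iterable, Optional


def value_after_colon_on_line(lines: Iterable[str], line_no: int = 5) -> Optional[str]:
    """Hypothetical helper: text after the first ':' on a 1-based line, or None."""
    # islice stops after line_no lines, so the rest of the file is never read
    target = next(islice(lines, line_no - 1, line_no), None)
    if target is None or ":" not in target:
        return None  # file too short, or that line has no colon
    _, _, value = target.partition(":")  # split on the FIRST colon only
    return value.strip()


# Usage on an in-memory copy of the header shown above;
# for a real file: with open(path) as f: value_after_colon_on_line(f)
sample = """# Reaction: a +  94Mo Production of  94Ru Ground state
# Beam current:      0.00250 mA Energy range:   40.000 -->   39.000 MeV
# Irradiation time     :      0 years   0 days  5 hours  0 minutes  0 seconds
# Cooling time         :      0 years   0 days  0 hours  0 minutes  0 seconds
# Half life            :      0 years   0 days  0 hours 51 minutes 48 seconds
"""
half_life = value_after_colon_on_line(sample.splitlines())
print(half_life)  # → 0 years   0 days  0 hours 51 minutes 48 seconds
```

The trade-off: this breaks silently if a file ever gains or loses a header line, whereas matching on the "# Half life" label does not depend on the position.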

Edit: based on the answer below, I built a regular expression to extract the value:

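import re
pattern = re.compile(r'# Half life\s*:\s*([^\n]+)')  # raw string avoids invalid-escape warnings
half_lives = []
for p in df['path']:
    value = None
    with open(p, 'r') as f:  # open each file once; it is closed automatically
        for line in f:
            m = pattern.match(line)  # run the regex once and reuse the match
            if m:
                value = m.group(1).strip()
                break  # stop reading once the header line is found
    half_lives.append(value)  # one entry per row of df, in order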
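The text captured after "# Half life" is a duration such as "0 years 0 days 0 hours 51 minutes 48 seconds", not a single number. A sketch that turns it into seconds and keeps one result per file, assuming every file uses exactly that years/days/hours/minutes/seconds layout and that a year counts as 365 days; the function names are hypothetical:

```python
import re
from typing import Iterable, Optional

HALF_LIFE = re.compile(r"# Half life\s*:\s*([^\n]+)")
# Assumed layout of the captured text, based on the sample header:
# "<n> years <n> days <n> hours <n> minutes <n> seconds"
DURATION = re.compile(
    r"(\d+)\s+years\s+(\d+)\s+days\s+(\d+)\s+hours\s+(\d+)\s+minutes\s+(\d+)\s+seconds"
)
# Seconds per unit, in the same order as the groups above (assumes 365-day years)
UNIT_SECONDS = (365 * 24 * 3600, 24 * 3600, 3600, 60, 1)


def half_life_seconds_from_lines(lines: Iterable[str]) -> Optional[float]:
    """Return the half-life in seconds, or None if no parsable header is found."""
    for line in lines:
        m = HALF_LIFE.match(line)
        if m:
            d = DURATION.search(m.group(1))
            if d is None:
                return None  # header present, but not in the assumed format
            return float(sum(int(n) * s for n, s in zip(d.groups(), UNIT_SECONDS)))
    return None  # no "# Half life" line at all


def half_life_seconds(path: str) -> Optional[float]:
    """Open one file and extract its half-life; the file is closed afterwards."""
    with open(path, "r") as f:
        return half_life_seconds_from_lines(f)
```

With the DataFrame from the question, `df['half_life_s'] = df['path'].map(half_life_seconds)` stores one value per row, with a missing value where a file has no such line. For the sample header, 51 minutes 48 seconds is 3108 s; as a sanity check, ln 2 / 3108 s ≈ 2.2302E-04 s^-1, which agrees with the "Decay rate" printed in the same file.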

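This pattern should work for you:

# Half life\s*:\s*([^\n]+)

It first matches the literal start of the line:

# Half life

then any amount of whitespace followed by the colon:

\s*:

then another run of whitespace of any length:

\s*

and finally it captures everything up to the newline character:

([^\n]+)

You can then read the value from capture group 1 (match.group(1) in Python). Note that \s matches any whitespace, including tabs, not just spaces, and that re.match only matches at the start of the string, so the line must begin with "# Half life". The captured text may end in trailing spaces, which .strip() removes.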
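The step-by-step breakdown can be written directly into the regex with re.VERBOSE, so each piece carries its own explanation. This is the same pattern, only reformatted; in verbose mode unescaped whitespace is ignored and an unescaped # starts a comment, so the literal "# " and the space in "Half life" have to be escaped:

```python
import re

HALF_LIFE = re.compile(
    r"""
    \#\ Half\ life   # literal start of the line: "# Half life"
    \s*:             # any amount of whitespace, then the colon
    \s*              # any amount of whitespace after the colon
    ([^\n]+)         # group 1: everything up to the end of the line
    """,
    re.VERBOSE,
)

line = "# Half life            :      0 years   0 days  0 hours 51 minutes 48 seconds \n"
match = HALF_LIFE.match(line)
value = match.group(1).strip() if match else None  # group(1) is the captured value
print(value)  # → 0 years   0 days  0 hours 51 minutes 48 seconds
```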

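Is this what you are looking for: # Half life\s*:\s*([^\n]+)? Have you tried anything yet?

@dvo Yes, that's exactly what I was looking for.

@Allentro Posted it as an answer, with each part explained.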