Regex to find numbers that occur before a month name (pandas)
When a string contains a number followed by a month name, I'm trying to extract the number that comes before the month name in a pandas column. The strings in the column look like this:
133 h missed intake office visit on 28 June 1994 a...
136 11 February 1985 CPT Code: 90801 - Psychiatric...
150 12 March 1980 SOS-10 Total Score:\n
151 22 June 1990 Medical History:\n
165 .On 18 August 1975 patient presented to BH ED/...
181 18 August 1995 Primary Care Doctor:\n
182 eby 13 June 1974 it appears amitriptyline had ...
188 12 March 2004 CPT Code: 90801 - Psychiatric Di...
228 s 20 yo M carries dx of BPAD, presents for psy...
229 t Allergies Sulfa (Sulfonamide Antibiotics) - ...
230 B/R Walnut Ridge. Raised with sister and paren...
231 50 yo DWF with a history of alcohol use disord...
232 )HTN, hypercholesterolemia, DM, sleep apnea,, ...
For example, in row 133 I want to get 28 (the number before the word June), and in row 136 I want to get 11 (the number before the word February).
I'm trying to adapt a regex I used before to get the numbers. That regex is:
DF["col2"] = DF["col1"].str.extract(r'\b\d{1,2}\s(January|February|March|April|May|June|July)|August|September|October|November|December')
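I suspect the capturing parentheses are why the extracted value is the month rather than the number, but when I try to put the capturing parentheses around the number, \d{1,2}, I get an error.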
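The suspicion is correct: `str.extract` returns only the text matched by capturing groups. A small sketch, using a few hypothetical rows modeled on the data above (and, for this illustration only, a month list that is fully enclosed in its group), shows the month being returned when only the month alternation is captured, and a two-column result when both parts are captured:

```python
import pandas as pd

# Hypothetical sample rows modeled on the column shown in the question.
df = pd.DataFrame({"col1": [
    "h missed intake office visit on 28 June 1994 a...",
    "11 February 1985 CPT Code: 90801 - Psychiatric...",
    "t Allergies Sulfa (Sulfonamide Antibiotics) - ...",
]})

months = ("January|February|March|April|May|June|July"
          "|August|September|October|November|December")

# Only the month alternation is captured, so extract returns the month name.
# expand=False returns a Series because there is exactly one group.
month_only = df["col1"].str.extract(rf"\b\d{{1,2}}\s({months})", expand=False)
print(month_only.tolist())  # 'June', 'February', then a missing value for row 3

# With two groups, extract returns a DataFrame: column 0 = day, column 1 = month.
both = df["col1"].str.extract(rf"\b(\d{{1,2}})\s({months})")
print(both)
```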
How can I get only the numbers from this column?

To extract only the day number that comes before the month name, you can use
r'\b(\d{1,2})\s(?:January|February|March|April|May|June|July|August|September|October|November|December)'
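The capturing parentheses are only around the \d{1,2} part of the pattern. The month names are inside a non-capturing group, (?:...), which does not create a separate capture.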
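Applied to the column, the pattern above yields just the day numbers. This is a minimal sketch on a few hypothetical rows modeled on the question's data; note that the extracted values are strings, and rows without a "number + month" match become missing values:

```python
import pandas as pd

# Hypothetical sample rows modeled on the column shown in the question.
df = pd.DataFrame({"col1": [
    "h missed intake office visit on 28 June 1994 a...",
    "11 February 1985 CPT Code: 90801 - Psychiatric...",
    "t Allergies Sulfa (Sulfonamide Antibiotics) - ...",
]})

# Capture only the day; the month names sit in a non-capturing group (?:...).
pattern = (r"\b(\d{1,2})\s(?:January|February|March|April|May|June|July"
           r"|August|September|October|November|December)")

# One capturing group + expand=False -> a Series that can be assigned directly.
df["col2"] = df["col1"].str.extract(pattern, expand=False)
print(df["col2"].tolist())  # '28', '11', then a missing value for row 3
```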
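Also note that all the month names are now inside a single group. In the original regex, the closing ) was placed after July, which broke the regex: August through December ended up outside the group as top-level alternatives, so they match on their own, with no number before them.

Standing on the shoulders of giants. Many thanks.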