How to parse relatively organized but non-delimited text with Python?
I am trying to extract data from a text file in the format shown below. It contains a list of surgeries, and for each case I need the patient name, the start time (time1), the end time (time2), the procedure type and the surgeon name. Here is the raw text. Obviously, the real patient and surgeon names have been replaced:
Run on: 10/07/19 - 1444 Hospital PAGE 1
Run by: H Final Slate For: 11/07/19 THU
PIR Patient Name R/L/B Proposed Procedure Surgeon Path Reg'd Dur
POR Time Unit Number PHN Assist Bld Req'd PIR-POR
Pri DOB Age/S Med Imaging
Loc Bed Type Req'd Staff
Ward
OR Room - 1 Room End Time: 1730 Anaesthetist: S,A T
OHS 0900-2000
0745 patient 1 Replace Root and Ascending surgeon1 GENERAL
1305 RC02654289 96985693 Aorta/Hemiarch (Tissue), Amputate Left 4 UNITS
3A 21/12/1943 75/M Atrial Appendage Perfusionist
SDA ICU
RC-T2S
Weeks on Waitlist: 5 (36 days) 320
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1400 patient2 Coronary Artery Bypass Graft surgeon2 GENERAL
1730 RC00968458 906854959 SCREEN
2B 18/06/1958 61/M Perfusionist
INPT ICU
RC-T2S
Weeks on Waitlist: 2 (17 days) 210
Other Comments: DM Type 2
Run on: 10/07/19 - 1444 Hospital PAGE 2
Run by: H Final Slate For: 11/07/19 THU
PIR Patient Name R/L/B Proposed Procedure Surgeon Path Reg'd Dur
POR Time Unit Number PHN Assist Bld Req'd PIR-POR
Pri DOB Age/S Med Imaging
Loc Bed Type Req'd Staff
Ward
OR Room - 2 Room End Time: 1825 Anaesthetist: K,N S
OHS 0900-1930
0745 Patient3 Aortic Valve Replacement (Mechanical) Surgeon3 GENERAL
1205 RC00584564 9095681571 4 UNITS
3A 13/04/1955 64/F Perfusionist
SDA ICU
RC-T2S
Weeks on Waitlist: 14 (98 days) 260
Other Comments: DM Type 2
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
I need output like this:
patient1 | time1 | time2 | procedure1 | surgeon1
patient2 | time1 | time2 | procedure2 | surgeon2
.
.
.
I have checked the code and fixed it.
This should do the trick:
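import re

# Read the input file contents
with open('input.txt') as inputFile:
    inputText = inputFile.read()

# Columns are separated by two or more spaces in the report
regx = r'^(\d{4})\s{2,}(\D+?)(?=\s{2,})\s{2,}(\D+?)(?=\s{2,})\s{2,}(\D+?)(?=\s{2,})|^(\d{4})(?=\s{2,}RC\d)'
parsedText = re.findall(regx, inputText, flags=re.M)

rows = []
# Organize the data to be written to the file
for line in parsedText:
    if line[0]:
        # First line of a case: time1, patient, procedure, surgeon
        rows.append(list(line))
    else:
        # Second line of a case: its leading time is time2
        rows[-1][-1] = line[-1]

# Write to the file
with open('output.txt', 'w') as csvfile:
    for row in rows:
        csvfile.write('{} | {} | {} | {} | {}\n'.format(row[1], row[0], row[4], row[2], row[3]))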
You can look up the regular expression I used here for an explanation.
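The same pattern is easier to follow when written with re.VERBOSE, so each part can carry a comment. This is a sketch under two assumptions taken from the sample: in the real report, columns are separated by at least two spaces (the spacing was collapsed when the text was pasted into the question), and the second line of every case starts with the end time followed by an RC unit number. CASE_LINE and sample are illustrative names.

```python
import re

# The answer's pattern, split over several lines with re.VERBOSE so each piece is commented.
# Assumes columns in the real report are separated by two or more spaces.
CASE_LINE = re.compile(
    r"""
    ^(\d{4})\s{2,}            # group 1: start time (time1) at the start of a line
    (\D+?)(?=\s{2,})\s{2,}    # group 2: patient name, shortest digit-free run before a 2+ space gap
    (\D+?)(?=\s{2,})\s{2,}    # group 3: procedure text on the first line of the case
    (\D+?)(?=\s{2,})          # group 4: surgeon name
    |                         # ...or...
    ^(\d{4})(?=\s{2,}RC\d)    # group 5: end time (time2), on the line that continues with the RC unit number
    """,
    re.MULTILINE | re.VERBOSE,
)

sample = (
    "0745   Morgan Freeman   Replace Root and Ascending   Dr. Henry Cavail   GENERAL\n"
    "1305   RC02654289   96985693   Aorta/Hemiarch (Tissue), Amputate Left   4 UNITS\n"
)
for groups in CASE_LINE.findall(sample):
    print(groups)
# ('0745', 'Morgan Freeman', 'Replace Root and Ascending', 'Dr. Henry Cavail', '')
# ('', '', '', '', '1305')
```

Because re.findall returns an empty string for groups that did not take part in a match, a non-empty first element marks the start of a case, which is exactly what the answer's loop tests. The \D (non-digit) classes are what stop the first alternative from matching the end-time line, since RC02654289 contains digits.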
Sample input:
Run on: 10/07/19 - 1444 Hospital PAGE 1
Run by: H Final Slate For: 11/07/19 THU
PIR Patient Name R/L/B Proposed Procedure Surgeon Path Reg'd Dur
POR Time Unit Number PHN Assist Bld Req'd PIR-POR
Pri DOB Age/S Med Imaging
Loc Bed Type Req'd Staff
Ward
OR Room - 1 Room End Time: 1730 Anaesthetist: S,A T
OHS 0900-2000
0745 Morgan Freeman Replace Root and Ascending Dr. Henry Cavail GENERAL
1305 RC02654289 96985693 Aorta/Hemiarch (Tissue), Amputate Left 4 UNITS
3A 21/12/1943 75/M Atrial Appendage Perfusionist
SDA ICU
RC-T2S
Weeks on Waitlist: 5 (36 days) 320
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1400 Alicia Cuthbart Coronary Artery Bypass Graft Dr. Denzel Washington GENERAL
1730 RC00968458 906854959 SCREEN
2B 18/06/1958 61/M Perfusionist
INPT ICU
RC-T2S
Weeks on Waitlist: 2 (17 days) 210
Other Comments: DM Type 2
Run on: 10/07/19 - 1444 Hospital PAGE 2
Run by: H Final Slate For: 11/07/19 THU
PIR Patient Name R/L/B Proposed Procedure Surgeon Path Reg'd Dur
POR Time Unit Number PHN Assist Bld Req'd PIR-POR
Pri DOB Age/S Med Imaging
Loc Bed Type Req'd Staff
Ward
OR Room - 2 Room End Time: 1825 Anaesthetist: K,N S
OHS 0900-1930
0745 John van-Damn Aortic Valve Replacement (Mechanical) Dr. Bon Jovi GENERAL
1205 RC00584564 9095681571 4 UNITS
3A 13/04/1955 64/F Perfusionist
SDA ICU
RC-T2S
Weeks on Waitlist: 14 (98 days) 260
Other Comments: DM Type 2
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sample output:
Morgan Freeman | 0745 | 1305 | Replace Root and Ascending | Dr. Henry Cavail
Alicia Cuthbart | 1400 | 1730 | Coronary Artery Bypass Graft | Dr. Denzel Washington
John van-Damn | 0745 | 1205 | Aortic Valve Replacement (Mechanical) | Dr. Bon Jovi
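To check the whole pipeline without an input file, here is a self-contained sketch that applies the same regex and row-stitching logic to an inline excerpt of the sample input. As above, it assumes two or more spaces between columns. parse_slate, format_rows and SAMPLE are illustrative names. Unlike the answer's loop, it ignores an end-time line that appears before any start-time line instead of raising an IndexError.

```python
import re

# Assumption: columns in the real report are separated by two or more spaces.
PATTERN = re.compile(
    r"^(\d{4})\s{2,}(\D+?)(?=\s{2,})\s{2,}(\D+?)(?=\s{2,})\s{2,}(\D+?)(?=\s{2,})"
    r"|^(\d{4})(?=\s{2,}RC\d)",
    re.MULTILINE,
)


def parse_slate(text):
    """Return one [time1, patient, procedure, surgeon, time2] list per case."""
    rows = []
    for match in PATTERN.findall(text):
        if match[0]:
            # First line of a case: start time, patient, procedure, surgeon.
            rows.append(list(match))
        elif rows:
            # Second line of a case: its leading time is the end time.
            rows[-1][-1] = match[-1]
    return rows


def format_rows(rows):
    """Reorder each row to: patient | time1 | time2 | procedure | surgeon."""
    return [" | ".join((r[1], r[0], r[4], r[2], r[3])) for r in rows]


SAMPLE = """\
OR Room - 1   Room End Time: 1730   Anaesthetist: S,A T
0745   Morgan Freeman   Replace Root and Ascending   Dr. Henry Cavail   GENERAL
1305   RC02654289   96985693   Aorta/Hemiarch (Tissue), Amputate Left   4 UNITS
3A   21/12/1943   75/M   Atrial Appendage   Perfusionist
1400   Alicia Cuthbart   Coronary Artery Bypass Graft   Dr. Denzel Washington   GENERAL
1730   RC00968458   906854959   SCREEN
Run on: 10/07/19 - 1444   Hospital   PAGE 2
0745   John van-Damn   Aortic Valve Replacement (Mechanical)   Dr. Bon Jovi   GENERAL
1205   RC00584564   9095681571      4 UNITS
"""

for line in format_rows(parse_slate(SAMPLE)):
    print(line)
```

With the real file, pass the text read from input.txt to parse_slate and write the returned lines to output.txt, one per line.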
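Could you upload the raw data as text in your question? It is better to include text rather than an image, so people don't have to make up data themselves. It also becomes a permanent part of the question, visible on the site. In general, reading … would help a lot. If you go through the text line by line, you can easily find the lines you need. Each field seems to start at a similar character column, so finding the right text should be easier.
Do not post images of code, data, error messages, etc.; copy or type the text into the question. Please reserve the use of images for diagrams or for demonstrating rendering bugs, things that are impossible to describe accurately via text.
@NFR Thank you very much for your helpful comments. This is my first question here and I wasn't sure of the right way to post. I have now added a snippet of the text file.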
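Following the comment that each field starts at a similar character column, a regex-free alternative is to slice every line by position. This is only a sketch: the offsets in COLUMNS, and the pad helper used to build a demo line, are hypothetical placeholders rather than values measured from the real report, and the pairing logic assumes every case has exactly two lines that begin with a four-digit time (start first, then end). A case with only one such line would be paired with the next case's start time.

```python
# Hypothetical (name, start, end) character offsets on the first line of a case.
# Placeholders only: measure the real offsets in the report before relying on them.
COLUMNS = [("time", 0, 4), ("patient", 7, 27), ("procedure", 27, 60), ("surgeon", 60, 85)]


def slice_fields(line):
    """Cut one fixed-width line into named, whitespace-stripped fields."""
    return {name: line[start:end].strip() for name, start, end in COLUMNS}


def parse_fixed_width(lines):
    """Pair each case's first time line (start) with the next time line (end)."""
    records, pending = [], None
    for line in lines:
        if not line[:4].isdigit():
            continue  # headers, DOB/location lines, separators, comments
        if pending is None:
            pending = slice_fields(line)  # first line: time1, patient, procedure, surgeon
        else:
            pending["end"] = line[:4]  # second line: starts with time2
            records.append(pending)
            pending = None
    return records


def pad(*cells):
    """Hypothetical helper: build a demo line laid out to match COLUMNS."""
    widths = [7, 20, 33, 25]
    return "".join(c.ljust(w) for c, w in zip(cells, widths)) + "GENERAL"


demo = [
    "Run on: 10/07/19 - 1444   Hospital   PAGE 1",
    pad("0745", "Morgan Freeman", "Replace Root and Ascending", "Dr. Henry Cavail"),
    "1305   RC02654289   96985693",
    "3A     21/12/1943   75/M",
]
for r in parse_fixed_width(demo):
    print(" | ".join((r["patient"], r["time"], r["end"], r["procedure"], r["surgeon"])))
```

Slicing by column tolerates digits inside names and single spaces between columns, which the regex does not, but it breaks as soon as the report's layout shifts, so the offsets must match the actual file.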