How can I parse relatively structured but non-delimited text with Python?

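I am trying to extract data from a text file formatted as shown below. It contains a list of surgeries, and the information I need from each case is: patient name, start time (time1), end time (time2), type of procedure, and surgeon name. Here is the raw text. Obviously, the patient and surgeon names have been replaced with placeholders: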

Run on: 10/07/19 - 1444                                                       Hospital                                                        PAGE 1

Run by: H                                                          Final Slate For: 11/07/19 THU                                                   

PIR        Patient Name                     R/L/B   Proposed Procedure                                          Surgeon                            Path Reg'd      Dur
POR Time   Unit Number   PHN                                                                                    Assist                             Bld Req'd     PIR-POR
Pri        DOB           Age/S                                                                                                                     Med Imaging
Loc        Bed Type                                                                                                                                Req'd Staff
Ward


OR Room - 1                                           Room End Time: 1730          Anaesthetist: S,A T                                            
OHS 0900-2000                                               
0745       patient 1                             Replace Root and Ascending                                              surgeon1   GENERAL                
1305       RC02654289   96985693                        Aorta/Hemiarch (Tissue), Amputate Left                                                   4 UNITS                
3A         21/12/1943     75/M                            Atrial Appendage                                                                         Perfusionist           
SDA        ICU                                                                                                                                                            
RC-T2S    
 Weeks on Waitlist:  5   (36 days)                                                                                                                                  320
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

1400       patient2                           Coronary Artery Bypass Graft                                            surgeon2   GENERAL                
1730       RC00968458   906854959                                                                                                                 SCREEN                 
2B         18/06/1958     61/M                                                                                                                     Perfusionist           
INPT       ICU                                                                                                                                                            
RC-T2S    
 Weeks on Waitlist:  2   (17 days)                                                                                                                                  210
                                                  Other Comments:   DM Type 2                                                                      

Run on: 10/07/19 - 1444                                                      Hospital                                                        PAGE 2

Run by: H                                                         Final Slate For: 11/07/19 THU                                                   

PIR        Patient Name                     R/L/B   Proposed Procedure                                          Surgeon                            Path Reg'd      Dur
POR Time   Unit Number   PHN                                                                                    Assist                             Bld Req'd     PIR-POR
Pri        DOB           Age/S                                                                                                                     Med Imaging
Loc        Bed Type                                                                                                                                Req'd Staff
Ward


OR Room - 2                                           Room End Time: 1825          Anaesthetist: K,N S                                             
OHS 0900-1930                                               
0745       Patient3                          Aortic Valve Replacement (Mechanical)                                   Surgeon3   GENERAL                
1205       RC00584564   9095681571                                                                                                                 4 UNITS                
3A         13/04/1955     64/F                                                                                                                     Perfusionist           
SDA        ICU                                                                                                                                                            
RC-T2S    
 Weeks on Waitlist: 14   (98 days)                                                                                                                                  260
                                                  Other Comments:   DM Type 2                                                                      
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
I need output like this:

patient1 | time1 | time2 | procedure1 | surgeon1
patient2 | time1 | time2 | procedure2 | surgeon2
.
.
.
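I have gone over the code and fixed it.

This should work:

    import re

    # Read the input file content
    with open("input.txt") as inputFile:
        inputText = inputFile.read()

    # Either the first line of a case (start time, patient, procedure, surgeon,
    # separated by 2+ spaces; the names must not contain digits), or a line
    # that simply starts with a 4-digit time (the end time on the next line)
    regx = (r'^(\d{4})\s{2,}(\D+?)(?=\s{2,})\s{2,}(\D+?)(?=\s{2,})'
            r'\s{2,}(\D+?)(?=\s{2,})|^(\d{4})')
    parsedText = re.findall(regx, inputText, flags=re.M)

    rows = []
    # Organize the data to be written to the file
    for line in parsedText:
        if len(line[0]):
            # First line of a case: start a new row
            rows.append(list(line))
        else:
            # End-time line: fill in the last field of the current row
            rows[-1][-1] = line[-1]

    # Write to the file
    with open('output.txt', 'w') as csvfile:
        for row in rows:
            csvfile.write("{} | {} | {} | {} | {}\n".format(
                row[1], row[0], row[4], row[2], row[3]))

You can look up an explanation of the regex I used here.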
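To see why the loop checks `line[0]`, it helps to look at what `re.findall` returns when a pattern has two alternatives. The following is only a sketch: it assumes a pattern with five groups in total (four in the first alternative, one in the second), and the two-line `sample` string is a shortened excerpt made up from the sample input.

```python
import re

# Shortened excerpt of one case (assumption: columns are separated by 2+ spaces).
sample = (
    "0745       Morgan Freeman          Replace Root and Ascending"
    "          Dr. Henry Cavail   GENERAL\n"
    "1305       RC02654289   96985693          Aorta/Hemiarch (Tissue),"
    " Amputate Left          4 UNITS\n"
)

# Alternative 1: start time, patient, procedure, surgeon (groups 1-4).
# Alternative 2: a line that only needs to start with a 4-digit time (group 5).
pattern = (
    r'^(\d{4})\s{2,}(\D+?)(?=\s{2,})\s{2,}(\D+?)(?=\s{2,})'
    r'\s{2,}(\D+?)(?=\s{2,})'
    r'|^(\d{4})'
)

matches = re.findall(pattern, sample, flags=re.M)
for m in matches:
    print(m)
# ('0745', 'Morgan Freeman', 'Replace Root and Ascending', 'Dr. Henry Cavail', '')
# ('', '', '', '', '1305')
```

Groups that belong to the alternative that did not match come back as empty strings, so an empty first field marks an end-time line. Because `\D` excludes digits, the second line cannot match the first alternative (its second column is `RC02654289`), but for the same reason a name such as `patient 1` would not match either.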

Sample input:

Run on: 10/07/19 - 1444                                                       Hospital                                                        PAGE 1

Run by: H                                                          Final Slate For: 11/07/19 THU                                                   

PIR        Patient Name                     R/L/B   Proposed Procedure                                          Surgeon                            Path Reg'd      Dur
POR Time   Unit Number   PHN                                                                                    Assist                             Bld Req'd     PIR-POR
Pri        DOB           Age/S                                                                                                                     Med Imaging
Loc        Bed Type                                                                                                                                Req'd Staff
Ward


OR Room - 1                                           Room End Time: 1730          Anaesthetist: S,A T                                            
OHS 0900-2000                                               
0745       Morgan Freeman                             Replace Root and Ascending                                              Dr. Henry Cavail   GENERAL                
1305       RC02654289   96985693                        Aorta/Hemiarch (Tissue), Amputate Left                                                   4 UNITS                
3A         21/12/1943     75/M                            Atrial Appendage                                                                         Perfusionist           
SDA        ICU                                                                                                                                                            
RC-T2S    
 Weeks on Waitlist:  5   (36 days)                                                                                                                                  320
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

1400       Alicia Cuthbart                           Coronary Artery Bypass Graft                                            Dr. Denzel Washington   GENERAL                
1730       RC00968458   906854959                                                                                                                 SCREEN                 
2B         18/06/1958     61/M                                                                                                                     Perfusionist           
INPT       ICU                                                                                                                                                            
RC-T2S    
 Weeks on Waitlist:  2   (17 days)                                                                                                                                  210
                                                  Other Comments:   DM Type 2                                                                      

Run on: 10/07/19 - 1444                                                      Hospital                                                        PAGE 2

Run by: H                                                         Final Slate For: 11/07/19 THU                                                   

PIR        Patient Name                     R/L/B   Proposed Procedure                                          Surgeon                            Path Reg'd      Dur
POR Time   Unit Number   PHN                                                                                    Assist                             Bld Req'd     PIR-POR
Pri        DOB           Age/S                                                                                                                     Med Imaging
Loc        Bed Type                                                                                                                                Req'd Staff
Ward


OR Room - 2                                           Room End Time: 1825          Anaesthetist: K,N S                                             
OHS 0900-1930                                               
0745       John van-Damn                          Aortic Valve Replacement (Mechanical)                                   Dr. Bon Jovi   GENERAL                
1205       RC00584564   9095681571                                                                                                                 4 UNITS                
3A         13/04/1955     64/F                                                                                                                     Perfusionist           
SDA        ICU                                                                                                                                                            
RC-T2S    
 Weeks on Waitlist: 14   (98 days)                                                                                                                                  260
                                                  Other Comments:   DM Type 2                                                                      
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Sample output:

Morgan Freeman | 0745 | 1305 | Replace Root and Ascending | Dr. Henry Cavail
Alicia Cuthbart | 1400 | 1730 | Coronary Artery Bypass Graft | Dr. Denzel Washington
John van-Damn | 0745 | 1205 | Aortic Valve Replacement (Mechanical) | Dr. Bon Jovi


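Can you include the raw data as text in your question? It is better to include text rather than an image, so that people don't have to fake the data themselves. It also becomes a permanent part of the question, visible on the site.

In general, reading will help a lot: going through the text line by line, you can easily find the lines you need. Each field seems to start at a similar character column, so it should be easier to find the right text.

Do not post images of code, data, error messages, etc. Copy or type the text into the question. Please reserve images for diagrams or for demonstrating rendering bugs, things that cannot be described accurately in text.

@NFR Thank you very much for your instructive comment. This is my first question here and I wasn't sure what the right way of posting was. I have now added a piece of the text file.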
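The line-by-line idea from the comments could be sketched roughly as below. This is not the answer's method, just one possible variant: `parse_cases` is a hypothetical helper, and it assumes that columns are separated by two or more spaces, that the R/L/B column is empty, and that the end-time line is the next line starting with a 4-digit time. Like the answer, it keeps only the first line of a multi-line procedure.

```python
import re

def parse_cases(text):
    """Return (patient, start, end, procedure, surgeon) tuples."""
    cases = []
    pending = None  # start-line fields waiting for their end-time line
    for line in text.splitlines():
        # Only lines that begin with a 4-digit time are of interest.
        if not re.match(r'\d{4}\s{2,}', line):
            continue
        fields = re.split(r'\s{2,}', line.strip())
        if pending is None:
            if len(fields) >= 4:
                pending = fields[:4]  # start, patient, procedure, surgeon
        else:
            start, patient, procedure, surgeon = pending
            cases.append((patient, start, fields[0], procedure, surgeon))
            pending = None
    return cases

# Excerpt of the question's data, with the column padding shortened.
sample = (
    "OHS 0900-2000\n"
    "0745       patient 1          Replace Root and Ascending"
    "          surgeon1   GENERAL\n"
    "1305       RC02654289   96985693          Aorta/Hemiarch (Tissue),"
    " Amputate Left          4 UNITS\n"
    "3A         21/12/1943     75/M          Atrial Appendage\n"
)

for case in parse_cases(sample):
    print(" | ".join(case))
# patient 1 | 0745 | 1305 | Replace Root and Ascending | surgeon1
```

Splitting on runs of whitespace rather than matching non-digits means names that contain digits, such as `patient 1`, are handled, at the cost of relying on the start and end lines always coming in pairs.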