Extracting data and transposing in Python


I have a text file from which I extracted the region between two strings. The extracted region looks like this:

title   "A" "B" "C" "D" "E" "F" 
number  "G1"    "G2"    "G3"    "G4"    "G5"    "G6"
data "aaa,bbb"  "sss,ddd"   "fff,ggg"   "rrr,eee"   "aaa,ooo"   "ggg,aaa"
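I want to write a CSV file. But even when I specify \t as the delimiter, the cells are split at the commas into separate cells within a row, and the tabs are turned into line breaks, so the data comes out like this: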

title   
"A" 
"B" 
"C" 
"D" 
"E" 
"F" 
number  
"G1"    
"G2"    
"G3"    
"G4"    
"G5"    
"G6"
data 
"aaa    bbb"    
"sss    ddd"    
"fff    ggg"    
"rrr    eee"    
"aaa    ooo"    
"ggg    aaa"
What I need is this:

title   A   B   C   D   E   F   
number  G1  G2  G3  G4  G5  G6
data    aaa,bbb sss,ddd fff,ggg rrr,eee aaa,ooo ggg,aaa
with each value in its own cell on a single row, separated by tabs. Any help is greatly appreciated.

infile.csv:

title   "A" "B" "C" "D" "E" "F" 
number  "G1"    "G2"    "G3"    "G4"    "G5"    "G6"
data    "aaa,bbb"   "sss,ddd"   "fff,ggg"   "rrr,eee"   "aaa,ooo"   "ggg,aaa"
outfile.csv:

title   A   B   C   D   E   F   
number  G1  G2  G3  G4  G5  G6
data    aaa,bbb sss,ddd fff,ggg rrr,eee aaa,ooo ggg,aaa
Code:
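A minimal sketch of how infile.csv could be turned into outfile.csv with the standard csv module. It assumes the fields in infile.csv are separated by real tab characters and wrapped in double quotes; the function name strip_quotes is made up for illustration.

```python
import csv

def strip_quotes(in_path, out_path):
    """Copy a tab-delimited file, dropping the double quotes around fields."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        # quotechar='"' makes the reader treat "aaa,bbb" as a single field;
        # the comma is ordinary data because the delimiter is a tab.
        reader = csv.reader(src, delimiter="\t", quotechar='"')
        # QUOTE_MINIMAL only quotes fields that contain a tab, a quote or a
        # line break, so aaa,bbb is written without quotes.
        writer = csv.writer(dst, delimiter="\t", quoting=csv.QUOTE_MINIMAL,
                            lineterminator="\n")
        for row in reader:
            writer.writerow(row)
```

Calling strip_quotes("infile.csv", "outfile.csv") would then write one tab-separated row per input line. Because the delimiter is a tab on both sides, the commas inside values such as aaa,bbb are never treated as separators.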

Using regular expressions:

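import re

# Read all lines, then keep only the word-like tokens on each line,
# allowing one embedded comma (e.g. aaa,bbb).
with open('your_file.txt', 'r') as f:
    lines = f.readlines()
for x in lines:
    print(" ".join(re.findall(r'\w+,?\w*', x)))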
Output:

title A B C D E F
number G1 G2 G3 G4 G5 G6
data aaa,bbb sss,ddd fff,ggg rrr,eee aaa,ooo ggg,aaa

readlines reads your file as a list of lines, which I then loop over to find the pattern. Once you have the matches, you can format them however you need.
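As one way of formatting the matches, the tokens could be joined with tabs rather than spaces, which is the layout the question asks for. This is only a sketch: the helper name to_tab_separated and the sample lines are made up for illustration, and the pattern only covers values made of word characters with at most one comma.

```python
import re

# Same pattern as in the answer: a word, optionally followed by a comma
# and more word characters.
TOKEN = re.compile(r"\w+,?\w*")

def to_tab_separated(lines):
    """Return each line's tokens joined by tabs instead of spaces."""
    return ["\t".join(TOKEN.findall(line)) for line in lines]

sample = [
    'title\t"A"\t"B"\t"C"',
    'data\t"aaa,bbb"\t"sss,ddd"\t"fff,ggg"',
]
for row in to_tab_separated(sample):
    print(row)
```

Values containing spaces, hyphens or more than one comma would be split by this pattern, so a quote-aware parser is the safer choice for such data.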

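@inspectorG4dget The extracted region looks like the one shown above; it currently sits in a file. I wrote it to the file using if line.startswith("!Sample_title"): copy = True and then outfile.writeline.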
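The flag-based copying described in this comment might look roughly like the sketch below. The comment only mentions a start check with line.startswith; the end marker, the function name extract_region and the sample data are assumptions added for illustration.

```python
def extract_region(lines, start_prefix, end_prefix):
    """Collect lines from the first one starting with start_prefix up to,
    but not including, the next line starting with end_prefix."""
    copying = False
    region = []
    for line in lines:
        if not copying and line.startswith(start_prefix):
            copying = True   # start marker found: begin copying
        elif copying and line.startswith(end_prefix):
            break            # end marker found: stop before it
        if copying:
            region.append(line)
    return region

# Hypothetical input; "!end" stands in for whatever closes the region.
example = ["header", '!Sample_title\t"A"\t"B"', 'number\t"G1"\t"G2"', "!end", "footer"]
for line in extract_region(example, "!Sample_title", "!end"):
    print(line)
```

Returning the region as a list keeps the extraction separate from writing, so the lines can be cleaned up before they reach the output file.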