Converting unorganized data into a parsable format in Python

Tags: python, python-2.7

I have this block of text, and every element in it has a meaning:

 PRICING OPTION 11                 TOTAL AMOUNT             40009 INR
ADT                               TAX INCLUDED 
1   UK    933  K  15FEB DEL BOM   1515  1725    TH   320       SRCI0
2   NH    830  S  15FEB BOM NRT   2000  0715 +  TH   788       SRCI0
3   NH    829  V  19FEB NRT BOM   1115  1825    MO   788       VRCI0
4   UK    988  K  19FEB BOM DEL   2045  2300    MO   320       VRCI0
´BOOKª          +TQ                                                      D  R  +8

 PRICING OPTION 12                 TOTAL AMOUNT             40376 INR
ADT                               TAX INCLUDED 
1   NH @ 6431  S  15FEB DEL BOM   1500  1715    TH   73H       SRCI0
2   NH    830  S  15FEB BOM NRT   2000  0715 +  TH   788       SRCI0
3   NH    827  W  19FEB NRT DEL   1715  0005 +  MO   788       WRCI0
´BOOKª          +TQ 
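I tried using Python to extract each line and split it on spaces. The problem is that the number of spaces before the same element can differ from line to line.

Is there a better way to extract the elements than looking for spaces?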

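You can use a regular expression:

import re
# Split each line on runs of whitespace and drop empty tokens;
# [1:-1] drops the first and last lines of data
final_data = [list(filter(lambda x:x, re.split(r'\s+', i))) for i in data.split('\n')][1:-1]
Output: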

[['PRICING', 'OPTION', '11', 'TOTAL', 'AMOUNT', '40009', 'INR'], ['ADT', 'TAX', 'INCLUDED'], ['1', 'UK', '933', 'K', '15FEB', 'DEL', 'BOM', '1515', '1725', 'TH', '320', 'SRCI0'], ['2', 'NH', '830', 'S', '15FEB', 'BOM', 'NRT', '2000', '0715', '+', 'TH', '788', 'SRCI0'], ['3', 'NH', '829', 'V', '19FEB', 'NRT', 'BOM', '1115', '1825', 'MO', '788', 'VRCI0'], ['4', 'UK', '988', 'K', '19FEB', 'BOM', 'DEL', '2045', '2300', 'MO', '320', 'VRCI0'], ['´BOOKª', '+TQ', 'D', 'R', '+8'], [], ['PRICING', 'OPTION', '12', 'TOTAL', 'AMOUNT', '40376', 'INR'], ['ADT', 'TAX', 'INCLUDED'], ['1', 'NH', '@', '6431', 'S', '15FEB', 'DEL', 'BOM', '1500', '1715', 'TH', '73H', 'SRCI0'], ['2', 'NH', '830', 'S', '15FEB', 'BOM', 'NRT', '2000', '0715', '+', 'TH', '788', 'SRCI0'], ['3', 'NH', '827', 'W', '19FEB', 'NRT', 'DEL', '1715', '0005', '+', 'MO', '788', 'WRCI0'], ['´BOOKª', '+TQ']]
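
Once every line is a list of tokens, the rows can be grouped so that each pricing option becomes one record. The following is only a minimal sketch, written for Python 3: `parse_rows`, `group_options` and the dictionary keys are hypothetical names, and the header positions (option number, amount, currency) are assumed from the sample in the question rather than from any documented format.

```python
import re

# Shortened sample copied from the question
sample = """PRICING OPTION 11                 TOTAL AMOUNT             40009 INR
ADT                               TAX INCLUDED
1   UK    933  K  15FEB DEL BOM   1515  1725    TH   320       SRCI0
2   NH    830  S  15FEB BOM NRT   2000  0715 +  TH   788       SRCI0

PRICING OPTION 12                 TOTAL AMOUNT             40376 INR
ADT                               TAX INCLUDED
1   NH @ 6431  S  15FEB DEL BOM   1500  1715    TH   73H       SRCI0
"""

def parse_rows(text):
    """Split every line on whitespace runs, dropping empty tokens."""
    return [[tok for tok in re.split(r"\s+", line) if tok]
            for line in text.split("\n")]

def group_options(rows):
    """Collect the numbered rows under the preceding PRICING OPTION header."""
    options = []
    current = None
    for tokens in rows:
        if tokens[:2] == ["PRICING", "OPTION"]:
            # Assumed header layout:
            # PRICING OPTION <n> TOTAL AMOUNT <amount> <currency>
            current = {
                "option": int(tokens[2]),
                "total": int(tokens[5]),  # assumes whole-number amounts
                "currency": tokens[6],
                "segments": [],
            }
            options.append(current)
        elif current is not None and tokens and tokens[0].isdigit():
            # Rows that start with a number are kept as raw token lists,
            # because optional markers such as '+' and '@' shift positions
            current["segments"].append(tokens)
    return options

options = group_options(parse_rows(sample))
for opt in options:
    print(opt["option"], opt["total"], opt["currency"], len(opt["segments"]))
```

The numbered rows are deliberately left as token lists: some rows carry an extra `+` or `@`, so mapping positions to field names would need rules that the question does not spell out.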

You can use a regex split for this:

>>> import re
>>> [re.split(' +',line) for line in a.split('\n')] 
[['PRICING', 'OPTION', '11', 'TOTAL', 'AMOUNT', '40009', 'INR'], ['ADT', 'TAX', 'INCLUDED', ''], ['1', 'UK', '933', 'K', '15FEB', 'DEL', 'BOM', '1515', '1725', 'TH', '320', 'SRCI0'], ['2', 'NH', '830', 'S', '15FEB', 'BOM', 'NRT', '2000', '0715', '+', 'TH', '788', 'SRCI0'], ['3', 'NH', '829', 'V', '19FEB', 'NRT', 'BOM', '1115', '1825', 'MO', '788', 'VRCI0'], ['4', 'UK', '988', 'K', '19FEB', 'BOM', 'DEL', '2045', '2300', 'MO', '320', 'VRCI0'], ['´BOOKª', '+TQ', 'D', 'R', '+8'], [''], ['', 'PRICING', 'OPTION', '12', 'TOTAL', 'AMOUNT', '40376', 'INR'], ['ADT', 'TAX', 'INCLUDED', ''], ['1', 'NH', '@', '6431', 'S', '15FEB', 'DEL', 'BOM', '1500', '1715', 'TH', '73H', 'SRCI0'], ['2', 'NH', '830', 'S', '15FEB', 'BOM', 'NRT', '2000', '0715', '+', 'TH', '788', 'SRCI0'], ['3', 'NH', '827', 'W', '19FEB', 'NRT', 'DEL', '1715', '0005', '+', 'MO', '788', 'WRCI0'], ['´BOOKª', '+TQ', '']]
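
The output above contains empty strings (`''`) wherever a line starts or ends with spaces, because `re.split(' +', ...)` also splits at the edges of the line. One way to avoid them is to filter empty tokens after splitting. This is a small sketch in Python 3; `split_on_spaces` is a hypothetical helper name, and the two sample lines are copied from the question.

```python
import re

# Two lines from the question; the first one ends with a trailing space
sample = (
    "ADT                               TAX INCLUDED \n"
    "1   UK    933  K  15FEB DEL BOM   1515  1725    TH   320       SRCI0"
)

def split_on_spaces(text):
    """Split each line on runs of spaces and drop empty tokens."""
    rows = []
    for line in text.split("\n"):
        tokens = [tok for tok in re.split(r" +", line) if tok]
        rows.append(tokens)
    return rows

rows = split_on_spaces(sample)
print(rows[0])
print(rows[1])
```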

Called with no arguments, the `split()` method actually collapses multiple delimiters, so this shouldn't be a problem. Also, `pandas` can read this kind of input, which you could try (although the different "BOOK" blocks would still need to be handled manually).

Does this help? `import re; print [re.split(' +', line) for line in a.split('\n')]`

I don't get it: `final_data = [line.split() for line in data.split('\n')][1:-1]` produces exactly the same output. Why do you need a regular expression?
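
The claim in the last comment can be checked directly. The sketch below (Python 3; `regex_tokens` is a hypothetical helper name) compares the regex approach with a plain `str.split()` on a few lines taken from the question, including one with leading and trailing spaces and an empty line.

```python
import re

segment = "2   NH    830  S  15FEB BOM NRT   2000  0715 +  TH   788       SRCI0"
padded = " PRICING OPTION 12                 TOTAL AMOUNT             40376 INR "

def regex_tokens(s):
    """Regex split on whitespace runs, with empty tokens filtered out."""
    return [tok for tok in re.split(r"\s+", s) if tok]

# str.split() with no arguments splits on whitespace runs and
# ignores leading and trailing whitespace
same = all(regex_tokens(s) == s.split() for s in (segment, padded, ""))
print(same)
```

For this input the two approaches agree, so the regular expression only becomes necessary if the separator is something other than plain whitespace.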