Python regex for a complex string

I have the string below and want to split it on spaces, while ignoring any spaces inside double-quoted values. For example:

# Adjacent string literals are concatenated, so this is one line of text.
string = ('3e656b8e06c176 el-s3-log-file [24/Dec/2014:11:54:18 +0000] '
          '202.141.245.38 arn:aws:iam::xxxxx:user/xyz E27FFBA2CA3D61F3 REST.GET.OBJECT '
          'logs/2014-12-23-09-25-19-E39257 "GET /el-s36 HTTP/1.1" 200 - 660 660 30 30 '
          '"https://s3-console-us-standard/Console.html?region&locale=en" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" -')

values = string.split(' ')
The call above also splits the double-quoted values. Example: ['"GET', '/el-s36', 'HTTP/1.1"']


I would like a regex that ignores spaces inside double quotes and inside [].

Use \s instead of a literal space so that newlines are matched as well. That is, the input is split on spaces and on newlines alike.
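To make the difference concrete, here is a small sketch on a made-up two-line input (the sample text is an assumption for illustration, not taken from the question). Splitting on a literal space leaves the newline inside a token, while \s+ treats it as a separator.

```python
import re

# Hypothetical two-line sample, not taken from the question.
sample = "alpha beta\ngamma delta"

# Splitting on a literal space keeps the newline inside the middle token.
by_space = sample.split(' ')

# \s+ matches any run of whitespace, including newlines.
by_whitespace = re.split(r'\s+', sample)

print(by_space)
print(by_whitespace)
```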

>>> import re
>>> re.split(r'\s+(?=(?:"[^"]*"|[^"])*$)', string)
['3e656b8e06c176', 'el-s3-log-file', '[24/Dec/2014:11:54:18', '+0000]', '202.141.245.38', 'arn:aws:iam::xxxxx:user/xyz', 'E27FFBA2CA3D61F3', 'REST.GET.OBJECT', 'logs/2014-12-23-09-25-19-E39257', '"GET /el-s36 HTTP/1.1"', '200', '-', '660', '660', '30', '30', '"https://s3-console-us-standard/Console.html?region&locale=en"', '"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"', '-']
Update:

This splits the input string on whitespace that is not inside double quotes or [].

>>> re.split(r'\s+(?=(?:"[^"]*"|[^"])*$)(?![^\[\]]*\])', string)
['3e656b8e06c176', 'el-s3-log-file', '[24/Dec/2014:11:54:18 +0000]', '202.141.245.38', 'arn:aws:iam::xxxxx:user/xyz', 'E27FFBA2CA3D61F3', 'REST.GET.OBJECT', 'logs/2014-12-23-09-25-19-E39257', '"GET /el-s36 HTTP/1.1"', '200', '-', '660', '660', '30', '30', '"https://s3-console-us-standard/Console.html?region&locale=en"', '"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"', '-']
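
For readers who find the one-line pattern hard to parse, the following is a sketch of the same regex written with re.VERBOSE so that each part can carry a comment. The name SPLIT_RE and the shortened sample line are assumptions made for illustration; the pattern itself is unchanged.

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part can be commented.
# SPLIT_RE is a hypothetical name chosen for this sketch.
SPLIT_RE = re.compile(r'''
    \s+                  # the separator: one or more whitespace characters
    (?=                  # split only if the rest of the string...
        (?: "[^"]*"      #   is made of complete double-quoted chunks
          | [^"]         #   or single characters that are not quotes
        )*$              #   all the way to the end, so we are not inside quotes
    )
    (?! [^\[\]]* \] )    # and only if no closing bracket follows before any other bracket
''', re.VERBOSE)

# Shortened, made-up line in the same shape as the one in the question.
line = 'a [24/Dec/2014:11:54:18 +0000] "GET /x HTTP/1.1" 200 -'
tokens = SPLIT_RE.split(line)
print(tokens)
```

Note that the first lookahead rescans the rest of the string at every candidate separator, so on very long inputs a dedicated tokenizer may be a better fit.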

(Updated to use the appropriate amount of greediness.)

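Thanks for the answer, but there is one exception: spaces inside [] must not split the value either. For example, "[24/Dec/2014:11:54:18 +0000]" should be treated as a single value.

Thanks. Works perfectly.

This does not split: 200 60 30

Fixed; it was too greedy inside the quotes.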