Python regex for parsing SQL statements

I need to use a regex to extract some information from SQL DDL statements. The SQL statement looks like this:

CREATE TABLE default.table1 (DATA4 BIGINT, DATA5 BIGINT, DATA2 BIGINT, DATA3 BIGINT)
USING parquet
OPTIONS (
  serialization.format '1'
)
PARTITIONED BY (DATA2, DATA3)
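I need to parse this in Python and pull out the columns named in the PARTITIONED BY clause. I've come up with a regex that works once the newlines are removed, but I can't get it to work when the newlines are present. Here is some demo code: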

import re
def print_partition_columns_if_found(ddl_string):
    regex = r'CREATE +?(TEMPORARY +)?TABLE *(?P<db>.*?\.)?(?P<table>.*?)\((?P<col>.*?)\).*?USING +([^\s]+)( +OPTIONS *\([^)]+\))?( *PARTITIONED BY \((?P<pcol>.*?)\))?'
    match = re.search(regex, ddl_string, re.MULTILINE | re.DOTALL)
    if match.group("pcol"):
        print(match.group("pcol").strip())
    else:
        print('did not find any pcols in {matches}'.format(matches=match.groups()))


ddl_string1 = """
CREATE TABLE default.table1 (DATA4 BIGINT, DATA5 BIGINT, DATA2 BIGINT, DATA3 BIGINT)
USING parquet OPTIONS (serialization.format '1') PARTITIONED BY (DATA2, DATA3)"""
print_partition_columns_if_found(ddl_string1)

print("--------")

ddl_string2 = """
CREATE TABLE default.table1 (DATA4 BIGINT, DATA5 BIGINT, DATA2 BIGINT, DATA3 BIGINT)
USING parquet
OPTIONS (
  serialization.format '1'
)
PARTITIONED BY (DATA2, DATA3)
"""
print_partition_columns_if_found(ddl_string2)
This returns:

DATA2, DATA3
--------
did not find any pcols in (None, 'default.', 'table1 ', 'DATA4 BIGINT, DATA5 BIGINT, DATA2 BIGINT, DATA3 BIGINT', 'parquet', None, None, None)
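The workaround mentioned above, removing the line breaks before matching, can be sketched as follows. This is a minimal sketch, not the asker's exact code: it keeps the original pattern unchanged and assumes that collapsing every run of whitespace (newlines included) into a single space is harmless for this DDL. That holds here, but it would also alter whitespace inside quoted string literals. The helper names normalize_whitespace and partition_columns_after_normalizing are hypothetical.

```python
import re

# The asker's original pattern, unchanged: it only tolerates literal spaces.
DDL_REGEX = (
    r'CREATE +?(TEMPORARY +)?TABLE *(?P<db>.*?\.)?(?P<table>.*?)\((?P<col>.*?)\)'
    r'.*?USING +([^\s]+)( +OPTIONS *\([^)]+\))?( *PARTITIONED BY \((?P<pcol>.*?)\))?'
)


def normalize_whitespace(sql):
    """Hypothetical helper: collapse all whitespace runs (incl. newlines) to one space."""
    return " ".join(sql.split())


def partition_columns_after_normalizing(ddl_string):
    """Run the original regex on the flattened DDL; None if no PARTITIONED BY."""
    match = re.search(DDL_REGEX, normalize_whitespace(ddl_string), re.DOTALL)
    if match is None or match.group("pcol") is None:
        return None
    return match.group("pcol").strip()


ddl_multiline = """
CREATE TABLE default.table1 (DATA4 BIGINT, DATA5 BIGINT, DATA2 BIGINT, DATA3 BIGINT)
USING parquet
OPTIONS (
  serialization.format '1'
)
PARTITIONED BY (DATA2, DATA3)
"""
print(partition_columns_after_normalizing(ddl_multiline))  # DATA2, DATA3
```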


Can any regex experts help me out?
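The second call fails because ` +` and ` *` in the pattern match only literal space characters. In ddl_string2 a newline follows `parquet`, so `( +OPTIONS ...)?` cannot match there. Because that group and the `PARTITIONED BY` group are both optional, the engine simply skips them and still reports a successful match, which is why the last three groups come back as None. re.DOTALL only changes what `.` matches and re.MULTILINE only affects `^` and `$`, so neither flag helps with those literal spaces. A minimal sketch of a whitespace-tolerant variant replaces each literal space with `\s`; it assumes the DDL always has the shape shown in the question and is not a general SQL parser. The function name partition_columns is hypothetical.

```python
import re

# Same structure as the question's pattern, but every literal space is
# replaced by \s so that newlines and tabs are accepted as separators too.
DDL_REGEX = re.compile(
    r'CREATE\s+(TEMPORARY\s+)?TABLE\s*(?P<db>.*?\.)?(?P<table>.*?)\((?P<col>.*?)\)'
    r'.*?USING\s+(\S+)(\s+OPTIONS\s*\([^)]+\))?'
    r'(\s*PARTITIONED\s+BY\s*\((?P<pcol>.*?)\))?',
    re.DOTALL,  # lets .*? cross the newline between the column list and USING
)


def partition_columns(ddl_string):
    """Return the PARTITIONED BY column text, or None if the clause is absent."""
    match = DDL_REGEX.search(ddl_string)
    if match is None or match.group("pcol") is None:
        return None
    return match.group("pcol").strip()


ddl_string1 = """
CREATE TABLE default.table1 (DATA4 BIGINT, DATA5 BIGINT, DATA2 BIGINT, DATA3 BIGINT)
USING parquet OPTIONS (serialization.format '1') PARTITIONED BY (DATA2, DATA3)"""

ddl_string2 = """
CREATE TABLE default.table1 (DATA4 BIGINT, DATA5 BIGINT, DATA2 BIGINT, DATA3 BIGINT)
USING parquet
OPTIONS (
  serialization.format '1'
)
PARTITIONED BY (DATA2, DATA3)
"""

print(partition_columns(ddl_string1))  # DATA2, DATA3
print(partition_columns(ddl_string2))  # DATA2, DATA3
```

The col group is still matched lazily up to the first closing parenthesis, so a column type such as DECIMAL(10,2) would truncate it; that limitation is inherited from the original pattern.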

Let's check the Python sqlparse documentation:


Interesting, I'll look into it more closely. Thanks!
>>> import sqlparse
>>> ddl_string2 = """
... CREATE TABLE default.table1 (DATA4 BIGINT, DATA5 BIGINT, DATA2 BIGINT, DATA3 BIGINT)
... USING parquet
... OPTIONS (
...   serialization.format '1'
... )
... PARTITIONED BY (DATA2, DATA3)
... """
>>> ddl_string1 = """
... CREATE TABLE default.table1 (DATA4 BIGINT, DATA5 BIGINT, DATA2 BIGINT, DATA3 BIGINT)
... USING parquet OPTIONS (serialization.format '1') PARTITIONED BY (DATA2, DATA3)"""
>>> def print_partition_columns_if_found(sql):
...     parse = sqlparse.parse(sql)
...     data = next(item for item in reversed(parse[0].tokens) if item.ttype is None)[1]
...     print(data)
...
>>> print_partition_columns_if_found(ddl_string1)
DATA2, DATA3
>>> print_partition_columns_if_found(ddl_string2)
DATA2, DATA3
>>>
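If only the partition columns are needed and adding sqlparse as a dependency is undesirable, a narrower pattern that targets just the clause avoids describing the whole statement. This is an illustrative stdlib-only sketch, assuming the clause is spelled PARTITIONED BY (...) and lists plain column names with no nested parentheses; the names PARTITION_CLAUSE and partition_column_list are hypothetical.

```python
import re

# Match only the clause itself; \s handles spaces, tabs and newlines alike,
# and IGNORECASE also accepts lowercase "partitioned by".
PARTITION_CLAUSE = re.compile(r'PARTITIONED\s+BY\s*\(([^)]*)\)', re.IGNORECASE)


def partition_column_list(ddl_string):
    """Return the partition columns as a list of names (empty if no clause)."""
    match = PARTITION_CLAUSE.search(ddl_string)
    if match is None:
        return []
    # Split the captured text on commas and drop surrounding whitespace.
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


ddl = """
CREATE TABLE default.table1 (DATA4 BIGINT, DATA5 BIGINT, DATA2 BIGINT, DATA3 BIGINT)
USING parquet
OPTIONS (
  serialization.format '1'
)
PARTITIONED BY (DATA2, DATA3)
"""
print(partition_column_list(ddl))  # ['DATA2', 'DATA3']
```

Like any regex over SQL text, this can be fooled if the same keywords appear inside a string literal or comment elsewhere in the statement.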