Parsing SQL with Python and finding relationships

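I have a large list of SQL queries, all stored as strings. They were written for MySQL, so they more or less follow MySQL formatting.

I would like to be able to tease out the table relationships written into these queries.

Let's start with something simple: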

SELECT e.object_id, count(*)
FROM schema_name.elements AS e
       JOIN schema_name2.quotes AS q ON q.id = e.object_id
WHERE e.object_type = 'something' 
GROUP BY e.object_id, q.query
ORDER BY 2 desc;
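It is easy to see where things are joined, even though aliases are used. The aliases need to be scanned and resolved, which is straightforward here because the keyword "AS" is used.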

So for each query I would like to return a list of relationships, each of which looks like this:

# named "relationship" so the built-in dict type is not shadowed
relationship = {'SourceSchema': 'schema_name',
                'SourceTable': 'elements',
                'SourceColumn': 'object_id',
                'TargetSchema': 'schema_name2',
                'TargetTable': 'quotes',
                'TargetColumn': 'id'}
I can imagine that being fairly easy to do, but things get more complicated:

SELECT e.object_id, count(*)
FROM schema_name.elements e
        LEFT JOIN schema_name2.quotes q ON q.id = cast(coalesce(nullif(e.object_id,''),'0') as bigint)
WHERE e.object_type = 'something' 
GROUP BY e.object_id, q.query
ORDER BY 2 desc;
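Three things to note:

  • The "AS" keyword is missing, which may make the aliases harder to pick up
  • In the join condition, a lot has to be parsed through (cast, coalesce, nullif) to work out how the two tables connect
  • This is not a plain "JOIN" but a LEFT JOIN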

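Is there some kind of Python SQL parsing library that would let me tease out the relationships across 4,000 queries? If not, how could I do this efficiently? My guess is that I would need to scan each query, find the joins, find the aliases, and then look at how they are connected, while accounting for a bunch of stop words that need to be discarded.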
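The scan described above (find the joins, find the aliases, resolve how they connect, discard stop words) can be sketched with nothing but the standard `re` module. This is a rough illustration, not a real SQL parser. It assumes every table is written as `schema.table`, that each ON condition is a set of simple `=` comparisons, and that the table appearing earlier in the query is the "source". The names `find_relationships`, `STOP_WORDS`, `TABLE_RE`, `ON_RE` and `COLUMN_RE` are invented for this example.

```python
import re

# Reserved words that may follow a table name and must not be read as an alias.
STOP_WORDS = {"ON", "WHERE", "JOIN", "LEFT", "RIGHT", "INNER", "OUTER", "FULL",
              "CROSS", "GROUP", "ORDER", "USING", "LIMIT", "HAVING"}

# schema.table after FROM or any kind of JOIN, then an optional AS and alias.
TABLE_RE = re.compile(
    r"\b(?:FROM|JOIN)\s+(\w+)\.(\w+)(?:\s+(?:AS\s+)?(\w+))?", re.IGNORECASE)

# An ON condition runs until the next clause keyword, a semicolon, or the end.
ON_RE = re.compile(
    r"\bON\b(.*?)(?=\b(?:LEFT|RIGHT|INNER|OUTER|FULL|CROSS|JOIN|WHERE|GROUP"
    r"|ORDER|HAVING|LIMIT)\b|;|$)",
    re.IGNORECASE | re.DOTALL)

# alias.column references, even when buried inside function calls.
COLUMN_RE = re.compile(r"\b(\w+)\.(\w+)\b")


def find_relationships(query):
    """Return one dict per 'a.x = b.y' comparison found in the ON clauses."""
    aliases = {}  # lower-cased alias -> (schema, table)
    order = []    # aliases in the order their tables appear in the query
    for schema, table, alias in TABLE_RE.findall(query):
        if not alias or alias.upper() in STOP_WORDS:
            alias = table  # no alias given: the table is referenced by name
        aliases[alias.lower()] = (schema, table)
        order.append(alias.lower())

    relationships = []
    for condition in ON_RE.findall(query):
        for part in re.split(r"\bAND\b", condition, flags=re.IGNORECASE):
            # Split on a lone "=", ignoring <=, >=, != and ==.
            sides = re.split(r"(?<![<>!=])=(?!=)", part)
            if len(sides) != 2:
                continue
            refs = []
            for side in sides:
                known = [(a.lower(), c) for a, c in COLUMN_RE.findall(side)
                         if a.lower() in aliases]
                if known:
                    refs.append(known[0])
            if len(refs) != 2:
                continue
            # Assumption: the table that appears first in the query is the source.
            refs.sort(key=lambda ref: order.index(ref[0]))
            (src, src_col), (tgt, tgt_col) = refs
            relationships.append({
                "SourceSchema": aliases[src][0], "SourceTable": aliases[src][1],
                "SourceColumn": src_col,
                "TargetSchema": aliases[tgt][0], "TargetTable": aliases[tgt][1],
                "TargetColumn": tgt_col,
            })
    return relationships


query = """SELECT e.object_id, count(*)
FROM schema_name.elements e
        LEFT JOIN schema_name2.quotes q ON q.id = cast(coalesce(nullif(e.object_id,''),'0') as bigint)
WHERE e.object_type = 'something'
GROUP BY e.object_id, q.query
ORDER BY 2 desc;"""
print(find_relationships(query))
```

Both example queries produce the same single relationship (elements.object_id to quotes.id), because the sketch only looks for known `alias.column` references on each side of the `=` and ignores the functions wrapped around them. Subqueries, string literals containing keywords or dots, and columns named after keywords would all confuse it, which is where a real grammar becomes the safer choice.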

With a few small changes to select_parser.py, which is part of the pyparsing examples, I get the following after parsing your first example:

SELECT e.object_id, count(*) FROM schema_name.elements AS e        JOIN schema_name2.quotes AS q ON q.id = e.object_id WHERE e.object_type = 'something' GROUP BY e.object_id, q.query ORDER BY 2 desc;
['SELECT', [['e.object_id'], ['count', '*']], 'FROM', [['schema_name', '.', 'elements'], 'AS', 'e', ['JOIN'], ['schema_name2', '.', 'quotes'], 'AS', 'q', ['ON', ['q.id', '=', 'e.object_id']]], 'WHERE', ['e.object_type', '=', 'something'], 'GROUP', 'BY', [['e.object_id'], ['q.query']], 'ORDER', 'BY', [['2', 'DESC']], ';']
- columns: [['e.object_id'], ['count', '*']]
  [0]:
    ['e.object_id']
  [1]:
    ['count', '*']
- from: [[['schema_name', '.', 'elements'], 'AS', 'e', ['JOIN'], ['schema_name2', '.', 'quotes'], 'AS', 'q', ['ON', ['q.id', '=', 'e.object_id']]]]
  [0]:
    [['schema_name', '.', 'elements'], 'AS', 'e', ['JOIN'], ['schema_name2', '.', 'quotes'], 'AS', 'q', ['ON', ['q.id', '=', 'e.object_id']]]
    - table_alias: [['e'], ['q']]
      [0]:
        ['e']
      [1]:
        ['q']
- order_by_terms: [['2', 'DESC']]
  [0]:
    ['2', 'DESC']
    - direction: DESC
    - order_key: 2
- where_expr: ['e.object_type', '=', 'something']

It looks like this example could help you get started. It is written for SQLite's SELECT format, so you will need to extend the grammar somewhat.
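As a possible next step, the `from` portion of the parse result shown above can be walked as an ordinary nested list to produce the relationship dicts asked for in the question. The sketch below hard-codes that list exactly as it was printed, so it runs without pyparsing. The function names are invented here, and the assumed shape (a `[schema, '.', table]` list, an optional `'AS'`, an alias string, and an `['ON', [left, '=', right]]` list) holds only for this particular modified grammar. A different grammar, or a condition with nested function calls, may produce a different structure.

```python
def first_column(node, tables):
    """Return the first (alias, column) found in a possibly nested token list."""
    if isinstance(node, str):
        alias, dot, column = node.partition(".")
        return (alias, column) if dot and alias in tables else None
    for child in node:
        found = first_column(child, tables)
        if found:
            return found
    return None


def relationships_from_parse(from_clause):
    """Turn the parsed FROM clause (a nested list) into relationship dicts."""
    tables = {}      # alias or table name -> (position, schema, table)
    conditions = []  # token lists that followed each ON
    current = None   # most recent table that is still waiting for an alias
    for position, item in enumerate(from_clause):
        if isinstance(item, list) and len(item) == 3 and item[1] == ".":
            current = (position, item[0], item[2])
            tables[item[2]] = current  # usable by table name as well
        elif isinstance(item, list) and item[:1] == ["ON"]:
            conditions.append(item[1])
        elif isinstance(item, str) and item.upper() != "AS" and current:
            tables[item] = current     # a bare string after a table is its alias
            current = None
        # anything else, such as ['JOIN'], is a join operator and is skipped

    relationships = []
    for cond in conditions:
        if len(cond) != 3 or cond[1] != "=":
            continue  # only simple equality comparisons are handled
        refs = [first_column(cond[0], tables), first_column(cond[2], tables)]
        if None in refs:
            continue
        # Assumption: the table that appears first is the source.
        refs.sort(key=lambda ref: tables[ref[0]][0])
        (src, src_col), (tgt, tgt_col) = refs
        relationships.append({
            "SourceSchema": tables[src][1], "SourceTable": tables[src][2],
            "SourceColumn": src_col,
            "TargetSchema": tables[tgt][1], "TargetTable": tables[tgt][2],
            "TargetColumn": tgt_col,
        })
    return relationships


# The FROM clause exactly as printed in the parse output above.
from_clause = [['schema_name', '.', 'elements'], 'AS', 'e', ['JOIN'],
               ['schema_name2', '.', 'quotes'], 'AS', 'q',
               ['ON', ['q.id', '=', 'e.object_id']]]
print(relationships_from_parse(from_clause))
```

Because `first_column` recurses into nested lists, it would also pick the column out of a function-wrapped expression if the extended grammar returns such expressions as nested token lists, though that output shape is a guess and would need checking against the real parse result.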

Thanks for the reply! Yes, I think letting pyparsing do this for me is the simplest approach. I looked around and, after posting, found the following: