Parsing an indented C file with unknown types in Python


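How can I parse a syntactically correct C file that contains a single function but uses undefined types? The file is auto-indented (4 spaces), with braces on their own line below each block keyword, for example: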

if ( condition1 )
{
    func1( int hi );
    unktype foo;
    do
    {
        if ( condition2 )
            goto LABEL_1;
    }
    while ( condition3 );
}
else
{
    float a = bar(baz, 0);
LABEL_1:
    int foobar = (int)a;
}
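The first line is the prototype and the second line is "{". Every line ends with \n. The last line is just "}\n".

There are a lot of many-to-one gotos, and labels are often outside their scope (ugly, I know :D).

I only care about the structural information, i.e. blocks and statement types. Here is what I would like to get (indentation added for clarity when printing):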

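with condition1, condition2 and condition3 kept as strings. Other constructs would work the same way. Labels can be discarded. I also need to include blocks that are not attached to any special statement, such as Block([…]).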
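Standard C parsers for Python do not work (for example, pycparser gives a syntax error) because the file contains unknown types.

Here is a pyparsing parser that handles the sample code, and a bit more (including support for "for" statements).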

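This is not a great C parser. It glosses over the if, while and do conditions, treating each one as just a string inside nested parentheses. But it may get you started on extracting the parts you are interested in.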

import pyparsing as pp

IF, WHILE, DO, ELSE, FOR = map(pp.Keyword, "if while do else for".split())
SEMI, COLON, LBRACE, RBRACE = map(pp.Suppress, ';:{}')

stmt_body = pp.Forward()
single_stmt = pp.Forward()
stmt_block = stmt_body | single_stmt

if_condition = pp.ungroup(pp.nestedExpr('(', ')'))
while_condition = if_condition()
for_condition = if_condition()

if_stmt = pp.Group(IF 
           + if_condition("condition") 
           + stmt_block("bodyTrue")
           + pp.Optional(ELSE + stmt_block("bodyElse"))
           )
do_stmt = pp.Group(DO 
           + stmt_block("body") 
           + WHILE 
           + while_condition("condition")
           + SEMI
           )
while_stmt = pp.Group(WHILE + while_condition("condition")
              + stmt_block("body"))
for_stmt = pp.Group(FOR + for_condition("condition")
            + stmt_block("body"))
other_stmt = (~(LBRACE | RBRACE) + pp.SkipTo(SEMI) + SEMI)
single_stmt <<= if_stmt | do_stmt | while_stmt | for_stmt | other_stmt
stmt_body <<= pp.nestedExpr('{', '}', content=single_stmt)

label = pp.pyparsing_common.identifier + COLON

parser = pp.OneOrMore(stmt_block)
parser.ignore(label)

sample = """
if ( condition1 )
{
    func1( int hi );
    unktype foo;
    do
    {
        if ( condition2 )
            goto LABEL_1;
    }
    while ( condition3 );
}
else
{
    float a = bar(baz, 0);
LABEL_1:
    int foobar = (int)a;
}
"""

print(parser.parseString(sample).dump())

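You will have to guess, because it is not actually possible to parse C unambiguously under these constraints. You can usually make a pretty good guess, but it is still a guess. For example, is (a)&b a bitwise AND of a and b, or a cast of &b to type a? Who knows!

Consider writing a lexer for this. In C, whitespace characters can be ignored as long as they are not inside a string and do not split a token.

What exactly are you asking? The code you show is not valid C as it stands, so it is no surprise that existing parsers reject it. If you need to parse it, you either have to modify the code or write your own parser. I suspect you are after the latter, but in that case the implied question is too broad.

This looks promising, really cool! It is exactly what I wanted. Thank you very much, Paul :)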
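The lexer suggested in the comments could be sketched with the standard re module as below. This is only an illustration, not part of the original answer: the names TOKEN_RE and tokenize are made up here, keywords come out as plain identifiers, and multi-character operators such as && are split into single characters, which is enough for finding braces, parentheses and semicolons.

```python
import re

# One alternative per token class; order matters (first match wins).
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)                                    # whitespace, dropped
  | (?P<comment>/\*.*?\*/|//[^\n]*)                # comments, dropped
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')  # string/char literals
  | (?P<ident>[A-Za-z_]\w*)                        # identifiers and keywords
  | (?P<number>\d[\w.]*)                           # rough numeric literal
  | (?P<punct>.)                                   # any other single character
""", re.VERBOSE | re.DOTALL)


def tokenize(source):
    """Return a list of (kind, text) pairs, skipping whitespace and comments."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        if kind in ("ws", "comment"):
            continue
        tokens.append((kind, match.group()))
    return tokens


if __name__ == "__main__":
    for token in tokenize("if ( condition1 )\n{\n    unktype foo;\n}\n"):
        print(token)
```

Because string literals are matched as a single token, a ";" or "{" inside a string is never mistaken for structure, which is the point the comment makes about whitespace and strings.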
Output of the pyparsing script for the sample:
[['if', 'condition1', ['func1( int hi )', 'unktype foo', ['do', [['if', 'condition2', 'goto LABEL_1']], 'while', 'condition3']], 'else', ['float a = bar(baz, 0)', 'int foobar = (int)a']]]
[0]:
  ['if', 'condition1', ['func1( int hi )', 'unktype foo', ['do', [['if', 'condition2', 'goto LABEL_1']], 'while', 'condition3']], 'else', ['float a = bar(baz, 0)', 'int foobar = (int)a']]
  - bodyElse: ['float a = bar(baz, 0)', 'int foobar = (int)a']
  - bodyTrue: ['func1( int hi )', 'unktype foo', ['do', [['if', 'condition2', 'goto LABEL_1']], 'while', 'condition3']]
    [0]:
      func1( int hi )
    [1]:
      unktype foo
    [2]:
      ['do', [['if', 'condition2', 'goto LABEL_1']], 'while', 'condition3']
      - body: [['if', 'condition2', 'goto LABEL_1']]
        [0]:
          ['if', 'condition2', 'goto LABEL_1']
          - bodyTrue: 'goto LABEL_1'
          - condition: 'condition2'
      - condition: 'condition3'
  - condition: 'condition1'
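To get from this nested-list output to node objects like the Block([…]) mentioned in the question, one option is a small recursive conversion. The sketch below is an assumption about what such a tree could look like, not something from the original answer: the class names Stmt, Block, If and DoWhile and the function build are invented for illustration, and the input is the list from the first output line copied in by hand (what asList() on the parse result would be expected to give). It only handles the if and do shapes that appear in the sample, and it assumes each condition is a single string.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Stmt:
    text: str


@dataclass
class Block:
    body: List["Node"]


@dataclass
class If:
    condition: str
    then: "Node"
    orelse: Optional["Node"] = None


@dataclass
class DoWhile:
    condition: str
    body: "Node"


Node = Union[Stmt, Block, If, DoWhile]


def build(item):
    """Convert one element of the nested-list parse result into a node."""
    if isinstance(item, str):
        return Stmt(item)                      # plain statement text
    if item and item[0] == "if":
        # Shape: ['if', condition, body] or ['if', condition, body, 'else', body]
        has_else = len(item) > 3 and item[3] == "else"
        return If(item[1], build(item[2]), build(item[4]) if has_else else None)
    if item and item[0] == "do":
        # Shape: ['do', body, 'while', condition]
        return DoWhile(item[3], build(item[1]))
    return Block([build(child) for child in item])  # any other list is a block


# Nested list copied from the first line of the output above.
parsed = [['if', 'condition1',
           ['func1( int hi )', 'unktype foo',
            ['do', [['if', 'condition2', 'goto LABEL_1']], 'while', 'condition3']],
           'else',
           ['float a = bar(baz, 0)', 'int foobar = (int)a']]]

tree = [build(item) for item in parsed]

if __name__ == "__main__":
    print(tree)
```

Telling a statement group from a block by looking at the first element is a heuristic; it relies on ordinary statements never being exactly the string "if" or "do", which holds for the sample.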