Separating blocks of text with regular expressions - Python

I get the following output from the Stanford parser:

nicaragua president ends visit to finland .

nn(ends-3, nicaragua-1)
nn(ends-3, president-2)
nsubj(visit-4, ends-3)
xsubj(finland-6, ends-3)
root(ROOT-0, visit-4)
aux(finland-6, to-5)
xcomp(visit-4, finland-6)

guatemala president ends visit to tropos .

nn(ends-3, guatemala-1)
nn(ends-3, president-2)
nsubj(visit-4, ends-3)
xsubj(finland-6, ends-3)
root(ROOT-0, visit-4)
aux(tropos-6, to-5)
xcomp(visit-4, tropos-6)

[...]

I need to segment this output into tuples of the form (sentence, [list of dependencies]), one tuple per sentence. Can anyone suggest a way to do this in Python? Thanks!

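You could do it with a small state machine like the one below, although it may be overkill for the structure you are parsing. If you also need to parse the individual dependencies, extending it should be relatively easy. I haven't run this, or even checked the syntax, so don't kill me if it doesn't work right away.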

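READ_SENT = 0   # expecting a sentence line
PRE_DEPS = 1    # expecting the blank line between sentence and dependencies
DEPS = 2        # collecting dependency lines

def parse_output(text):
    """Split parser output into (sentence, [dependency lines]) tuples."""
    state = READ_SENT
    results = []
    sent = None
    deps = []
    for line in text.splitlines():
        line = line.strip()
        if state == READ_SENT:
            if not line:
                continue  # tolerate extra blank lines between blocks
            sent = line
            state = PRE_DEPS
        elif state == PRE_DEPS:
            if line:
                raise ValueError('invalid format')
            state = DEPS
        elif state == DEPS:
            if line:
                deps.append(line)
            else:
                # a blank line closes the current block
                results.append((sent, deps))
                sent = None
                deps = []
                state = READ_SENT
    # flush the last block when the input does not end with a blank line
    if sent is not None:
        results.append((sent, deps))
    return results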
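
Since the question asks about regular expressions, and splitting the dependencies themselves was left as an extension, here is a minimal sketch of both. It assumes blocks are separated by blank lines and alternate strictly between a sentence and its dependencies, and that every dependency line looks like `relation(word-N, word-N)` as in the sample. The names `DEP_RE`, `parse_dependency` and `split_blocks` are made up for this example and are not part of the Stanford parser or of the code above.

```python
import re

# Assumed line shape, taken from the sample: "nn(ends-3, nicaragua-1)".
# Relation names with other characters, or primed indices, would need a
# looser pattern.
DEP_RE = re.compile(r'^(\w+)\((.+)-(\d+), (.+)-(\d+)\)$')


def parse_dependency(line):
    """Return (relation, (governor, index), (dependent, index))."""
    m = DEP_RE.match(line.strip())
    if m is None:
        raise ValueError('not a dependency line: %r' % line)
    rel, gov, gov_idx, dep, dep_idx = m.groups()
    return rel, (gov, int(gov_idx)), (dep, int(dep_idx))


def split_blocks(text):
    """Split on blank lines and pair each sentence with the block after it."""
    chunks = [c.strip() for c in re.split(r'\n\s*\n', text.strip())]
    results = []
    # Even-numbered chunks are sentences, odd-numbered chunks are their
    # dependency blocks; an unpaired trailing chunk is silently dropped.
    for sent, dep_block in zip(chunks[0::2], chunks[1::2]):
        deps = [parse_dependency(l) for l in dep_block.splitlines()]
        results.append((sent, deps))
    return results
```

Calling `split_blocks` on the parser output returns a list of `(sentence, [dependencies])` tuples, with each dependency already broken into its relation and the two word/index pairs. If the raw dependency strings are enough, replace the `parse_dependency` call with `l.strip()`.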