How do I split out certain parts of a line in Python?


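So, I have a series of lines (productions), each like one of the following:

VBD -> 'RATTLED'
PP -> CC PP|<PP-LOC-CC-PP>

That is, a line is either of the form

A -> 'B'

or like the PP line above. A, B, C, D and E can be anything (digits, letters, symbols, etc.).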

If you want to split each line into a head (e.g. VBD) and a tail (e.g. 'RATTLED'), a simple way is to use the string split method, like this:
for line in lines:
    split_line = line.split(" -> ")
    head = split_line[0]
    tail = split_line[1]

This assumes that each line contains exactly one "->" and that there is a single space on each side of the "->" separator.
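If the spacing around the arrow is not guaranteed, one option is `str.partition`, which splits on the first "->" only and leaves the cleanup to `strip()`. The following is a minimal sketch of that idea; `split_production` is a hypothetical helper name, not something from the question, and raising `ValueError` for a line without an arrow is my own choice.

```python
def split_production(line):
    """Split 'HEAD -> TAIL' into (head, tail), tolerating uneven spacing."""
    # partition splits on the first "->" only and always returns a 3-tuple
    head, arrow, tail = line.partition("->")
    if not arrow:
        # there is no "->" in this line at all
        raise ValueError("not a production: %r" % line)
    return head.strip(), tail.strip()


lines = ["VBD -> 'RATTLED'", "PP   ->CC PP|<PP-LOC-CC-PP>"]
for line in lines:
    head, tail = split_production(line)
    print(repr(head), repr(tail))
```

Because only the first "->" is used as the separator, any later "->" ends up inside the tail instead of being silently dropped.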
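I'm not sure I understand the details of your implementation, but if you want to check whether a given tail looks like 'RATTLED' or like CC PP|<PP-LOC-CC-PP>, you can loop over the tails like this: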
for line in lines:
    tail = line.split(" -> ")[1]
    if tail.startswith("'"):
        # this is a quoted string, like 'B'
        word = tail.strip("'")
    else:
        # this is like D E etc.
        two_part_style_split = tail.split(' ')

two_part_style_split will look like this:
['CC', 'PP|<PP-LOC-CC-PP>']

or, for a tail without the |<...> part, like this:
['CC', 'PP']
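The two cases can also be handled in one place. This is only a sketch under my reading of the examples: `parse_tail` is a made-up name, a quoted tail is assumed to be a single word, and anything after the first `|` is assumed to be an annotation that should be kept apart from the symbols. None of that is stated in the question.

```python
def parse_tail(tail):
    """Return ('terminal', word) or ('symbols', [symbols...], annotation)."""
    tail = tail.strip()
    if tail.startswith("'"):
        # quoted tail such as 'RATTLED': drop the surrounding quotes
        return ("terminal", tail.strip("'"))
    # unquoted tail such as CC PP|<PP-LOC-CC-PP>:
    # everything after the first | is treated as an annotation (assumption)
    symbols, _, annotation = tail.partition("|")
    return ("symbols", symbols.split(), annotation)


print(parse_tail("'RATTLED'"))
print(parse_tail("CC PP|<PP-LOC-CC-PP>"))
```

Returning a tagged tuple keeps the caller's check down to a comparison on the first element, at the cost of two slightly different tuple shapes.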
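You can use a regular expression to break out the individual parts. I strip the whitespace in a second step so that the regular expression doesn't look too ugly.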
import re

tests = ["VBD -> 'RATTLED'", "PP -> CC PP|<PP-LOC-CC-PP>"]

# use positive lookahead to find everything before ->,
# then everything between -> and (optional) |
# and everything after |
split_re = re.compile(r"(.*(?=->))->([^|]+)\|?(.*)?")

def parse(txt):
    # pull out the values then strip any surrounding whitespace
    return (t.strip() for t in split_re.match(txt).groups())

for test in tests:
    a, b, c = parse(test)
    print(a, b, c)
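One thing to watch with this approach: `match` returns `None` for a line that contains no "->", so `parse` would fail with an `AttributeError` on such input. A guarded variant might look like the following, where `parse_or_none` is a hypothetical name and the pattern is copied unchanged from the snippet above.

```python
import re

# same pattern as in the answer above
split_re = re.compile(r"(.*(?=->))->([^|]+)\|?(.*)?")


def parse_or_none(txt):
    """Return a (head, first, second) tuple, or None if txt is not a production."""
    m = split_re.match(txt)
    if m is None:
        return None
    # default="" guards against a group that did not take part in the match;
    # strip() removes the whitespace left around each captured part
    return tuple(part.strip() for part in m.groups(default=""))


for text in ["VBD -> 'RATTLED'", "PP -> CC PP|<PP-LOC-CC-PP>", "no arrow here"]:
    print(parse_or_none(text))
```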

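It doesn't capture the two parts of the RHS in the second pattern.

@sabzdarsabz OK, I guess LHS and RHS mean "left-hand side" and "right-hand side"? That confused me a little, and I'm not sure people will understand it right away, especially since the text you are parsing is so dense and cluttered.

@sabzdarsabz I assume that by "two parts" you mean the two parts separated by the pipe? If so, I've updated it so that it works.

Yes, right-hand side and left-hand side. In the first pattern I care about the LHS and the string inside the quotes, while in the second pattern I care about the LHS plus RHS0 and RHS1.

@sabzdarsabz This works now. I'm not entirely sure how you define the first and second halves of the tail.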

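# whitespace-tolerant variant: the pattern itself skips the padding, so no strip() is needed
split_re = re.compile(r"\s*(.*?)\s*->\s*([^|]*?)\s*(?:\|\s*(.*?))?\s*$")
for test in tests:
    # groups(default="") yields an empty string when there is no | part
    a, b, c = split_re.match(test).groups(default="")
    print(a, b, c)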