Python: yield joined lines ending in a backslash, but excluding comment blocks

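I'm currently trying to create a generator function that yields a file one line at a time, while ignoring comment blocks and joining lines that end with a backslash to the next line. So, for this text: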

# this entire line is a comment - don't include it in the output
<line0>
# this entire line is a comment - don't include it in the output
<line1># comment
<line2>
# this entire line is a comment - don't include it in the output
<line3.1 \
line3.2 \
line3.3>
<line4.1 \
line4.2>
<line5># comment \
# more comment1 \
more comment2>
<line6>
# here's a comment line continued to the next line \
this line is part of the comment from the previous line
This would produce the following output:

<line0>
<line1>
<line2>
<line3.1 
line3.2 
line3.3>
<line4.1 
line4.2>
<line5>
more comment2>
<line6>
this line is part of the comment from the previous line

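You have two operators, # and \. The latter takes precedence over the former, which means you should check for it and process it first. Here is a simple way to do that, using a list as a buffer to build up each line: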

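def my_generator(f):
    buffer = []  # pieces of a continued line collected so far
    for line in f:
        line = line.rstrip('\n')
        if line.endswith('\\'):
            # Continuation: keep the text without the backslash and read on.
            buffer.append(line[:-1])
            continue
        # Continuations are joined first, so a comment can span continued lines.
        line = ''.join(buffer) + line
        buffer = []
        if '#' in line:
            line = line[:line.index('#')]  # drop everything from the first '#'
        if line:
            yield line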
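The advantage of wrapping an iterable of lines and relying on duck typing is that you can pass in anything that behaves like a container of strings, not just a text file: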

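# Raw string: in a normal string literal, Python itself removes a backslash
# that is followed by a newline, before my_generator ever sees it.
text = r"""# this entire line is a comment - don't include it in the output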
<line0>
# this entire line is a comment - don't include it in the output
<line1># comment
<line2>
# this entire line is a comment - don't include it in the output
<line3.1 \
line3.2 \
line3.3>
<line4.1 \
line4.2>
<line5># comment \
# more comment1 \
more comment2>
<line6>
# here's a comment line continued to the next line \
this line is part of the comment from the previous line"""

for line in my_generator(text.splitlines()):
    print(line)
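
One edge case the buffer approach above does not cover is input whose last line ends with a backslash: the buffered text is then never emitted. The following is a minimal sketch of one way to handle that, not part of the original answer. The name joined_lines and the sample list are made up for illustration, and emitting the leftover text (rather than discarding it or raising an error) is an assumption about the desired behaviour.

```python
def joined_lines(lines):
    """Join backslash-continued lines, strip '#' comments, skip empty results."""
    buffer = []
    for line in lines:
        line = line.rstrip('\n')
        if line.endswith('\\'):
            buffer.append(line[:-1])  # keep the text, drop the backslash
            continue
        buffer.append(line)
        logical = ''.join(buffer).split('#', 1)[0]  # text before the first '#'
        buffer = []
        if logical:
            yield logical
    if buffer:
        # Input ended on a continuation line: emit what was collected so far.
        logical = ''.join(buffer).split('#', 1)[0]
        if logical:
            yield logical


# Hypothetical sample: the final line ends with a dangling backslash.
sample = ["<a \\\n", "b>\n", "# comment \\\n", "still comment\n",
          "<c># tail\n", "<d \\\n"]
print(list(joined_lines(sample)))
```

The comment that continues onto "still comment" is dropped as a whole, because joining happens before comment stripping, just as in the generator above.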

I suggest using re.sub:

import re

def line_gen(text: str):
    # Join continued lines: drop the backslash, its newline, and the whitespace before it.
    text = re.sub(r"\s+\\\n", '', text)
    text = re.sub(r"#(.*)\n", '\n', text)  # Remove any comment that ends in a newline
    # If the last line is a comment, it won't have a final \n,
    # so it has to be removed as well.
    text = re.sub(r"#.*", '', text)

    # Splitting on whitespace gets rid of blank lines and stray spaces
    # (it would also split any remaining line that contains spaces).
    for line in text.rsplit():
        yield line


with open("/tmp/data.txt") as f:
    text = f.read()

    for line in line_gen(text):
        print(line)
Contents of data.txt:

# this entire line is a comment - don't include it in the output
<line0>
# this entire line is a comment - don't include it in the output
<line1># comment
<line2>
# this entire line is a comment - don't include it in the output
<line3.1 \
line3.2 \
line3.3>
<line4.1 \
line4.2>
<line5># comment \
# more comment1 \
more comment2>
<line6>
# here's a comment line continued to the next line \
this line is part of the comment from the previous line
Result:

<line0>
<line1>
<line2>
<line3.1line3.2line3.3>
<line4.1line4.2>
<line5>
<line6>
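
If the goal is instead to keep the space that precedes each backslash, so that joined lines stay readable, a variant along these lines may work. This is only a sketch under that assumption: the name line_gen_keep_spaces and the sample string are hypothetical, and it splits with str.splitlines() rather than on arbitrary whitespace so that spaces inside a line survive.

```python
import re


def line_gen_keep_spaces(text):
    """Join continuations, strip comments, and keep spaces inside lines."""
    # Remove only the backslash and the newline; whitespace before it is kept.
    text = re.sub(r"\\\n", "", text)
    # '.' does not match a newline, so this strips from '#' to the end of the line.
    text = re.sub(r"#.*", "", text)
    for line in text.splitlines():
        if line.strip():  # skip lines that are empty after comment removal
            yield line


# Hypothetical sample in the same shape as data.txt.
sample = ("# c\n<line0>\n<line3.1 \\\nline3.2 \\\nline3.3>\n"
          "<line5># comment \\\n# more \\\nmore2>\n# last \\\ntail")
for line in line_gen_keep_spaces(sample):
    print(line)
```

Because the joins happen before the comments are stripped, a comment that is continued with a backslash still swallows the following line.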

For some reason, when I run the code, the joined lines have no spaces between them, and when I use line=''.join(linebuff)+line the spaces only appear after …

@JimT. The spaces are in the lines themselves, in front of the backslashes, unless you are using different text. If you want the final space, you can use ' '.join(buffer) + ' ' + line.

I'm using the same text. This code does produce a space after each item in lines 3 and 4, but now every other line except line 5 also starts with a space, and there are two blank lines after line 2.

@JimT. I don't know what to tell you. I copied and pasted the code directly into my editor, then copied and pasted the result back. Are you modifying anything in the code, or typing it in by hand?

The code hasn't been modified. Could there be a difference between reading the file into Python and having the text in the code?

You can use the built-in re.sub method. That reduces the problem to replacing two string patterns, and your code will be simpler and more readable, with better performance.

This solution also works fine when the text is in the code, but not when I read the file in with file_name = open(path/to/file.txt, 'r') and then Lines = file_name.read(), in the function …

@JimT. I'm starting to suspect there is something wrong with your file. I can reproduce Raydel's results perfectly, and neither of our solutions has a problem.

@JimT I adjusted the solution so that it gives the correct result when the text is taken from a file.

@RaydelMiranda That's exactly the output Mad Physicist and I got earlier, but ideally the output for lines 3 and 4 would be …

If you want the output for lines 3 and 4 to be …, in the regex … \\\n will result in …
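
One possible reason for the difference between text embedded in the code and text read from a file (an inference; the thread itself does not confirm it) is that Python removes a backslash followed by a newline inside an ordinary string literal, while a raw string or a file keeps both characters. The short sketch below illustrates this; the variable names plain and raw are made up for the example.

```python
# In an ordinary literal, backslash + newline is a line continuation:
# both characters disappear before any of your own code runs.
plain = """<line3.1 \
line3.2>"""

# In a raw literal the backslash and the newline are both kept,
# which matches what f.read() returns for the same file contents.
raw = r"""<line3.1 \
line3.2>"""

print(plain.splitlines())  # a single line, already joined by Python
print(raw.splitlines())    # two lines, the first still ending in a backslash
```

So a test that uses an ordinary triple-quoted string never exercises the joining logic, whereas the same text read from a file does.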