Python: splitting blocks of text from a file into new files


python regex python-3.x


I need help splitting blocks of text from one file into separate files.

For example:

ltm data-group internal /Common/www_web {
    records {
        /images { }
        /images/ { }
        /test/common/ { }
    }
    type string
}
ltm monitor http /Common/data {
    adaptive disabled
    defaults-from /Common/http
    destination *:*
    interval 1
    ip-dscp 0
    recv "\{\"status\":\"UP\""
    recv-disable "\{\"status\":\"DOWN\""
    send {}
    time-until-up 0
    timeout 4
}
ltm profile http /Common/stage {
    adaptive disabled
    defaults-from /Common/http
    destination *:*
    interval 5
    ip-dscp 0
    recv "\{\"status\":\"UP\""
    recv-disable "\{\"status\":\"DOWN\""
    send "GET /proxy/test HTTP/1.1\r\nHost: staging\r\nConnection: close\r\n\r\n"
    time-until-up 0
    timeout 16
}

I want to strip out each block and write it to a separate file. For example, this block:

ltm data-group internal /Common/www_web {
    records {
        /images { }
        /images/ { }
        /test/common/ { }
    }
    type string
}

should go into its own file, and this block:

ltm monitor http /Common/data {
    adaptive disabled
    defaults-from /Common/http
    destination *:*
    interval 1
    ip-dscp 0
    recv "\{\"status\":\"UP\""
    recv-disable "\{\"status\":\"DOWN\""
    send {}
    time-until-up 0
    timeout 4
}

should go into another separate file, and so on. So far I have been trying to find a regex that does this. Here is my code:

#!/usr/bin/python
import sys
import re
with open(sys.argv[1], 'r') as f:
    contents = f.read()

regex = r"(^ltm[\s\S]+^ltm)"
matches = re.search(regex, contents, re.MULTILINE)

if matches:
    print ("{match}".format(start = matches.start(), end = matches.end(), match = matches.group()))

So far this regex captures everything between the "ltm" texts. Any help would be appreciated.

I have looked into this, but it did not help much in my situation.

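You can use a simple list comprehension (re.MULTILINE is needed so that ^ matches at the start of every line, not only at the start of the string):

blocks = ["ltm " + s for s in re.split(r"^ltm ", contents, flags=re.MULTILINE)[1:]]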
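The list of blocks still has to be written out as one file per block, which is what the question asks for. A minimal sketch of that step follows. The helper names split_blocks and write_blocks, the out_dir parameter and the block_N.txt naming scheme are illustrative assumptions, not something the question specifies.

```python
import re
from pathlib import Path


def split_blocks(contents):
    """Split text into blocks that each start with 'ltm ' at a line start."""
    # re.MULTILINE makes ^ match at the start of every line. The first
    # item is whatever precedes the first block, so it is dropped.
    parts = re.split(r"^ltm ", contents, flags=re.MULTILINE)
    return ["ltm " + part for part in parts[1:]]


def write_blocks(blocks, out_dir="."):
    """Write each block to block_<n>.txt (hypothetical naming scheme)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, block in enumerate(blocks, start=1):
        path = out_dir / f"block_{number}.txt"
        path.write_text(block)
        paths.append(path)
    return paths


# Shortened stand-in for the configuration shown in the question.
sample = (
    "ltm data-group internal /Common/www_web {\n"
    "    type string\n"
    "}\n"
    "ltm monitor http /Common/data {\n"
    "    timeout 4\n"
    "}\n"
)
blocks = split_blocks(sample)
```

Calling write_blocks(blocks, "out") would then create out/block_1.txt and out/block_2.txt. If the files should be named after the block header instead, the first line of each block is available as block.splitlines()[0].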

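You can also use a regular expression with re.findall(), although that is much less efficient.

It is also possible without regular expressions, using str.find() (a clone of str.index() that does not raise an exception when the substring is not found).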

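I don't know what you want to name each file, so change that part to suit your needs, but this might help. The script scans the file line by line for "ltm" and, when it finds it, creates a new file named after the next block count.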

def save_file_name(val):
    """Return a file name based on the block count."""
    return f'block_{val}.txt'

# Open the source file.
file = open('ltm.txt', 'r')

# Starting count.
count = 0

# The file that receives the current block. block_0.txt holds any lines
# that come before the first 'ltm' block (it stays empty if there are none).
new_file = open(save_file_name(count), 'w')

# As @Olvin Roght pointed out, it is better to read
# the file line by line in this fashion.
for line in file:
    # Scan for the wanted keyword at the start of the line.
    if line.startswith('ltm'):
        # A new block starts here: add one to the count,
        # close the finished file and open the next one.
        count += 1
        new_file.close()
        new_file = open(save_file_name(count), 'w')
    # Write the line to whichever file is currently open.
    new_file.write(line)

# Always remember to close the files that you open.
new_file.close()
file.close()
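A variant of the same line-by-line idea, sketched as a function. It assumes that lines before the first block should be discarded (so no block_0.txt is created) and that a block starts only where a line begins with "ltm " including the trailing space. The name split_file and its parameters are hypothetical.

```python
from pathlib import Path


def split_file(source, out_dir="."):
    """Write every block that starts with 'ltm ' to its own file.

    Returns the number of blocks written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    out = None  # no output file until the first block starts
    try:
        with open(source, "r") as src:
            for line in src:  # reads one line at a time
                if line.startswith("ltm "):
                    # Close the finished block and start the next one.
                    if out is not None:
                        out.close()
                    count += 1
                    out = open(out_dir / f"block_{count}.txt", "w")
                if out is not None:
                    out.write(line)
    finally:
        # Make sure the last output file is closed even on errors.
        if out is not None:
            out.close()
    return count
```

For instance, split_file("ltm.txt", "blocks") would write blocks/block_1.txt, blocks/block_2.txt and so on, and return how many blocks it found.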

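You can write this shorter as for line in file:. Also, it would be better to check that "ltm" is followed by a space and that it is at the start of a new line. Another small edit that would improve readability is renaming the function save_file().

Yes, good point, I should check for the space, but I can only go by what the OP posted. As for "for line in file", I thought you needed "readlines()" to read a text file line by line.

Here is something useful for you: file.readlines() is equivalent to list(file), which means it reads the whole file and splits it into a list of lines. So when you write for line in file.readlines() you read the entire file at once and then work with a list, whereas with for line in file you read the file line by line.

You're right, I have corrected my code. Thanks, I learned something too. I hope this helps the asker as well.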
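The difference described in these comments can be seen in a small sketch. io.StringIO is used here only as a stand-in for a real file object, and the sample text is made up.

```python
import io

text = "ltm a {\n}\nltm b {\n}\n"

# readlines() reads everything up front and returns a list,
# which is the same result that list(file) gives.
all_lines = io.StringIO(text).readlines()
same_lines = list(io.StringIO(text))

# Iterating over the file object yields one line at a time,
# so the rest of the file is still unread after the first step.
handle = io.StringIO(text)
first_line = next(handle)
rest = handle.read()
```

For a large input, plain iteration keeps only the current line in memory, while readlines() holds every line at once.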
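
The regular-expression variant of the first answer, using re.findall() (much less efficient than re.split(); re.MULTILINE is required so that ^ matches at every line start, and the inner group is non-capturing so that a list of strings is returned):

blocks = re.findall(r"^ltm (?:(?!^ltm ).)*", contents, re.DOTALL | re.MULTILINE)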

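The variant of the first answer without regular expressions, using str.find():

delimiter = "ltm "
blocks = []
start = -1  # start of the block currently being collected
index = contents.find(delimiter)
while index >= 0:
    # Only treat the delimiter as a block start at the beginning of a line.
    if index == 0 or contents[index - 1] in "\r\n":
        if start >= 0:
            blocks.append(contents[start:index])
        start = index
    index = contents.find(delimiter, index + 1)
# Everything after the last block start belongs to the last block.
if start >= 0:
    blocks.append(contents[start:])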
