In Python, how can I exclude lines of a file that contain any of a list of patterns?

Tags: python, pattern-matching, list-comprehension, lines

I have a file from which I want to remove every line that contains certain patterns. Suppose the patterns are as follows:

lineRemovalPatterns = [
    "!DOCTYPE html",
    "<html",
    "<head",
    "<meta",
    "<title",
    "<link rel>",
    "</head>",
    "<body>",
    "</body>",
    "</html>"
]
You can use [...] instead of lineRemovalPattern not in line to exclude lines that contain a substring you want removed.
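A test on a single pattern extends to the whole list with the built-in all, which is logically the same as negating any. The following is only a minimal sketch: keep_line is a hypothetical helper name, and the pattern list is shortened for illustration.

```python
# keep_line is a hypothetical helper name, not part of the original answer.
def keep_line(line, patterns):
    # True only if none of the patterns occurs as a substring of the line.
    return all(pattern not in line for pattern in patterns)


patterns = ["<html", "</html>", "<body>"]  # shortened list for illustration

# The all(...) form and the not any(...) form always agree.
for sample in ['<html lang="en">\n', "<p>Hello</p>\n", "<body>\n"]:
    assert keep_line(sample, patterns) == (
        not any(pattern in sample for pattern in patterns)
    )
```

Which of the two spellings to use is a matter of taste; both stop at the first pattern that matches.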


Still, I'd echo @doctorlove: a real DOM parser would probably serve you better. Don't go too far down this road.
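For comparison, here is a rough sketch of the parser route using only the Python 3 standard library. Note the assumptions: html.parser is an event-based parser rather than a true DOM parser, the names BodyExtractor and extract_body are hypothetical, and the goal assumed here is to keep only the markup between <body> and </body>.

```python
from html.parser import HTMLParser


class BodyExtractor(HTMLParser):
    """Collect the markup found between <body> and </body>."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.in_body = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body:
            # get_starttag_text() returns the start tag exactly as written.
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, unchanged.
        if self.in_body:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif self.in_body:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        if self.in_body:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.in_body:
            self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        if self.in_body:
            self.parts.append("&#%s;" % name)


def extract_body(html_text):
    parser = BodyExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)
```

Unlike line-based filtering, this does not depend on each tag sitting on its own line. In this sketch, comments and anything outside the body element are dropped.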

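The following approach negates the return value of the function any, which is applied to a generator expression over the current line and the list of patterns: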

# Names of the input (SVN) and output (Git) HTML files.
HTMLSVNFileName = "README_SVN.html"
HTMLGitFileName = "README.html"
# Copy the HTML SVN file to the Git file line by line. If any of the line
# removal patterns are in a line, skip that line. lineRemovalPatterns is the
# list defined in the question. The with statement closes both files.
with open(HTMLSVNFileName, "r") as HTMLSVNFile, open(HTMLGitFileName, "w") as HTMLGitFile:
    for line in HTMLSVNFile:
        if not any(pattern in line for pattern in lineRemovalPatterns):
            HTMLGitFile.write(line)
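An alternative to the any test, swapped in here purely for illustration, is to combine the patterns into one compiled regular expression. The sketch below assumes the patterns are meant as literal substrings, so each one is passed through re.escape; the function names build_removal_regex and filter_lines are hypothetical.

```python
import re


def build_removal_regex(patterns):
    # re.escape makes each pattern match literally, so characters such as
    # "." or "<" carry no special regex meaning.
    return re.compile("|".join(re.escape(pattern) for pattern in patterns))


def filter_lines(lines, patterns):
    # With no patterns there is nothing to remove. An empty regex would
    # match every line, so this case is handled separately.
    if not patterns:
        return list(lines)
    regex = build_removal_regex(patterns)
    return [line for line in lines if not regex.search(line)]
```

For a short list of patterns the any version is just as readable; the regex form mainly helps when the same list is applied to many lines.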

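Are you really trying to parse HTML?
No, I just want to remove any line that contains certain tags. I have a Markdown file of documentation. It is converted to standalone HTML for display in the SVN repository web interface, and the standalone HTML is then trimmed for display in the Git repository web interface. For the SVN repository, the full HTML file is rendered; for the Git repository, the HTML is placed inside an existing web page and then rendered. So I am removing certain tags (for example, [...]) to work around a bug in the Git repository web rendering mechanism.
HTML tags can span multiple lines.
In general, that's true. In my case, I have set things up so that they are on separate lines. I think this is more about how to write a list comprehension than about HTML.
Haha, yes, you're probably right. I'll tread carefully!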
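On the list comprehension point raised in the comments: the filter belongs in the if clause, with any (or all) ranging over the patterns. A common pitfall is to add a second for clause instead, which produces one entry per line and pattern pair, so lines are repeated rather than removed. A small sketch with made-up sample data:

```python
patterns = ["<html", "<body>"]  # shortened list for illustration
lines = ["<html>\n", "<p>text</p>\n", "<body>\n"]

# Intended behavior: one output entry per kept line.
filtered = [
    line for line in lines
    if not any(pattern in line for pattern in patterns)
]

# Pitfall: the nested for clause emits a line once for every pattern it
# does not contain, so clean lines are duplicated and lines that match
# only some of the patterns still get through.
nested = [
    line for line in lines
    for pattern in patterns
    if pattern not in line
]
```

Here filtered holds only the paragraph line, while nested holds four entries, including both lines that should have been removed.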