In Python, how do I exclude lines of a file that contain any of a list of patterns?
I have a file from which I want to remove every line that contains certain patterns. Suppose the patterns are as follows:
lineRemovalPatterns = [
    "!DOCTYPE html",
    "<html",
    "<head",
    "<meta",
    "<title",
    "<link rel>",
    "</head>",
    "<body>",
    "</body>",
    "</html>",
]
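Rather than testing a single pattern with lineRemovalPattern not in line, test each line against every pattern in the list, so that any line containing one of the substrings to be removed is excluded.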
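One way to sketch that check, assuming the built-in `any` and working on an in-memory list of lines rather than a file. The shortened pattern list, the sample lines, and the name `kept` are illustrative assumptions, not taken from the original post:

```python
# Illustrative patterns and sample lines (assumed, not from the original file).
lineRemovalPatterns = ["<html", "<head", "</head>", "</html>"]
lines = [
    "<html>\n",
    "<head>\n",
    "</head>\n",
    "<p>Hello</p>\n",
    "</html>\n",
]

# Keep a line only if none of the patterns occurs in it as a substring.
kept = [
    line
    for line in lines
    if not any(pattern in line for pattern in lineRemovalPatterns)
]
```

Only the paragraph line survives here, because each of the other lines contains at least one of the listed substrings.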
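That said, I would echo @doctorlove: a real DOM parser would probably serve you better. Don't go too far down this road.

The following approach negates the return value of the built-in function any, which is applied to a comprehension (strictly, a generator expression) over the current line and the list of patterns: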
# Create a variable for resultant Git file content.
HTMLGitFileContent = ""
HTMLSVNFileName = "README_SVN.html"
HTMLGitFileName = "README.html"
# Loop over the lines of the HTML SVN file, building the resultant Git file
# content. If any of the line removal patterns are in a line, remove that
# line.
with open(HTMLSVNFileName, "r") as HTMLSVNFile:
    for line in HTMLSVNFile:
        if not any(pattern in line for pattern in lineRemovalPatterns):
            HTMLGitFileContent = HTMLGitFileContent + line
# Write the filtered content; the with blocks close both files automatically.
with open(HTMLGitFileName, "w") as HTMLGitFile:
    HTMLGitFile.write(HTMLGitFileContent)
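The same filter can also be written as a small generator, which avoids building the whole output as one string. This is only a sketch under stated assumptions: the function names `drop_matching_lines` and `filter_file` are hypothetical and do not appear in the original answer.

```python
from typing import Iterable, Iterator


def drop_matching_lines(lines: Iterable[str], patterns: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that contain none of the given substrings."""
    patterns = tuple(patterns)  # materialise once so it can be reused for every line
    for line in lines:
        if not any(pattern in line for pattern in patterns):
            yield line


def filter_file(source_name: str, target_name: str, patterns: Iterable[str]) -> None:
    """Copy source_name to target_name, skipping lines that match any pattern."""
    with open(source_name, "r") as source, open(target_name, "w") as target:
        target.writelines(drop_matching_lines(source, patterns))
```

With the names used above, a call might look like filter_file("README_SVN.html", "README.html", lineRemovalPatterns). Keeping the line filter separate from the file handling makes it possible to check the filter on a plain list of strings.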
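Do you really want to parse HTML?

No, I just want to remove any line that contains particular tags. I have documentation in a Markdown file. It is converted to standalone HTML for display in the SVN repository web interface, and the standalone HTML is then trimmed for display in the Git repository web interface. For the SVN repository, the complete HTML file is rendered; for the Git repository, the HTML is placed into an existing web page and then rendered. So I am removing certain tags (for example, …) to work around errors in the Git repository web rendering mechanism.

HTML tags can span multiple lines.

In general, that is true. In my case, I have set things up so that they are on separate lines. I think this is more a question of how to write the list comprehension than of HTML.

Haha, yes, you may be right. I will tread carefully!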