How do I ignore lines using difflib.ndiff in Python?


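According to the documentation, you can supply a linejunk function to ignore certain lines. However, I can't get it to work. Here is some sample code for discussion: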

from re import search
from difflib import ndiff 
t1 = 'one 1\ntwo 2\nthree 3'
t2 = 'one 1\ntwo 29\nthree 3'
diff = ndiff(t1.splitlines(), t2.splitlines(), lambda x: search('2', x))

My intention is to ignore the second line, so that diff would be a generator that shows no differences.

Thanks for your help.

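Your example has a problem: the first two arguments to ndiff should each be a list of strings; you have a single string, which is treated as a list of characters. Use, for example: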

t1 = 'one 1\ntwo 2\nthree 3'.splitlines()
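To see why this matters, here is a minimal sketch (the variable names are only illustrative) comparing what ndiff yields for a plain string versus a list of lines:

```python
from difflib import ndiff

text = 'one\ntwo'

# A plain string is iterated character by character,
# so ndiff emits one entry per character (including the newline).
by_char = list(ndiff(text, text))

# A list of lines gives one entry per line, which is what is wanted here.
by_line = list(ndiff(text.splitlines(), text.splitlines()))

print(len(by_char))
print(by_line)
```
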

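However, as the following example shows, difflib.ndiff does not call the linejunk function for every line. This is long-standing behaviour, verified with Python 2.2 through 2.6 and with 3.1.

Example script: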

from difflib import ndiff
t1 = 'one 1\ntwo 2\nthree 3'.splitlines()
t2 = 'one 1\ntwo 29\nthree 3'.splitlines()
def lj(line):
    rval = '2' in line
    print("lj: line=%r, rval=%s" % (line, rval))
    return rval
d = list(ndiff(t1, t2    )); print("%d %r\n" %  (1, d))
d = list(ndiff(t1, t2, lj)); print("%d %r\n" %  (2, d))
d = list(ndiff(t2, t1, lj)); print("%d %r\n" %  (3, d))

Output when run with Python 2.6:

1 ['  one 1', '- two 2', '+ two 29', '?      +\n', '  three 3']

lj: line='one 1', rval=False
lj: line='two 29', rval=True
lj: line='three 3', rval=False
2 ['  one 1', '- two 2', '+ two 29', '?      +\n', '  three 3']

lj: line='one 1', rval=False
lj: line='two 2', rval=True
lj: line='three 3', rval=False
3 ['  one 1', '- two 29', '?      -\n', '+ two 2', '  three 3']

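You may wish to report this as a bug. Note, however, that the documentation does not explicitly say what a "junk" line means. What output were you expecting?

Further puzzlement: add the following lines to the script: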

t3 = 'one 1\n   \ntwo 2\n'.splitlines()
t4 = 'one 1\n\n#\n\ntwo 2\n'.splitlines()
d = list(ndiff(t3, t4      )); print("%d %r\n" %  (4, d))
d = list(ndiff(t4, t3      )); print("%d %r\n" %  (5, d))
d = list(ndiff(t3, t4, None)); print("%d %r\n" %  (6, d))
d = list(ndiff(t4, t3, None)); print("%d %r\n" %  (7, d))

This produces the following output:

4 ['  one 1', '-    ', '+ ', '+ #', '+ ', '  two 2']

5 ['  one 1', '+    ', '- ', '- #', '- ', '  two 2']

6 ['  one 1', '-    ', '+ ', '+ #', '+ ', '  two 2']

7 ['  one 1', '+    ', '- ', '- #', '- ', '  two 2']

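In other words, in a case where the inputs contain differing "junk" lines (blank apart from an optional leading hash), the result with the default linejunk function is the same as the result with no linejunk function at all.

If you tell us what you are trying to achieve, we may be able to suggest an alternative approach.

Edit after further information

If your intention is to ignore all lines containing '2', i.e. to pretend for ndiff purposes that they don't exist, then all you have to do is turn the pretence into reality: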

t1f = [line for line in t1 if '2' not in line]
t2f = [line for line in t2 if '2' not in line]
diff = ndiff(t1f, t2f)
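If the ignored lines should still appear in the output rather than be dropped, one possible approach is to compare masked copies of the inputs and print the original lines. The following is only a sketch under that assumption: diff_ignoring is a hypothetical helper, not part of difflib, and it produces simplified ndiff-style output without the '?' hint lines.

```python
from difflib import SequenceMatcher

def diff_ignoring(a, b, ignore):
    """Hypothetical helper: ndiff-style output where lines matching
    ignore(line) never count as differences."""
    # Map every ignored line to the same key (None) so such lines compare equal.
    key_a = [None if ignore(line) else line for line in a]
    key_b = [None if ignore(line) else line for line in b]
    matcher = SequenceMatcher(None, key_a, key_b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            # Show the line from the first input, even if the ignored text differs.
            out.extend('  ' + line for line in a[i1:i2])
        else:
            # 'replace', 'delete' and 'insert' reduce to removals plus additions.
            out.extend('- ' + line for line in a[i1:i2])
            out.extend('+ ' + line for line in b[j1:j2])
    return out

t1 = 'one 1\ntwo 2\nthree 3'.splitlines()
t2 = 'one 1\ntwo 29\nthree 3'.splitlines()
result = diff_ignoring(t1, t2, lambda line: '2' in line)
print(result)
```

With the sample data, every line is reported as unchanged, and 'two 2' is shown because equal runs are taken from the first input.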

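I ran into the same problem recently.

Here is what I found:

The main purpose of the *junk parameters is to speed up matching to find differences, not to mask differences.

A patch to the difflib documentation gives a better explanation of the "junk" and "ignore" concepts:

These junk-filtering functions speed up matching to find differences and do not cause any differing lines or characters to be ignored.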
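A quick way to check this for yourself is the small sketch below. It reuses the sample data from the question and only inspects the '- ' and '+ ' lines, since the exact '?' hint lines are not the point here:

```python
from difflib import ndiff

a = 'one 1\ntwo 2\nthree 3'.splitlines()
b = 'one 1\ntwo 29\nthree 3'.splitlines()

# Mark every line containing '2' as junk.
with_junk = list(ndiff(a, b, linejunk=lambda line: '2' in line))

# The differing lines are still reported even though they are "junk".
changed = [line for line in with_junk if line.startswith(('- ', '+ '))]
print(changed)
```
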

My intention was to ignore the second line, so that diff would be a generator showing no differences. I ended up doing this. I was originally using the HtmlDiff function, and I wanted certain lines to be ignored (not removed) when creating the HTML output. I was led to ndiff because the documentation implies the two are related. I would like to know what exactly the linejunk function does at this point.

@behindalens: I'm as curious as you are. I could file a bug report and/or a request for documentation clarification. I could even read the source code :-) ... Meanwhile, do you consider your question answered?

This should be the accepted answer to this question.

@Matthewatabit It does provide valuable information about the "why". But knowing the "how" is interesting too, and the other answer addresses that. The two answers complement each other :)