Python: difference between two strings (sentences)

I am trying to compute the difference between two sentences, like this:

import difflib

text1_lines = "I understand how customers do their choice. Difference"
text2_lines = "I understand how customers do their choice."
diff = difflib.ndiff(text1_lines, text2_lines)
I want to get "Difference" as the result, but I don't get it. What am I doing wrong? Thanks for letting me know.

Split the longer string on the shorter one, and what remains is the difference:

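def string_diff(a, b):
    # If one string is empty, the other string is the whole difference.
    if len(a) == 0:
        return b
    if len(b) == 0:
        return a
    if len(a) > len(b):
        res = ''.join(a.split(b))  # cut the shorter string out of the longer one
    else:
        res = ''.join(b.split(a))  # same idea with the roles swapped
    return res.strip()

print(string_diff(text1_lines, text2_lines))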
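One caveat with the split-based idea above: it only works when the shorter string occurs as a contiguous piece of the longer one. The sketch below illustrates this; the helper name `split_diff` and the second sample sentence are assumptions made for the example, not part of the original answer.

```python
def split_diff(longer, shorter):
    # Hypothetical helper: remove every occurrence of `shorter` from `longer`.
    return ''.join(longer.split(shorter)).strip()

base = "I understand how customers do their choice."

# Case 1: `base` is a contiguous substring, so only the extra text remains.
appended = split_diff(base + " Difference", base)

# Case 2: one word differs in the middle, so `base` is not a substring and
# split() finds nothing to remove; the whole string comes back unchanged.
changed = split_diff("I understand why customers do their choice.", base)

print(repr(appended))
print(repr(changed))
```

For edits in the middle of a sentence, a difflib-based approach is likely the better fit.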

Use a simple list comprehension:

diff = [x for x in difflib.ndiff(text1_lines, text2_lines) if x[0] != ' ']
It shows you the deletions and additions.

Output:

*** 
--- 
***************
*** 41,54 ****
c  e  .-  - D- i- f- f- e- r- e- n- c- e--- 41,43 ----
['-  ', '- D', '- i', '- f', '- f', '- e', '- r', '- e', '- n', '- c', '- e']
(Everything preceded by a minus sign was deleted.)

Conversely, swapping text1_lines and text2_lines produces the following result:

['+  ', '+ D', '+ i', '+ f', '+ f', '+ e', '+ r', '+ e', '+ n', '+ c', '+ e']
To remove the signs, you can transform the list above:

diff_nl = [x[2] for x in diff]
To turn it into a single string, just use .join().
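Putting the steps above together, a minimal sketch might look like this. The variable names `removed` and `added` are illustrative. Filtering on the first character also skips any "? " hint lines that ndiff can emit for other inputs.

```python
import difflib

text1_lines = "I understand how customers do their choice. Difference"
text2_lines = "I understand how customers do their choice."

# ndiff walks each string character by character, so every item is a
# two-character prefix ("- ", "+ ", "  " or "? ") followed by one character.
diff = list(difflib.ndiff(text1_lines, text2_lines))

# Index 2 is the character itself, after the sign and the space.
removed = ''.join(x[2] for x in diff if x[0] == '-')  # only in text1_lines
added = ''.join(x[2] for x in diff if x[0] == '+')    # only in text2_lines

print(repr(removed))
print(repr(added))
```

Call `.strip()` on the result if the leading space is not wanted.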


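Using difflib itself, you can do it as follows. The catch is that you get back a generator, which is a bit like a packed-up for loop, and the only way to unpack it is to iterate over it.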

import difflib
text1_lines = "I understand how customers do their choice. Difference"
text2_lines = "I understand how customers do their choice."
diff = difflib.unified_diff(text1_lines, text2_lines)
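unified_diff differs from ndiff in that it shows only the changed regions (plus three lines of surrounding context by default), whereas ndiff shows both the similarities and the differences.
diff is now a generator object, and all that is left to do is unpack it: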

n = 0
result = ''
for difference in diff:
    n += 1
    if n < 7: # the first 6 items (3 header lines, 3 context characters) are not needed here
        continue
    result += difference[1] # each item is " x", "-x" or "+x"; index 1 is the character itself
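Skipping a fixed number of items depends on the header and the context size, so it can break for other inputs. As an alternative sketch (a different technique from the loop above), difflib.SequenceMatcher.get_opcodes() reports the changed ranges directly. The helper name `changed_parts` is an assumption made for illustration.

```python
import difflib

def changed_parts(a, b):
    # Hypothetical helper: collect the substrings removed from `a`
    # and the substrings added in `b`.
    matcher = difflib.SequenceMatcher(None, a, b)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # tag is one of 'equal', 'delete', 'insert' or 'replace'.
        if tag in ('delete', 'replace'):
            removed.append(a[i1:i2])
        if tag in ('insert', 'replace'):
            added.append(b[j1:j2])
    return removed, added

text1_lines = "I understand how customers do their choice. Difference"
text2_lines = "I understand how customers do their choice."

removed, added = changed_parts(text1_lines, text2_lines)
print(removed, added)
```

Because the opcodes carry index ranges, no header or context lines need to be skipped.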

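Do you get any errors? What is your current output?
Why not use a set difference to compute this? @henry: splitA = set(a.split(" ")) and splitB = set(b.split(" "))
Possible duplicate of a question about the uncommon words between two strings.
Is that what you mean? Edit: never mind, I changed the answer.
Very nice!! Thank you very much.
You can add that as a base condition; answer updated.
If both strings are empty, this prints nothing.
Thanks for your answer. How can I get just the difference, without all the "+" signs and so on?
diff_nl = [x[2] for x in diff]
If you want to ignore the signs, maybe you can use a set difference, @henry?
diff_nl = ''.join([x[2] for x in diff]) gives a plain string instead of a list.
Thanks!
Thank you for your answer. How can I get just "Difference", without all the extra stuff, signs, etc.?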
>>> result
' Difference'
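The set-difference idea raised in the comments could be sketched as follows. This is only an illustration under the assumption that a word-level comparison is acceptable. The variable names are made up for the example, and punctuation stays attached to words (for instance "choice.").

```python
text1_lines = "I understand how customers do their choice. Difference"
text2_lines = "I understand how customers do their choice."

words1 = text1_lines.split()
words2 = set(text2_lines.split())

# Unordered: words that appear in the first sentence but not in the second.
only_in_first = set(words1) - words2

# Ordered variant: keeps the original word order and any repeated words.
only_in_first_ordered = [w for w in words1 if w not in words2]

print(only_in_first)
print(only_in_first_ordered)
```

A set ignores word order and duplicates, so the list variant is usually the safer choice when the position of the difference matters.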