Python: difference between two strings (sentences)
I am trying to compute the difference between two sentences, as follows:
import difflib
text1_lines = "I understand how customers do their choice. Difference"
text2_lines = "I understand how customers do their choice."
diff = difflib.ndiff(text1_lines, text2_lines)
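I want to get the difference, but I am not getting it. What am I doing wrong? Thanks for letting me know.

Split the larger string by the smaller one and you will get the difference: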
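def print_diff(a, b):  # wrapper added so the return statements are valid; the name is illustrative
    # If one string is empty, the other one is the whole difference.
    if len(a) == 0:
        print(b)
        return
    if len(b) == 0:
        print(a)
        return
    # Split the longer string on the shorter one and join what is left.
    if len(a) > len(b):
        res = ''.join(a.split(b))  # get diff
    else:
        res = ''.join(b.split(a))  # get diff
    print(res.strip())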
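The split idea above can also be written as a function that returns the result instead of printing it. This is only a minimal sketch: the name split_diff is made up for illustration, and it assumes the shorter sentence appears verbatim inside the longer one. If it does not, the longer string comes back unchanged.

```python
def split_diff(a, b):
    """Return what is left of the longer string after cutting out the shorter one."""
    longer, shorter = (a, b) if len(a) >= len(b) else (b, a)
    if not shorter:
        # str.split('') raises ValueError, so handle the empty case first:
        # with nothing to cut out, the longer string is the whole difference.
        return longer
    # split() removes every occurrence of the shorter string;
    # joining the pieces keeps only the text outside of it.
    return ''.join(longer.split(shorter)).strip()


text1 = "I understand how customers do their choice. Difference"
text2 = "I understand how customers do their choice."
print(split_diff(text1, text2))
```

Returning the value keeps the empty-string cases in one place, including the case where both strings are empty.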
Using a simple list comprehension:
diff = [x for x in difflib.ndiff(text1_lines, text2_lines) if x[0] != ' ']
This shows you the deletions and additions.
Output:
***
---
***************
*** 41,54 ****
c e .- - D- i- f- f- e- r- e- n- c- e--- 41,43 ----
['- ', '- D', '- i', '- f', '- f', '- e', '- r', '- e', '- n', '- c', '- e']
(Everything preceded by a minus sign was removed.)
Conversely, swapping text1_lines and text2_lines produces the following result:
['+ ', '+ D', '+ i', '+ f', '+ f', '+ e', '+ r', '+ e', '+ n', '+ c', '+ e']
To remove the signs, you can transform the list above:
diff_nl = [x[2] for x in diff]
To convert it fully to a string, just use .join().
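Putting the filtering, sign removal and join steps together might look like the sketch below. The variable name diff_str is an assumption for illustration, and the filter keeps only items starting with "-" or "+", which is slightly stricter than the `x[0] != ' '` test used above.

```python
import difflib

text1_lines = "I understand how customers do their choice. Difference"
text2_lines = "I understand how customers do their choice."

# Plain strings are compared character by character, so each ndiff item
# looks like "- D": a marker, a space, then one character.
diff = [x for x in difflib.ndiff(text1_lines, text2_lines) if x[0] in '-+']

# Index 2 is the character itself; join the characters into one string.
diff_str = ''.join(x[2] for x in diff)
print(repr(diff_str))  # ' Difference'
```

Call .strip() on the result if the leading space is not wanted.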
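Using the actual difflib, you can do it like this. The catch is that you get a generator, which is a bit like a packed for loop, and the only way to unpack it is to iterate over it.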
import difflib
text1_lines = "I understand how customers do their choice. Difference"
text2_lines = "I understand how customers do their choice."
diff = difflib.unified_diff(text1_lines, text2_lines)
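unified_diff differs from ndiff in that it shows only the differences (plus a few items of context), whereas ndiff shows both the similarities and the differences. diff is now a generator object, and all that is left is to unpack it: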
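To see the contrast between the two functions concretely, one can materialize both generators and look at what they yield. This is a small sketch for illustration only; the names ndiff_items and unified_items are not from the original answer.

```python
import difflib

text1_lines = "I understand how customers do their choice. Difference"
text2_lines = "I understand how customers do their choice."

# list() iterates the generators for us and stores every item.
ndiff_items = list(difflib.ndiff(text1_lines, text2_lines))
unified_items = list(difflib.unified_diff(text1_lines, text2_lines))

# ndiff yields one item per character, unchanged characters included.
print(len(ndiff_items))

# unified_diff yields header lines, a few unchanged context characters,
# and then only the characters that differ.
for item in unified_items:
    print(repr(item))
```

With the default of three context items, the first six entries of unified_items are three header lines followed by three unchanged characters, which is why a loop that wants only the changes has to skip them.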
n = 0
result = ''
for difference in diff:
    n += 1
    if n < 7:  # skip the first 6 items (3 header lines, 3 context characters); they are not needed here
        continue
    result += difference[1]  # each item here is " x", "-x" or "+x"; index 1 is the character itself
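>>> result
' Difference'

Do you get an error? What is your current output?
Why not use set difference to compare the strings, @henry?
Possible duplicate: finding the uncommon words between splitA = set(a.split(" ")) and splitB = set(b.split(" ")).
Is that what you meant? Edit: nvm, changed the answer.
Very nice!! Thanks a lot.
If both strings are empty, this prints nothing; you can add that as a base case. Updated the answer.
Thanks for your answer. How do I get simply the difference, without all the "+" signs and so on?
diff_nl = [x[2] for x in diff]
If you want to ignore the signs, maybe you can use set difference, @henry?
diff_nl = ''.join([x[2] for x in diff]) gives a plain string without the list.
Thanks! Thanks for your answer. How can I get just "Difference" without all the extra stuff, signs, and so on?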
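The set-difference idea raised in the comments could be sketched as follows. This is an illustration, not code from the thread: it assumes words are separated by whitespace, and the names words1, words2 and uncommon are made up here. Sets drop word order and repeated words, so this suits a quick "which words differ" check rather than a full diff.

```python
text1 = "I understand how customers do their choice. Difference"
text2 = "I understand how customers do their choice."

# Split each sentence into a set of whitespace-separated words.
words1 = set(text1.split())
words2 = set(text2.split())

# Symmetric difference: words that appear in exactly one of the sentences.
uncommon = words1 ^ words2
print(uncommon)  # {'Difference'}
```

Use words1 - words2 instead if only the words missing from the second sentence are of interest.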