Python: comparing two strings, similar to comparing two documents in Word

Tags: python, text-analysis

I want to compare two paragraphs at the character level to see which words were modified.

- 2 And when he had fasted forty days and forty nights, he was afterward an hungred.
+ 2 And when he had fasted forty days and forty nights, and had communed with God, he was afterwards an hungered, and was left to be tempted of the devil,
Paragraphs to compare:

t1 = '''1 Then was Jesus led up of the Spirit into the wilderness to be tempted of the devil.
2 And when he had fasted forty days and forty nights, he was afterward an hungred.
3 And when the tempter came to him, he said, If thou be the Son of God, command that these stones be made bread.
'''.splitlines(keepends=True)

t2 = '''1 Then Jesus was led up of the Spirit, into the wilderness, to be with God.
2 And when he had fasted forty days and forty nights, and had communed with God, he was afterwards an hungered, and was left to be tempted of the devil,
3 And when the tempter came to him, he said, If thou be the Son of God, command that these stones be made bread.
'''.splitlines(keepends=True)
When I tried difflib, it worked well on the first line, but it did not detect the character-level differences in the second line.

>>> from difflib import Differ, SequenceMatcher

>>> d = Differ()
>>> result = list(d.compare(t1,t2))
>>> for i in result:
...     print(i, end='')
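Result: only the first line gets the desired output, with ? guide lines marking the changed characters (that output is reproduced further down in this post).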


Even when I extract the second line and compare it on its own:

t1 = '''2 And when he had fasted forty days and forty nights, he was afterward an hungred.
'''.splitlines(keepends=True)

t2 = '''2 And when he had fasted forty days and forty nights, and had communed with God, he was afterwards an hungered, and was left to be tempted of the devil,
'''.splitlines(keepends=True)
d = Differ()
result = list(d.compare(t1,t2))
for i in result:
    print(i, end='')
Result: it does not show which characters were modified, only that the whole line was modified.

- 2 And when he had fasted forty days and forty nights, he was afterward an hungred.
+ 2 And when he had fasted forty days and forty nights, and had communed with God, he was afterwards an hungered, and was left to be tempted of the devil,
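
A possible explanation, offered with some caution: as far as I can tell from CPython's difflib source, Differ only adds the ? guide lines when a pair of lines has a similarity ratio of about 0.75 or more, and otherwise reports a plain delete plus insert. That threshold is an implementation detail rather than documented API, so treat it as an assumption. The sketch below only computes SequenceMatcher.ratio() for the two line pairs from the question; line_ratio is a hypothetical helper name.

```python
from difflib import SequenceMatcher

# Lines 1 and 2 of each version, copied from the question, including the
# trailing newline that splitlines(keepends=True) keeps.
old_lines = [
    "1 Then was Jesus led up of the Spirit into the wilderness to be tempted of the devil.\n",
    "2 And when he had fasted forty days and forty nights, he was afterward an hungred.\n",
]
new_lines = [
    "1 Then Jesus was led up of the Spirit, into the wilderness, to be with God.\n",
    "2 And when he had fasted forty days and forty nights, and had communed with God, "
    "he was afterwards an hungered, and was left to be tempted of the devil,\n",
]


def line_ratio(old, new):
    """Similarity in [0, 1]: 2 * matched characters / total characters."""
    return SequenceMatcher(None, old, new).ratio()


ratios = [line_ratio(old, new) for old, new in zip(old_lines, new_lines)]
for number, ratio in enumerate(ratios, start=1):
    # Assumption: Differ needs roughly ratio >= 0.75 to emit "?" lines.
    print(f"line {number}: ratio = {ratio:.3f}")
```

If the second line comes out below 0.75 while the first is above it, that would be consistent with what the question observes. The long insertion in line 2 makes the new line almost twice as long as the old one, which pulls the ratio down even though most of the old text survives.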

But if I test the second line with SequenceMatcher, it does seem to recognize the modified characters.

p2_1 = '''2 And when he had fasted forty days and forty nights, he was afterward an hungred.'''
p2_2 = '''2 And when he had fasted forty days and forty nights, and had communed with God, he was afterwards an hungered, and was left to be tempted of the devil,'''
se = SequenceMatcher(None,p2_1, p2_2)
se.get_opcodes()
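Result: a list of opcodes (reproduced further down in this post).

Question: How can I compare the two paragraphs so that I know which characters were modified? Or is there an existing package I can use?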

This is the output I want, or something similar.

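I'm still a bit confused. Can you give an example of the output you want?

@algerbrex Hi, I have added the desired output.

Maybe there is more to your question, but can you not simply slice the part you want out of result and use that? result[0:4] gives me the desired output.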

Output of se.get_opcodes():
[('equal', 0, 54, 0, 54),
 ('insert', 54, 54, 54, 81),
 ('equal', 54, 70, 81, 97),
 ('insert', 70, 70, 97, 98),
 ('equal', 70, 78, 98, 106),
 ('insert', 78, 78, 106, 107),
 ('equal', 78, 81, 107, 110),
 ('replace', 81, 82, 110, 152)]
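
One way to turn opcodes like these into something readable is to walk them and wrap the changed spans in markers. This is only a minimal sketch: the inline_diff name and the [-...-] / {+...+} marker style are my own choices for illustration, not anything difflib provides. It relies only on SequenceMatcher.get_opcodes(), which the question already uses.

```python
from difflib import SequenceMatcher


def inline_diff(old, new):
    """Return one string with deletions as [-...-] and insertions as {+...+}."""
    # autojunk=False keeps the "popular character" heuristic from skewing
    # matches on longer inputs.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(old[i1:i2])
        if tag in ("delete", "replace"):
            parts.append("[-" + old[i1:i2] + "-]")
        if tag in ("insert", "replace"):
            parts.append("{+" + new[j1:j2] + "+}")
    return "".join(parts)


p2_1 = "2 And when he had fasted forty days and forty nights, he was afterward an hungred."
p2_2 = ("2 And when he had fasted forty days and forty nights, and had communed with God, "
        "he was afterwards an hungered, and was left to be tempted of the devil,")

marked = inline_diff(p2_1, p2_2)
print(marked)
```

To compare whole paragraphs this way, the same function could be applied to each pair of corresponding lines, or to the two paragraphs joined into single strings.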

Output of d.compare(t1, t2) for the first line:
- 1 Then was Jesus led up of the Spirit into the wilderness to be tempted of the devil.
?        ----                                                     ^^^^^^^^^^^  -  ----
+ 1 Then Jesus was led up of the Spirit, into the wilderness, to be with God.
?            ++++                      +                    +       ^^   ++
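
Since the question is about which words were modified, it may also help to run SequenceMatcher over lists of words instead of characters, because it accepts any sequence of hashable items. The sketch below assumes that splitting on whitespace is an acceptable definition of a word (so punctuation stays attached); word_changes is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def word_changes(old, new):
    """List (tag, old_words, new_words) for every non-equal opcode."""
    old_words = old.split()
    new_words = new.split()
    matcher = SequenceMatcher(None, old_words, new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old_words[i1:i2], new_words[j1:j2]))
    return changes


old_verse = "2 And when he had fasted forty days and forty nights, he was afterward an hungred."
new_verse = ("2 And when he had fasted forty days and forty nights, and had communed with God, "
             "he was afterwards an hungered, and was left to be tempted of the devil,")

changes = word_changes(old_verse, new_verse)
for tag, removed, added in changes:
    print(tag, removed, "->", added)
```

Each changed word pair could then be passed through a character-level comparison if the exact characters are needed as well.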