Python: show the positions of the differences between two strings in two DataFrame columns


python pandas


I'm looking for a way to show where in the strings the differences between two columns are located.

Input:
df=pd.DataFrame({'A':['this is my favourite one','my dog is the best'],
                 'B':['now is my favourite one','my doggy is the worst']})

expected output:
[A-B],[B-A]
0:4 ,0:3      #'this','now'
3:6 ,3:8      #'dog','doggy'
14:18,16:21   #'best','worst'

Right now I only have one approach for finding the differences (but it doesn't work, and I don't know why).

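Your question is quite unusual. As mentioned in the comments, it would be best to use difflib.SequenceMatcher.get_matching_blocks, but I couldn't get it to work. So here is a working solution; it won't be fast, but it produces the output.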

First we get the words that differ, then we find the start and end position of each of them in its column:

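# df is the DataFrame from the question.
def get_diff_words(col1, col2):
    # Pair the words up by position and keep the pairs that differ.
    # zip stops at the shorter sentence, so extra trailing words are ignored.
    return [[w1, w2] for w1, w2 in zip(col1, col2) if w1 != w2]

df['diff_words'] = df.apply(lambda x: get_diff_words(x['A'].split(), x['B'].split()), axis=1)
# str.find returns the first occurrence of the word in the string.
# The f-strings below need Python 3.6+.
df['pos_A'] = df.apply(lambda x: [f'{x["A"].find(word[0])}:{x["A"].find(word[0])+len(word[0])}' for word in x['diff_words']], axis=1)
df['pos_B'] = df.apply(lambda x: [f'{x["B"].find(word[1])}:{x["B"].find(word[1])+len(word[1])}' for word in x['diff_words']], axis=1)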

Output

                          A                        B                     diff_words         pos_A         pos_B
0  this is my favourite one  now is my favourite one                  [[this, now]]         [0:4]         [0:3]
1        my dog is the best    my doggy is the worst  [[dog, doggy], [best, worst]]  [3:6, 14:18]  [3:8, 16:21]
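One caveat with the approach above: str.find reports the first place a word occurs, so a differing word that also appears earlier in the sentence gets the wrong position. A minimal sketch of a variant that records each word's span while splitting is below. The helper names word_spans and diff_positions are made up for illustration, it still assumes both sentences have the same number of words, and it uses str.format so it should also run on Python 2.7.

```python
import re

import pandas as pd


def word_spans(text):
    """Return (word, start, end) for every whitespace-separated word."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def diff_positions(a, b):
    """Compare words pairwise; return 'start:end' spans where they differ."""
    pos_a, pos_b = [], []
    # zip stops at the shorter sentence, as in the answer's code
    for (wa, sa, ea), (wb, sb, eb) in zip(word_spans(a), word_spans(b)):
        if wa != wb:
            pos_a.append("{}:{}".format(sa, ea))
            pos_b.append("{}:{}".format(sb, eb))
    return pos_a, pos_b


df = pd.DataFrame({'A': ['this is my favourite one', 'my dog is the best'],
                   'B': ['now is my favourite one', 'my doggy is the worst']})

result = [diff_positions(a, b) for a, b in zip(df['A'], df['B'])]
df['pos_A'] = [r[0] for r in result]
df['pos_B'] = [r[1] for r in result]
```

For the sample data this gives the same pos_A and pos_B columns as shown in the output above.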

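Try difflib.

Hi, difflib doesn't show the positions in the string the way I need; it only marks them with "^", or maybe I'm wrong? I just need it like in my expected output.

It has functions that return positions.

So the strings always have the same number of words?

Florian, those strings always differ in size.

This assumes the sentences always have the same number of words. If that's not the case, how would you want to compare them in the first place? And what is the whole reason for comparing positions?

@Erfan, I'm using python 2.7, is there something more compatible? Sometimes the sentences have different lengths. My approach is to remove the common part from both sentences and then look up the position of what remains in the original string.

I'd suggest asking a new question with a better example dataframe that represents your actual dataset more closely. If you link that question here, I'll try to answer again. I consider this question answered, since the code produces the correct output for your example data. Also, are you aware that python2.7 is being retired at the end of this year?

@sygneto @Erfan I know about 2.7, but the library I'm using only works on older versions of python, which is why I use it.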
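For sentences with different word counts, the difflib route raised in the comments might look roughly like the sketch below. It runs SequenceMatcher on the word lists and uses get_opcodes rather than get_matching_blocks, then maps the differing word ranges back to character positions. The helper names word_spans and diff_positions are hypothetical, and treating a run of adjacent changed words as a single span is an assumption about the desired output. It avoids f-strings, so it should also run on Python 2.7.

```python
import re
from difflib import SequenceMatcher


def word_spans(text):
    """Return (word, start, end) for every whitespace-separated word."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def diff_positions(a, b):
    """Return 'start:end' character spans of the words that differ in a and b."""
    spans_a, spans_b = word_spans(a), word_spans(b)
    words_a = [w for w, _, _ in spans_a]
    words_b = [w for w, _, _ in spans_b]
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    pos_a, pos_b = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if i2 > i1:  # words replaced or deleted on the A side
            pos_a.append("{}:{}".format(spans_a[i1][1], spans_a[i2 - 1][2]))
        if j2 > j1:  # words replaced or inserted on the B side
            pos_b.append("{}:{}".format(spans_b[j1][1], spans_b[j2 - 1][2]))
    return pos_a, pos_b


same_length = diff_positions('my dog is the best', 'my doggy is the worst')
extra_word = diff_positions('my dog is the best', 'my small dog is the best')
```

With a DataFrame, the function could be applied per row, e.g. by iterating over zip(df['A'], df['B']) and storing the two lists in new columns.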
