Python: compare two directories containing text files with the same names and find unmatched words with regex findall


python python-3.x regex csv file


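I have two directories that contain text files with matching file names. I need to compare each pair of same-named text files and write the unmatched words to a csv, together with the corresponding file name.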

Folder1:
  file1.txt   file2.txt

Folder2: 
  file1.txt   file2.txt

Here I need to find the unmatched words by comparing the two files. I have something like the following:

import os
import re

stat="files, unmatch_words\n"
pack=os.listdir('./Folder1/')
for file in pack:
    package=open('./Folder1/' + file, 'r').read()
    grep=open("./Folder2/" + file,'r').read()
    output= re.findall(r'(\w+)',grep)
    rex=(set(output))
    stat += file.replace('.txt', '') + ',"'

    for sem in rex:
        if sem not in package:
            stat += sem + '\n'
stat += '","'
stat += '"' + '\n'
f=open('file.csv', 'w')
f.write(stat)

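This merges all the files in Folder2 (that is, rex) into one and compares it with the files in Folder1.

I want to find the unmatched words between Folder1->file1.txt and Folder2->file1.txt, and likewise between Folder1->file2.txt and its counterpart in Folder2.

Can anyone suggest an update for this? Thanks.
I would not use regex for this, personally. You can split the words on whitespace, put them into two lists, and then check for unmatched words: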

#get words in those files, split by whitespace
packageWords=package.split()
grepWords=grep.split()
#words in package but not in grep
unmatchedWordsPackage=[word for word in packageWords if word not in grepWords]
#words in grep but not in package
unmatchedWordsGrep=[word for word in grepWords if word not in packageWords]
#merge the two lists into one big string
unmatchedWords=' '.join(unmatchedWordsPackage) + ' ' + ' '.join(unmatchedWordsGrep)
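The whitespace-split check can be folded into a per-file loop so that each file in Folder1 is compared only with its same-named counterpart in Folder2. The following is a minimal sketch, not the answerer's code: the helper names `unmatched_words` and `compare_folders` and the column names `only_in_a` / `only_in_b` are made up for illustration, and it assumes the files are small enough to read whole. It uses the standard `csv` module so that quoting is handled for you.

```python
import csv
import os


def unmatched_words(text_a, text_b):
    """Return the unique words of text_a that are missing from text_b,
    in first-seen order (words are split on whitespace)."""
    words_b = set(text_b.split())
    return list(dict.fromkeys(w for w in text_a.split() if w not in words_b))


def compare_folders(dir_a, dir_b, csv_path):
    """Write one CSV row per file name that exists in both folders."""
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "only_in_a", "only_in_b"])
        for name in sorted(os.listdir(dir_a)):
            path_a = os.path.join(dir_a, name)
            path_b = os.path.join(dir_b, name)
            if not (os.path.isfile(path_a) and os.path.isfile(path_b)):
                continue  # no same-named counterpart, skip
            with open(path_a) as fa, open(path_b) as fb:
                text_a, text_b = fa.read(), fb.read()
            writer.writerow([
                os.path.splitext(name)[0],
                " ".join(unmatched_words(text_a, text_b)),
                " ".join(unmatched_words(text_b, text_a)),
            ])


# Hypothetical usage with the folder layout from the question:
# compare_folders("./Folder1", "./Folder2", "file.csv")
```

Because both texts are read inside the loop, nothing carries over from one file pair to the next.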

This kind of functionality is trivial with difflib.
For example, given these two files:

% cat file1.txt
line 1
line 2
blah bligh blah line 3
line 4
line 5
% cat file2.txt
line 1
blip blop bloop line 2
line 3
line 4
line 5
The unix diff utility shows the difference like this:

% diff file1.txt file2.txt
2,3c2,3
< line 2
< blah bligh blah line 3
---
> blip blop bloop line 2
> line 3
Comparing the same two files with difflib prints:

  line 1
+ blip blop bloop line 2
- line 2
?      ^
+ line 3
?      ^
- blah bligh blah line 3
  line 4
  line 5
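Reproducing this with regular expressions alone is not possible.
The four line prefixes ' ', '+', '?' and '-' let you programmatically separate which file has which changes (left or right), and you can then write those changes to a csv file.
If you poke around in difflib, you will most likely find the direction you need.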
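The snippet that produced the output above did not survive in this copy of the answer. As a hedged sketch, `difflib.ndiff` emits lines in that prefix format, and the prefixes can be used to sort changed lines into a "left" and a "right" list. The function name `split_changes` is hypothetical, and the two line lists are hard-coded from the example files so the block runs on its own.

```python
import difflib

# Contents of the two example files, one list entry per line.
file1_lines = ["line 1\n", "line 2\n", "blah bligh blah line 3\n",
               "line 4\n", "line 5\n"]
file2_lines = ["line 1\n", "blip blop bloop line 2\n", "line 3\n",
               "line 4\n", "line 5\n"]


def split_changes(lines_a, lines_b):
    """Return (only_in_a, only_in_b) using the two-character ndiff prefixes."""
    only_a, only_b = [], []
    for entry in difflib.ndiff(lines_a, lines_b):
        prefix, text = entry[:2], entry[2:].rstrip("\n")
        if prefix == "- ":
            only_a.append(text)   # line present only in the first file
        elif prefix == "+ ":
            only_b.append(text)   # line present only in the second file
        # "  " (unchanged) and "? " (intraline hint) entries are skipped
    return only_a, only_b


left, right = split_changes(file1_lines, file2_lines)
print(left)   # ['line 2', 'blah bligh blah line 3']
print(right)  # ['blip blop bloop line 2', 'line 3']
```

With real files, `open(path).readlines()` would stand in for the hard-coded lists, and the two result lists could be joined and written as csv columns.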
If you only want word-by-word differences, you can use sets:

with open(f1_name) as f1, open(f2_name) as f2:
    s1={word for line in f1 for word in line.split()}
    s2={word for line in f2 for word in line.split()}

>>> s1-s2
{'blah', 'bligh'}    # words only in file1
>>> s2-s1
{'blip', 'blop', 'bloop'}    # words only in file2

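Do you have to use Python? It might be worth using diff in bash.
This might be helpful.
The best way is what @theEpsilon suggested: use the diff utility. I believe almost every operating system has it. Open a terminal and try running diff --help to see whether it exists.
I need to use regex because the content is too large and has too much punctuation. Your solution does not work for me, thank you.
Still not possible: the files are too large, so I need to use regex to shorten them and compare afterwards. I saw your comment that this is not possible with regex. Is there any other way to shorten the text and then compare?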
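One way to reconcile the set approach with the regex requirement is to tokenize each line with `re.findall(r"\w+", ...)` and collect the tokens into sets, which drops punctuation and never holds more than one line of raw text at a time. This is only a sketch under that assumption; `word_set` and `unmatched` are hypothetical names. Note also that `sem not in package` in the question tests for a substring of the raw text, not for a whole word, so comparing token sets is stricter.

```python
import re

WORD_RE = re.compile(r"\w+")


def word_set(lines):
    """Collect the unique word tokens from an iterable of lines."""
    words = set()
    for line in lines:
        words.update(WORD_RE.findall(line))
    return words


def unmatched(lines_a, lines_b):
    """Return (words only in a, words only in b), each sorted."""
    a, b = word_set(lines_a), word_set(lines_b)
    return sorted(a - b), sorted(b - a)


only_a, only_b = unmatched(["Hello, world! (draft)"], ["hello... world?"])
print(only_a)  # ['Hello', 'draft']
print(only_b)  # ['hello']
```

An open file object iterates line by line, so it can be passed directly as `lines_a` or `lines_b`. The comparison is case-sensitive; lowercasing each line before tokenizing would change that.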