Python: comparing strings across several files, but only the last of many possible matches is found


python regex string


I am new to Python. I have two kinds of files to compare.

The first type (dict.txt) contains:

1_A
2_B
3_C

The second type (1_1h.txt) contains:

K
P
A
B
C
E

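I am trying to compare them by using a regular expression to isolate the letters in dict.txt (later I will also use the number next to each letter to know the position/line of that letter in the file), and then comparing each letter with the letters in every file of the 1_1h.txt type.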

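But I have a problem: it does not recognize all the matching expressions, only one. Why? In this example there are two matches, "K" and "C", but the output shows only "C" and a lot of blank lines. Here is my code: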

import os
import re
import fileinput

dict_file = open("C:\\Users\\KP\\Desktop\\test\\dict.txt", "r")
dictionary = dict_file.read().split('\n')
#print lines
#print len(lines)
dict_file.close()


for file in os.listdir('C:\\Users\\KP\\Desktop\\test'):
    if file == '1_1h.txt':        
        open(file) 

        for w in dictionary:
            regex = re.compile('(\d)_(.*)')
            res = regex.search(w)
            if res:
               nb_w = int(res.group(1))
               content_w = str(res.group(2))

            for line in fileinput.input(["1_1h.txt"]): 
                print(content_w+"-->"+line)
                if str(line) == str(content_w):
                    print('match '+line)

Output:

runfile('C:/Users/KP/Desktop/test/testlocale.py', wdir='C:/Users/KP/Desktop/test')
F-->K

F-->J

F-->C
K-->K

K-->J

K-->C
C-->K

C-->J

C-->C
match C

search returns only one match. Use findall instead of search:

res = re.findall('(\d)_(.*)', w)

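From the documentation:

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.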
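To make the difference between the two calls concrete, here is a minimal sketch. It is written for Python 3 (the thread itself uses Python 2.7), and both the one-line sample input and the narrower `(\w)` group are assumptions made for illustration, not part of the original code.

```python
import re

# Illustrative input: three dictionary entries on one line (an assumption).
text = "1_A 2_B 3_C"
# (\w) is used instead of the thread's (.*) so each entry stops at one letter.
pattern = r"(\d)_(\w)"

first = re.search(pattern, text)   # match object for the first hit only
pairs = re.findall(pattern, text)  # list of (digit, letter) tuples, one per hit

print(first.group(1), first.group(2))
print(pairs)
```

Note that `search` returns a match object (or `None`), while `findall` returns a plain list, so the two results are consumed differently.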

I found the error: it is in how this file is read:

file = '1_1h.txt'

Its contents are not simply read as:

K
P
A
B
C
E

but (I don't know why) they are read as:

K

P

A

B

C

E

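For this reason, even though my dictionary file contains A, B and C, only C in my file 1_1h.txt was recognized as a matching word, because it is the only one without a "\n".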
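The likely explanation is that iterating over a file yields each line with its line ending still attached, and print then adds a second newline, which produces the blank lines. A minimal Python 3 sketch of the effect follows; `io.StringIO` stands in for 1_1h.txt, and the contents K, J, C are an assumption taken from the output shown in the question.

```python
import io

# io.StringIO stands in for 1_1h.txt (an assumption, to keep this self-contained).
fake_file = io.StringIO("K\nJ\nC")

raw_lines = list(fake_file)                              # line endings are kept
clean_lines = [line.rstrip("\n") for line in raw_lines]  # line endings removed

print(raw_lines)
print(clean_lines)
print([line == "K" for line in raw_lines])
```

Only the last line has no trailing "\n", which is why it was the only one that compared equal to a dictionary letter.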

This is my code:

import os
import re
import fileinput
import numpy as np

matrix = np.zeros(shape=(10,10))


dict_file = open("C:\\Users\\KP\\Desktop\\test\\dict.txt", "r")
dictionary = dict_file.read().split('\n')
dict_file.close()


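for file in os.listdir('C:\\Users\\KP\\Desktop\\test'):
    if file == '1_1h.txt':
        # The leading digit of the file name (1_1h.txt -> 1) selects the matrix row.
        regex = re.compile('(\d)_(.*)')
        res = regex.search(file)
        if res:
            nb_file = int(res.group(1))-1

        filename = file
        for line in fileinput.input([filename]):
            # Lines read from a file keep their trailing "\n"; remove it before comparing.
            line = line.replace("\n", "")
            for w in dictionary:
                test = w.split('_',1)
                # Skip blank or malformed entries that are not of the form number_letter.
                if len(test) == 2 and line == test[1]:
                    print nb_file
                    print test[0]
                    print str(line)+" "+test[1]
                    print '####'
                    # test[0] is a string, so convert it before using it as an index.
                    matrix[nb_file,int(test[0])] = 1


print matrix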

Output:

runfile('C:/Users/KP/Desktop/test/test4loop.py', wdir='C:/Users/KP/Desktop/test')
0
1
A A
####
0
2
B B
####
0
3
C C
####
[[ 0.  1.  1.  1.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]
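For comparison, here is a Python 3 sketch of the same matching step without regular expressions or numpy. The helper names `load_dictionary` and `matching_numbers` are hypothetical, and the demo writes temporary copies of the sample files instead of using the original Windows paths.

```python
import os
import tempfile

def load_dictionary(path):
    """Map each letter to its number, e.g. the line '1_A' becomes {'A': 1}."""
    mapping = {}
    with open(path) as handle:
        for entry in handle.read().splitlines():
            number, sep, letter = entry.partition("_")
            if sep and number.isdigit():  # skip blank or malformed lines
                mapping[letter] = int(number)
    return mapping

def matching_numbers(dictionary, path):
    """Return the dictionary numbers whose letter appears as a line in the file."""
    with open(path) as handle:
        lines = handle.read().splitlines()  # splitlines drops the line endings
    return [dictionary[line] for line in lines if line in dictionary]

# Demo with temporary copies of the sample files from the question.
with tempfile.TemporaryDirectory() as folder:
    dict_path = os.path.join(folder, "dict.txt")
    data_path = os.path.join(folder, "1_1h.txt")
    with open(dict_path, "w") as handle:
        handle.write("1_A\n2_B\n3_C\n")
    with open(data_path, "w") as handle:
        handle.write("K\nP\nA\nB\nC\nE\n")
    dictionary = load_dictionary(dict_path)
    found = matching_numbers(dictionary, data_path)

print(found)
```

Reading with `splitlines()` avoids the trailing-newline problem entirely, and the `with` blocks close each file even if an error occurs.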

Hello, thanks. I get this error:

res = re.findall(w, regex)
File "C:\Python27\lib\re.py", line 181, in findall
return _compile(pattern, flags).findall(string)
TypeError: expected string or buffer

I still get this error and I don't know how to deal with it.

I tried a test, and it also works without the findall solution, so I think the error is somewhere in my code, but I don't know where. import re array = [1,2,3] ... Output:

match 2 match 3

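Edited, please try it now. You need to pass the pattern rather than the compiled regex, my bad.

Another error:

File "C:/Users/KP/Desktop/test/test4loop.py", line 29, in <module> nb_w = int(res.group(1))-1 AttributeError: 'list' object has no attribute 'group'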
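Both errors in these comments appear to follow from how findall is called and what it returns: the module-level function expects the pattern first and the string second, and its result is a list rather than a match object. A minimal Python 3 sketch, using an illustrative entry "1_A" and the variable names from the question:

```python
import re

regex = re.compile(r"(\d)_(.*)")
entry = "1_A"  # illustrative dictionary entry (an assumption)

# A compiled pattern has its own findall method; it takes only the string.
pairs = regex.findall(entry)

# findall returns a list of tuples, so unpack them instead of calling .group().
for digit, letter in pairs:
    nb_w = int(digit) - 1
    content_w = letter

# Passing the compiled pattern where the string is expected raises TypeError.
try:
    re.findall(entry, regex)
    swapped_ok = True
except TypeError:
    swapped_ok = False

print(pairs, nb_w, content_w, swapped_ok)
```

If only the first match is needed, keeping `search` and its `.group()` calls is simpler than switching to `findall`.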