Python: counting multiple strings across multiple files with a regular expression

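I'm trying to count how many times a "type" appears in text files, including the word that follows it. For example, how many times "type A" or "type apples" shows up across multiple files. I've got this far, but instead of adding the matches up, it reports them one at a time. I think it would be best to store them in a dictionary, so that each key is the type and each value is its count.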

current output

file 1.txt {type A : 1}
file 1.txt {type A : 1}
file 2.txt {type apples : 1}
file 2.txt {type apples : 1}

However, the output below is what I want. I'm a beginner at Python, so I feel like I'm missing something obvious.

expected output

file 1.txt {type A : 2}
file 2.txt {type apples : 2}

This is the code I have so far:

import os
import re
from collections import defaultdict

def find_files(d):
    # d is the directory to search; yields the path of every .txt file under it
    for root, dirs, files in os.walk(d):
        for filename in files:
            if filename.endswith('.txt'):
                yield os.path.join(root, filename)

for file_name in find_files(d):
    with open(file_name, 'r') as f:
        for line in f:
            results = defaultdict(int)
            line = line.lower().strip()
            match = re.search(r'type (\S+)', line)
            if match:
                results[match.group(0)] += 1
                print(file_name, results)

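There are a couple of mistakes:

You are creating a new dictionary for every line; it is better to create one per file.
`re.search` only finds the first match in the string; you can use `re.findall` to find all matches.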
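To make the second point concrete, here is a small sketch comparing the two calls on one line of text. The sample string and variable names are made up for illustration; only `re.search` and `re.findall` come from the question.

```python
import re

line = "type a then type apples and type a again"

# re.search stops at the first match and returns a match object (or None).
first = re.search(r"type (\S+)", line)
first_word = first.group(1) if first else None

# re.findall scans the whole string. With exactly one capturing group it
# returns the captured text only, not the full match.
all_words = re.findall(r"type (\S+)", line)

# Without a capturing group, findall returns the full matched text instead.
all_matches = re.findall(r"type \S+", line)

print(first_word)
print(all_words)
print(all_matches)
```

Note that whether the group is there decides whether your dictionary keys look like `a` or like `type a`.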

Here is a revised version of your code:

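import re
from collections import defaultdict

for file_name in find_files(d):
    with open(file_name, 'r') as f:
        # One dictionary per file, so counts accumulate across lines
        results = defaultdict(int)
        for line in f:
            line = line.lower().strip()
            # findall returns every captured word on the line (an empty
            # list if there is none), so the keys are the word after "type"
            for word in re.findall(r'type (\S+)', line):
                results[word] += 1
        # Print once per file, after all of its lines have been counted
        print(file_name, results)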
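If you want the keys to read `type a` rather than just `a`, as in the expected output, one option is to drop the capturing group. The following is only a sketch of that idea: `count_types`, `TYPE_PATTERN` and the sample lines are hypothetical names of mine, and the function takes any iterable of lines so it can be tried without files on disk.

```python
import re
from collections import defaultdict

# No capturing group, so findall yields the full "type <word>" text.
TYPE_PATTERN = re.compile(r"type \S+")

def count_types(lines):
    """Count every 'type <word>' occurrence in an iterable of lines."""
    results = defaultdict(int)
    for line in lines:
        # Lowercase first so "Type A" and "type a" share one key.
        for match in TYPE_PATTERN.findall(line.lower()):
            results[match] += 1
    return dict(results)

sample = ["Type A is here, and type A again", "nothing to see", "type apples"]
counts = count_types(sample)
print(counts)
```

An open file is itself an iterable of lines, so inside the `with open(file_name, 'r') as f:` block you should be able to call `count_types(f)` and print the result once per file.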

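Create a dictionary whose keys are the matched text, each mapped to an integer giving the number of times it has been seen. Using `collections.defaultdict(int)`, a subclass of `dict`, simplifies this somewhat, because a missing key starts at 0 instead of raising a `KeyError`.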
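As a rough illustration of what `defaultdict(int)` saves you, here are three ways to build the same counts from a list of matched words. The `words` list is made-up sample data, and `collections.Counter` is an alternative I am adding here, not something used in the code above.

```python
from collections import Counter, defaultdict

words = ["a", "apples", "a"]

# Plain dict: a missing key has to be handled explicitly.
plain = {}
for w in words:
    plain[w] = plain.get(w, 0) + 1

# defaultdict(int): a missing key is created with int() == 0 on first access.
dd = defaultdict(int)
for w in words:
    dd[w] += 1

# Counter: another dict subclass in the standard library, built for tallying.
counter = Counter(words)

print(plain, dict(dd), dict(counter))
```

All three end up with the same contents, so the choice is mostly about readability.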