Python: Extracting specific data (patterns) from many files and writing it to a single file (looping over files and extracting patterns)

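I am new to Python, and I am trying to extract specific data (patterns) from many files and write it to a single file, one matching line per row with the items delimited. Below is the code with explanations. What am I missing? Thanks in advance.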


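It seems `.read()` is missing from `with open(f'{myfile}','rt', encoding = 'cp1252')`, so the line becomes `with open(f'{myfile}','rt', encoding = 'cp1252').read():`. But since the *.msg files are emails, I then get an error in File `"C:\Users\..\AppData\Local Programs\Python\Python39\lib\encodings\cp1252.py"`, line 23, in decode: `return codecs.charmap_decode(input,self.errors,decoding_table)[0]` followed by `UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 756: character maps to <undefined>`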
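The `UnicodeDecodeError` above is raised because byte 0x8d has no mapping in Python's cp1252 codec. A minimal sketch of one possible workaround, assuming it is acceptable to lose the undecodable bytes: pass `errors="replace"` to `open` so they become U+FFFD instead of raising. The helper name `read_lines_lenient` is hypothetical and not part of the original code.

```python
def read_lines_lenient(path, encoding="cp1252"):
    """Return the lines of a text file, replacing undecodable bytes.

    errors="replace" is an assumption: bytes the codec cannot map
    (such as 0x8d in cp1252) become U+FFFD instead of raising
    UnicodeDecodeError.
    """
    with open(path, "rt", encoding=encoding, errors="replace") as fh:
        return fh.readlines()
```

If the `.msg` files are Outlook messages, they are usually a binary container format rather than plain text, so this approach only salvages whatever readable text happens to be in them.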

```python
import os
directory = 'C://tmp/'
import re
LookupData = []
linenum = 0
pattern1 = re.compile("date", re.IGNORECASE)  # Compile a case-insensitive regex
pattern2 = re.compile("time", re.IGNORECASE)  # Compile a case-insensitive regex
pattern3 = re.compile(" nm", re.IGNORECASE)  # Compile a case-insensitive regex
pattern4 = re.compile("Miles", re.IGNORECASE)  # Compile a case-insensitive regex
pattern5 = re.compile("hour", re.IGNORECASE)  # Compile a case-insensitive regex

# write to file
outF = open("C://tmp/outF.txt", "a") #, newline='\r\n')

# loop and extract patterns
for filename in os.listdir(directory):
    if filename.endswith(".msg") or filename.endswith(".txt"):
        myfile = str(os.path.join(directory, filename))
        #myfile = ''.join(myfile)
        #continue
        print(myfile)
        ### this part below is working if file is specifically: C://temp/t2.txt but not if it is looping
        ###     with open ('C://tmp/t2.txt', 'rt') as myfile:

        with open(f'{myfile}','rt', encoding = 'cp1252'):    #  open files

            ### difference being
            ### >>> print(myfile) returns
            ### C://tmp/t2.txt      ***** VS ******     <_io.TextIOWrapper name='C://tmp/t2.txt' mode='rt' encoding='cp1252'>
            for line in myfile:
                linenum += 1
                if pattern1.search(line) or pattern2.search(line) or pattern3.search(line) or pattern4.search(line) or pattern5.search(line) != None:      # If a match is found
                    sline = line.split(',')  # separates line into a list of items.  ',' tells it to split the lines at the commas
                    #          sline.append((sline))
                    #          sline.append(linenum)
                    print(sline) #each line is now a list
                    outF.write(sline)
```
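One likely cause, offered as a sketch rather than a definitive answer: the `with open(...)` line in the loop has no `as` clause, so `myfile` is still the path string and `for line in myfile` iterates over its characters instead of the file's lines. In addition, `write()` expects a string, not the list returned by `split`. The version below binds the handle to a separate name and joins the fields before writing. The function name `extract_matching_lines`, the `sep` parameter, the UTF-8 output encoding, and `errors="replace"` are all assumptions introduced for illustration.

```python
import os
import re

# One combined, case-insensitive pattern instead of five separate ones.
PATTERN = re.compile(r"date|time| nm|miles|hour", re.IGNORECASE)


def extract_matching_lines(directory, out_path, sep="|"):
    """Append matching lines from .msg/.txt files in directory to out_path.

    Returns the number of lines written.
    """
    written = 0
    with open(out_path, "a", encoding="utf-8") as out_f:
        for filename in sorted(os.listdir(directory)):
            if not filename.endswith((".msg", ".txt")):
                continue
            path = os.path.join(directory, filename)
            if os.path.abspath(path) == os.path.abspath(out_path):
                continue  # do not re-read the output file itself
            # Bind the handle with "as fh" and iterate over the handle,
            # not over the path string.
            # errors="replace" is an assumption so undecodable bytes do not raise.
            with open(path, "rt", encoding="cp1252", errors="replace") as fh:
                for line in fh:
                    if PATTERN.search(line):
                        fields = [field.strip() for field in line.split(",")]
                        out_f.write(sep.join(fields) + "\n")
                        written += 1
    return written
```

A call such as `extract_matching_lines("C://tmp/", "C://tmp/outF.txt")` would mirror the paths in the question. The explicit check against `out_path` matters there, because the output file lives in the scanned directory and ends in `.txt`.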