Cannot find substring in input file in Python 3

python python-2.7 python-3.x

I am trying to find a pattern in an input file. However, if I use 'r', I sometimes get no match and sometimes get a Unicode error.

def extract_files(filename):
    file = open(filename, 'r')
    text = file.read()
    files_match = re.findall('<Compile Include="src\asf\preprocessor\string.h">', text)
    if not files_match:
        sys.stderr.write('no match')
        sys.exit()
    for f in files_match:
        print(f)

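It looks like you are trying to extract every string that follows <Compile Include=". That is doable, but note that the pattern below may break on edge cases.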

import re
import sys

def extract_files(filename):
    with open(filename,'r') as file:
        text = file.read()
    matches = re.findall(r'(?<=<Compile Include=")[-.A-Za-z\\]+(?=")', text)
    # finds all pathnames that contain ONLY lowercase or uppercase letters,
    # a dash (-) or a dot (.), separated ONLY by a backslash (\)
    # terminates as soon as it finds a double-quote ("), NOT WHEN IT FINDS A
    # SINGLE QUOTE (')
    if not matches:
        sys.stderr.write("no match")
        sys.exit()
    for match in matches:
        print(match)
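
To see what the lookbehind/lookahead pattern actually returns, here is a minimal sketch that applies the same regex to an in-memory string instead of a file. The sample text and the helper name extract_paths are made up for illustration; the third sample line is there to show one of the edge cases mentioned above (digits and underscores are not in the character class, so that path is skipped).

```python
import re

# Same pattern as in the answer: a lookbehind for the attribute prefix and
# a lookahead for the closing double quote, so only the path is returned.
PATTERN = re.compile(r'(?<=<Compile Include=")[-.A-Za-z\\]+(?=")')

# Hypothetical sample input, not taken from the asker's real file.
SAMPLE = (
    '<Compile Include="src\\asf\\preprocessor\\string.h">\n'
    '<Compile Include="src\\main.c">\n'
    '<Compile Include="src\\utf8_helper.c">\n'  # digit and underscore: skipped
)

def extract_paths(text):
    """Return every path matched by PATTERN in the given text."""
    return PATTERN.findall(text)

paths = extract_paths(SAMPLE)
for path in paths:
    print(path)
```

If paths with digits, underscores or spaces are expected, the character class would need to be widened accordingly.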

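Also, why use re at all? If you are searching for a string literal, just use: if literal in text.

Is your error caused by the backslash "\"? Did you escape it? To match a literal backslash in a non-raw pattern string you have to write "\\\\".

1. Watch out for backslashes: use r"\s" or "\\s". 2. Don't use a regex search when you are not using a regular expression; just use '...' in text.

I am trying to find all the files included in the input file, which is why I am using re. Is the format above correct? It gives the error "bad character range". I am trying to get only the file name "src\asf\preprocessor\string.h".

@user2661518 Oops, I made a typo when I copied this over. Instead of [.-A-Za-z\\], use [-.A-Za-z\\]. If the hyphen is not first (or last, or escaped), it is treated as part of a character range. I have edited my solution to reflect the change.

Thanks. As I understand it, "?=" is used for matching? And what is "(?<="?

@user2661518 So it matches any of the characters A-Z, a-z, -, ., \ one or more times (matching as many as possible).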
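
The points raised in the comments can be checked with a short sketch. It assumes a made-up one-line input (the asker's real file is not shown) and uses only the standard re module. It compares a plain substring test with an escaped regex, shows what a missing r prefix does to "\a", and shows how hyphen placement changes a character class.

```python
import re

# Hypothetical one-line input; the real project file is not shown in the question.
text = '<Compile Include="src\\asf\\preprocessor\\string.h">'

# A raw string keeps every backslash exactly as typed.
literal = r'<Compile Include="src\asf\preprocessor\string.h">'

# 1. Plain substring test, no regex involved.
found_plain = literal in text

# 2. If a regex is still wanted, re.escape() neutralises backslashes and dots.
found_regex = re.search(re.escape(literal), text) is not None

# 3. Without the r prefix, "\a" silently becomes the BEL character (0x07).
has_bell = '\x07' in 'src\asf'

# 4. A hyphen between two characters forms a range: [.-A] covers "." through
#    "A", which includes the digits; [-.A] is just three literal characters.
range_hits_digit = re.fullmatch(r'[.-A]', '0') is not None
literal_hits_digit = re.fullmatch(r'[-.A]', '0') is not None

# 5. A reversed range (backslash down to ".") is rejected at compile time.
try:
    re.compile(r'[\\-.]')
    range_error = None
except re.error as exc:
    range_error = str(exc)

print(found_plain, found_regex, has_bell, range_hits_digit, literal_hits_digit)
print(range_error)
```

Putting the hyphen first in the class, as in the edited answer, avoids both the accidental range and the compile error.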