Python: test whether a string matches a template value

python regex string


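I am trying to iterate through a list of strings, keeping only those that match a naming template I have specified. I want to accept any list entry that matches the template exactly, apart from having an integer in the variable <SCENARIO> field.

The check needs to be general. Specifically, the string structure could change, so there is no guarantee that <SCENARIO> always appears at character X (which would allow a simple list comprehension, for example).

The code below shows an approach that works using split, but there must be a better way to do this string comparison. Could I use regular expressions here?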

template = 'name_is_here_<SCENARIO>_20131204.txt'

testList = ['name_is_here_100_20131204.txt',        # should accept
            'name_is_here_100_20131204.txt.NEW',    # should reject
            'other_name.txt']                       # should reject

acceptList = []

for name in testList:
    print(name)
    acceptFlag = True
    splitTemplate = template.split('_')
    splitName = name.split('_')
    # if lengths do not match, name cannot possibly match template
    if len(splitTemplate) == len(splitName):
        print(list(zip(splitTemplate, splitName)))
        # compare records in the split
        for t, n in zip(splitTemplate, splitName):
            if t != n and t != '<SCENARIO>':
                # reject if any of the "other" fields are not identical
                # (would also check that '<SCENARIO>' field is numeric - not shown here)
                print('reject: ' + name)
                acceptFlag = False
    else:
        acceptFlag = False

    # keep name if it passed checks
    if acceptFlag:
        acceptList.append(name)

print(acceptList)
# correctly prints --> ['name_is_here_100_20131204.txt']
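The numeric check that the comment in the loop leaves out could be folded into the same split-based comparison. The following is only a sketch: the helper name matches_template and its placeholder and sep parameters are assumptions made for illustration, not part of the original code.

```python
def matches_template(name, template, placeholder='<SCENARIO>', sep='_'):
    """Compare name to template field by field (hypothetical helper)."""
    template_fields = template.split(sep)
    name_fields = name.split(sep)
    # A different number of fields can never match the template.
    if len(template_fields) != len(name_fields):
        return False
    for t, n in zip(template_fields, name_fields):
        if t == placeholder:
            # The variable field must consist of digits only.
            if not n.isdigit():
                return False
        elif t != n:
            # Every other field must be identical to the template.
            return False
    return True


template = 'name_is_here_<SCENARIO>_20131204.txt'
test_list = ['name_is_here_100_20131204.txt',
             'name_is_here_100_20131204.txt.NEW',
             'other_name.txt']

accept_list = [name for name in test_list if matches_template(name, template)]
print(accept_list)  # ['name_is_here_100_20131204.txt']
```

This keeps the behaviour of the loop above but still depends on the underscore separator, which is one reason a regular expression may be the more flexible choice.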


Try using the re module, Python's regular-expression library:

import re

template = re.compile(r'^name_is_here_(\d+)_20131204\.txt$')

testList = ['name_is_here_100_20131204.txt', #accepted
            'name_is_here_100_20131204.txt.NEW', #rejected!
            'name_is_here_aabs2352_20131204.txt', #rejected!
            'other_name.txt'] #rejected!

acceptList = [item for item in testList if template.match(item)]
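Two details in patterns of this kind are easy to miss: an unescaped . matches any character, and $ also matches just before a trailing newline. The sketch below contrasts a pattern with an unescaped dot against one that escapes the dot and uses fullmatch (available from Python 3.4). The names loose, strict and candidates are made up for this illustration.

```python
import re

# Unescaped dot, anchored with ^ and $.
loose = re.compile(r'^name_is_here_(\d+)_20131204.txt$')
# Escaped dot; fullmatch requires the whole string to match, so no anchors are needed.
strict = re.compile(r'name_is_here_(\d+)_20131204\.txt')

candidates = [
    'name_is_here_100_20131204.txt',    # the intended match
    'name_is_here_100_20131204_txt',    # '_' where the dot should be
    'name_is_here_100_20131204.txt\n',  # trailing newline
]

loose_hits = [c for c in candidates if loose.match(c)]
strict_hits = [c for c in candidates if strict.fullmatch(c)]

print(len(loose_hits))  # 3
print(strict_hits)      # ['name_is_here_100_20131204.txt']
```

Whether the extra strictness matters depends on where the file names come from; for names read from a directory listing the difference is unlikely to show up.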


This should work, on the understanding that name_is_here is just a placeholder for alphanumeric characters.

import re
testList = ['name_is_here_100_20131204.txt',        # should accept
            'name_is_here_100_20131204.txt.NEW',    # should reject
            'other_name.txt', 
            'name_is_44ere_100_20131204.txt',
            'name_is_here_100_2013120499.txt', 
            'name_is_here_100_something_2013120499.txt',
            'name_is_here_100_something_20131204.txt']  


def find(scenario):
    begin = '[a-z_]+100_'  # any combination of lowercase letters and underscores, followed by 100_
    end = r'_[0-9]{8}\.txt$'  # exactly eight digits followed by .txt at the end
    pattern = re.compile("".join([begin,scenario,end]))
    result = []
    for word in testList:
        if pattern.match(word):
            result.append(word)

    return result

find('something') # returns ['name_is_here_100_something_20131204.txt']

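Edit: the scenario is now passed in as a separate variable, and the regex matches only letters and underscores followed by 100, then the scenario, then eight digits followed by .txt.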


Yes, regular expressions can be used here. Do you have a regex so far?

@Simeonviser - Sorry, no regex yet. I know that regex exists, but I am not familiar with the implementation details. I wanted to make sure this was a worthwhile approach before going too far. Thanks for the confirmation.

This looks like exactly what I want. I guess the only issue is that I want the template to be variable, without having to type in the regex details by hand. In other words, I want to enter the template in the format I specified and have the code convert it to a regex automatically. I am sure there is a way to parse my template into the format you specified; I will look into that next. Thanks for the guidance.

I used a concatenation to build the generic compile string:

feedNameRegex = '^' + feedName.replace('<SCENARIO>', r'(\d+)') + '$'

Do you see any problems with this approach?

@Roberto Well, it looks fine to me, but you should test it in your environment. Sorry for the delay!

This might be too general. I don't think it guarantees identical naming everywhere outside the desired variable part. For example, name_is_44ere_100_20131204.txt would pass.

OK, so name_is_here should consist only of letters separated by underscores. Which part do you want to keep in a variable? Is name_is_here_100 a constant string? Is the number of digits after the scenario fixed? If needed, you can build the regex from variables.
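The comments above ask for the template to be converted to a regex automatically. One possible sketch is shown below. It assumes the placeholder is the literal text <SCENARIO> and that the variable field is one or more digits. The helper name template_to_regex and its parameters are hypothetical. Unlike plain string replacement, it passes the fixed parts of the template through re.escape, so characters such as the dot in .txt are matched literally.

```python
import re


def template_to_regex(template, placeholder='<SCENARIO>', field_pattern=r'(\d+)'):
    """Build a compiled regex from a template string (hypothetical helper)."""
    # Split on the placeholder, escape each literal piece, then rejoin
    # the pieces with the pattern for the variable field.
    literal_parts = template.split(placeholder)
    return re.compile(field_pattern.join(re.escape(part) for part in literal_parts))


template = 'name_is_here_<SCENARIO>_20131204.txt'
pattern = template_to_regex(template)

test_list = ['name_is_here_100_20131204.txt',
             'name_is_here_100_20131204.txt.NEW',
             'name_is_here_aabs2352_20131204.txt',
             'other_name.txt']

# fullmatch (Python 3.4+) requires the entire name to match the pattern.
accept_list = [name for name in test_list if pattern.fullmatch(name)]
print(accept_list)  # ['name_is_here_100_20131204.txt']
```

Because the digits are captured in a group, pattern.fullmatch(name).group(1) would give the scenario number for an accepted name.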