Python: isolating a string that is also contained in another string

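I'm setting up a script that merges PDF files based on the text contained in their file names. My problem is that "Violin I" is also contained in "Violin II", and "Alto Saxophone I" is also contained in "Alto Saxophone II". How can I set it up so that tempList only contains the entries for "Violin I" and not those for "Violin II", and vice versa?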

pdfList = ["01 Violin I.pdf", "02 Violin I.pdf","01 Violin II.pdf", "02 Violin II.pdf",  ]
instruments = ["Soprano", "Tenor", "Violin I", "Violin II", "Viola", "Cello", "Contrabass", "Alto Saxophone I", "Alto Saxophone II", "Tenor Saxophone", "Baritone Saxophone"]


# create arrays for each instrument that can be used for merging/organization
def organizer():
    for fileName in pdfList:
        for instrument in instruments:
            tempList = []
            if instrument in fileName:
                tempList.append(fileName)
        print(tempList)


print(pdfList)
organizer()

Try making the following change:

...
if instrument+'.pdf' in fileName:
...

Does this cover all of your cases?
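To see why appending the extension helps, here is a minimal sketch that compares the plain substring test with a suffixed one. The helper names `plain_match` and `suffixed_match` are made up for illustration, and the file name "03 Tenor Saxophone.pdf" is a hypothetical example. The second helper uses `str.endswith` rather than `in`, which is a slightly stricter variant of the change suggested above, on the assumption that the instrument name always sits directly before `.pdf`.

```python
def plain_match(instrument, file_name):
    # The original test: true whenever the name appears anywhere in the file name.
    return instrument in file_name


def suffixed_match(instrument, file_name):
    # Stricter test (an assumption about the naming scheme): the file name
    # must end with the instrument name followed by ".pdf".
    return file_name.endswith(instrument + ".pdf")


# "Violin I" is a substring of "Violin II", so the plain test is fooled.
fooled = plain_match("Violin I", "01 Violin II.pdf")
# The suffixed test is not, because "Violin I.pdf" never occurs in that name.
strict = suffixed_match("Violin I", "01 Violin II.pdf")
```

The same check also keeps "Tenor" apart from "Tenor Saxophone", since a file such as "03 Tenor Saxophone.pdf" does not end with "Tenor.pdf".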

One way to avoid matching a name that is only a substring of another is to use a regular expression, like this:

import re

pdfList = ["01 Violin I.pdf", "02 Violin I.pdf", "01 Violin II.pdf", "02 Violin II.pdf"]
instruments = ["Soprano", "Tenor", "Violin I", "Violin II", "Viola", "Cello", "Contrabass", "Alto Saxophone I", "Alto Saxophone II", "Tenor Saxophone", "Baritone Saxophone"]

# create arrays for each instrument that can be used for merging/organization
def organizer():
    for fileName in pdfList:
        tempList = []
        for instrument in instruments:
            # \b on both sides: match only when the name starts and ends on a word boundary
            if re.search(r'\b{}\b'.format(instrument), fileName):
                tempList.append(fileName)
        print(tempList)

print(pdfList)
organizer()

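This wraps your search term in \b so that it only matches when its start and end fall on word boundaries. Also, perhaps obvious but worth pointing out: this makes your instrument names part of a regular expression, so be aware that any characters in them that are also regex metacharacters will be interpreted as such (right now there are none). A more general solution would need some code to find and properly escape those characters.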
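The standard library's `re.escape` can do that escaping. The following is a minimal sketch, not part of the original answer: the helper name `instrument_pattern` and the instrument name "Perc. 1" are hypothetical, chosen only because the name contains a metacharacter (`.`).

```python
import re


def instrument_pattern(instrument):
    # re.escape turns metacharacters such as "." into literals, so the
    # instrument name is matched character for character.
    # Note: \b only behaves as intended when the name starts and ends
    # with a word character (letter, digit or underscore).
    return re.compile(r'\b' + re.escape(instrument) + r'\b')


# Unescaped, the "." matches any character, so "Percy 1" is accepted by mistake.
unescaped_hit = re.search(r'\bPerc. 1\b', "05 Percy 1.pdf") is not None
# Escaped, only a literal "Perc. 1" matches.
escaped_hit = instrument_pattern("Perc. 1").search("05 Percy 1.pdf") is not None
```

With the names used in the question the escaped and unescaped patterns behave the same, so this only matters once a name contains characters such as `.`, `(` or `+`.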

Are the PDFs always named like this, i.e. Number+Instrument+.pdf? Or should we assume a PDF can have any name that contains the instrument?

Yes, the PDFs will always be in the format "(initial number) + (some text) + (instrument) + .pdf".
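Given that naming format, one possible sketch is to group the files by the instrument name that ends the file name, trying longer names first. This is an illustration rather than anything posted in the thread: `group_by_instrument` is a hypothetical helper, the file "03 Tenor Saxophone.pdf" is an invented example, and the code assumes a space separates the instrument from whatever precedes it. Matching at the end also avoids a case the word-boundary search does not handle, since a search for "Tenor" between word boundaries would also match "03 Tenor Saxophone.pdf".

```python
def group_by_instrument(pdf_list, instruments):
    groups = {name: [] for name in instruments}
    # Try longer names first so the most specific instrument wins.
    by_length = sorted(instruments, key=len, reverse=True)
    for file_name in pdf_list:
        if not file_name.endswith(".pdf"):
            continue  # skip anything that does not follow the naming format
        stem = file_name[:-len(".pdf")]
        for name in by_length:
            # Assumption: a space precedes the instrument name.
            if stem.endswith(" " + name):
                groups[name].append(file_name)
                break
    return groups


pdf_list = ["01 Violin I.pdf", "02 Violin I.pdf", "01 Violin II.pdf",
            "02 Violin II.pdf", "03 Tenor Saxophone.pdf"]
instruments = ["Tenor", "Violin I", "Violin II", "Tenor Saxophone"]
groups = group_by_instrument(pdf_list, instruments)
```

Each list in `groups` can then be handed to whatever code does the merging for that instrument.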