Exact, case-insensitive matching of multi-word tokens in a string in Python

Tags: python, string, string-matching

I have a list that contains single-word and multi-word tokens:

brand_list = ['ibm','microsoft','abby softwate', 'tata computer services']
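I need to check whether a title string contains any of these tokens. I can find the single-word tokens, but my code fails for the multi-word ones. Here is my code; please help.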

import string
def check_firm(test_title):
    translator = str.maketrans('', '', string.punctuation)
    title = test_title.translate(translator)
    if any(one_word.lower() in title.lower().split(' ') for one_word in brand_list):

        status_code_value = 0
        print("OEM word found")
    else:
        status_code_value = 1
        print("OEM word not found")

    print("current value of status code ------------>", status_code_value)

Change this:

if any(one_word.lower() in title.lower().split(' ') for one_word in brand_list):

to this:

if title.lower() in brand_list:

Hence:

import string
brand_list = ['ibm','Microsoft','abby softwate', 'TATA computer services']
brand_list = [x.lower() for x in brand_list] # ['ibm', 'microsoft', 'abby softwate', 
                                             #  'tata computer services']

def check_firm(test_title):
    translator = str.maketrans('', '', string.punctuation)
    title = test_title.translate(translator)

    if title.lower() in brand_list:
        status_code_value = 0
        print("OEM word found")
    else:
        status_code_value = 1
        print("OEM word not found")

    print("current value of status code ------------>", status_code_value)

check_firm('iBM')
check_firm('Tata Computer SERVICES')
check_firm('Khan trading Co.')

Output:

OEM word found
current value of status code ------------> 0
OEM word found
current value of status code ------------> 0
OEM word not found
current value of status code ------------> 1

Note: I converted all the elements of the list to lowercase using:

brand_list = [x.lower() for x in brand_list]

This ensures that the comparison is case-insensitive on both sides.

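Edit

OP: But my input is a title string, for example 'Tata Computer SERVICES made a profit of x dollars'. How do we find the token in that case?

In that case, I would split the string before passing it to the function and pass only the part that should be compared: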

inp_st1 = 'iBM'
inp_st2 = 'Tata Computer SERVICES made a profit of x dollars'
inp_st3 = 'Khan trading Co.'

check_firm(inp_st1)
check_firm(" ".join(inp_st2.split()[:3])) # Tata Computer SERVICES
check_firm(inp_st3)
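Slicing off the first three words only works when the brand happens to sit at the start of the title and its length is known in advance. As a rough sketch of a more general variant (not part of the original answer), one could slide a window of every relevant word count over the title and test each contiguous run of words against the brand list. The helper name `find_brands` is made up for illustration.

```python
import string

def find_brands(title, brands):
    """Return the brands that occur in `title` as whole, contiguous word runs."""
    # Strip punctuation and lowercase, as the code in this thread does.
    cleaned = title.translate(str.maketrans('', '', string.punctuation)).lower()
    words = cleaned.split()
    wanted = {b.lower() for b in brands}
    # The longest brand (in words) bounds the window size we need to try.
    max_len = max((len(b.split()) for b in wanted), default=0)
    found = []
    for size in range(1, max_len + 1):
        for start in range(len(words) - size + 1):
            candidate = " ".join(words[start:start + size])
            if candidate in wanted and candidate not in found:
                found.append(candidate)
    return found

brand_list = ['ibm', 'microsoft', 'abby softwate', 'tata computer services']
print(find_brands('Tata Computer SERVICES made a profit of x dollars', brand_list))
```

Because whole words are compared, a title such as 'tata computer servicessss' would not match here, and the caller does not have to know where in the title the brand appears.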

You will never be able to find a multi-word token because of this code:

title.lower().split(' ')
Suppose your title is "tata computer services". When you run that code, you get:

["tata", "computer", "services"]
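Then your loop only compares against each single word. In effect you have broken the title into pieces that a multi-word token can no longer match.

In plain language, the expression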

any(one_word.lower() in title.lower().split(' ') for one_word in brand_list)
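is true if any entry of brand_list is equal to one of the elements of the list ["tata", "computer", "services"].

As you can see, the entry from brand_list cannot match, because it actually consists of three words and the spaces between them: "tata computer services".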
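The difference between list membership and substring membership can be checked directly. This is only a small illustration of the point above; the variable names are chosen for the example.

```python
title = "Tata Computer Services"
brand = 'tata computer services'

tokens = title.lower().split(' ')

# `in` on a list compares the brand against each whole element.
in_tokens = brand in tokens
# `in` on a string looks for the brand as a substring.
in_string = brand in title.lower()

print(tokens)     # ['tata', 'computer', 'services']
print(in_tokens)  # False
print(in_string)  # True
```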

To achieve your goal:

Change this:

if any(one_word.lower() in title.lower().split(' ') for one_word in brand_list):
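to:

if any(one_word.lower() in title.lower() for one_word in brand_list):

This way you look for each entry of brand_list inside the whole title. Your code would then look like this: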

brand_list = ['ibm','microsoft','abby softwate', 'tata computer services']

import string
def check_firm(test_title):
    translator = str.maketrans('', '', string.punctuation)
    title = test_title.translate(translator)
    if any(one_word.lower() in title.lower() for one_word in brand_list):
        status_code_value = 0
        print("OEM word found")
    else:
        status_code_value = 1
        print("OEM word not found")

    print("current value of status code ------------>", status_code_value)

check_firm("ibm")
check_firm("abby software")
check_firm("abby softwate apple")  
With the following output:

OEM word found
current value of status code ------------> 0
OEM word not found
current value of status code ------------> 1
OEM word found
current value of status code ------------> 0
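One property of the `in` test on strings is worth keeping in mind: it matches anywhere, including inside longer words. The sketch below shows this with two made-up titles; `substring_hits` is a hypothetical helper, not code from the thread.

```python
brand_list = ['ibm', 'microsoft', 'abby softwate', 'tata computer services']

def substring_hits(title):
    """Return every brand that occurs anywhere in the lowercased title."""
    lowered = title.lower()
    return [brand for brand in brand_list if brand in lowered]

# Both titles only contain a brand as part of a longer word,
# yet the plain substring test still reports a hit.
print(substring_hits("tata computer servicessss"))  # ['tata computer services']
print(substring_hits("Microsoftware weekly"))       # ['microsoft']
```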
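Edit

OP: I tried your solution. The problem is that it also matches inputs such as "tata computer servicesss". Any idea how to overcome this? Thanks.

As pointed out in the comments, this code also accepts a title such as "tata computer servicesss". To avoid that, I suggest using a regular expression with word boundaries, for example: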

brand_list = ['ibm','microsoft','abby softwate', 'tata computer services']

import string
import re
def check_firm(test_title):
    translator = str.maketrans('', '', string.punctuation)
    title = test_title.translate(translator)
    if any(re.search(r'\b' + one_word.lower() + r'\b', title.lower()) for one_word in brand_list):
        status_code_value = 0
        print("OEM word found")
    else:
        status_code_value = 1
        print("OEM word not found")

    print("current value of status code ------------>", status_code_value)

check_firm("tata computer services")  
check_firm("tata computer servicessssss")  
check_firm("tata computer services something else") 
Output:

OEM word found
current value of status code ------------> 0
OEM word not found
current value of status code ------------> 1
OEM word found
current value of status code ------------> 0
The interesting part is the search with \b word boundaries, applied to the lowercased title:

any(re.search(r'\b' + one_word.lower() + r'\b', title.lower()) for one_word in brand_list):
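If the brand list is fixed, the same idea can be packaged as one precompiled pattern. The following is a sketch under a few assumptions rather than the answerer's code: brand names are passed through `re.escape` in case they contain regex metacharacters, `re.IGNORECASE` replaces the explicit `lower()` calls, and the helper names `compile_brand_pattern` and `find_brand` are invented for the example.

```python
import re
import string

def compile_brand_pattern(brands):
    """Build one case-insensitive, word-bounded pattern for all brands."""
    # Longest brands first, so a longer brand is preferred over a shorter
    # one that starts at the same position.
    ordered = sorted(brands, key=len, reverse=True)
    alternation = "|".join(re.escape(b) for b in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

def find_brand(title, pattern):
    """Return the first matching brand in lowercase, or None."""
    cleaned = title.translate(str.maketrans('', '', string.punctuation))
    match = pattern.search(cleaned)
    return match.group(0).lower() if match else None

brand_list = ['ibm', 'microsoft', 'abby softwate', 'tata computer services']
pattern = compile_brand_pattern(brand_list)

print(find_brand("Tata Computer SERVICES made a profit of x dollars", pattern))
print(find_brand("tata computer servicessssss", pattern))
```

Compiling once avoids rebuilding a pattern per brand on every call, and returning the matched brand rather than only a status code makes it easier to see which entry triggered the match.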

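Comments:

title.lower(). It tokenizes the string on spaces. When I search for multi-word tokens, it fails. This is the test string: "tata computer services, not very successful"

See if the answer below helps?

Thanks for the answer, but my input is a title string, for example 'Tata Computer SERVICES made a profit of x dollars'. How do we find the token in that case?

@RohitHaritash You should split the input string before passing it to the function, and pass only the part you need.

I was about to comment on this; it seems it should be stated in the answer to avoid confusion.

I tried your solution. The problem is that it also matches inputs such as "tata computer servicesss". Any idea how to overcome this? Thanks

@RohitHaritash The code in this answer overcomes that problem. If it works, please comment or upvote so that others can find the answer quickly too.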