Python: matching array elements against a string in any order

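I'm very new to Python and am trying to find out whether a tweet contains any of the lookup elements.

For example, if a tweet contains the word "cat", it should match "cats". "cute kittens" should match in any order. From what I understand so far, I can't find a way to do this. Any guidance is appreciated.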

import re
lookup_table = ['cats', 'cute kittens', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']
for tweet in tweets:
    lookup_found = None
    print(re.findall(r"(?=(" + '|'.join(lookup_table) + r"))", tweet.lower()))

Output:

['cat']
[]
[]
['dog litter park']
[]

Expected output:

that is a cute cat > cats
kittens are cute > cute kittens
that is a cute kitten > cute kittens
that is a dog litter park > dog litter park
no wonder that dog park is bad > dog litter park

For lookup entries that are a single word, you can use

if word in tweet:

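For lookup entries like "cute kittens", where the words may appear in any order, simply split the entry into words and look for each of them in the tweet string.

This is what I tried. It is not efficient, but it works. Try running it.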

lookup_table = ['cat', 'cute kitten', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']

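for word in lookup_table:
    # split() returns a one-element list when the entry has no spaces
    temp = word.split()
    for tweet in tweets:
        # print the tweet as soon as any word of the entry occurs in it (substring test)
        for x in temp:
            if x in tweet:
                print(tweet)
                break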
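
The substring test `x in tweet` also fires inside longer words ("cat" in "category"). A minimal sketch of a whole-word variant follows, assuming tweets can be tokenized by whitespace. The function name `phrase_matches` and its `require_all` flag are hypothetical and only illustrate the "any word" versus "all words" choice; they are not from the answer above.

```python
def phrase_matches(tweet, phrase, require_all=False):
    """Check whether words of `phrase` occur in `tweet` as whole words, in any order."""
    tweet_words = set(tweet.lower().split())
    phrase_words = set(phrase.lower().split())
    if require_all:
        # every word of the lookup entry must be present
        return phrase_words <= tweet_words
    # at least one word of the lookup entry must be present
    return bool(phrase_words & tweet_words)


lookup_table = ['cat', 'cute kitten', 'dog litter park']
tweets = ['that is a cute cat',
          'that is a dog litter park',
          'no wonder that dog park is bad']

for tweet in tweets:
    found = [entry for entry in lookup_table if phrase_matches(tweet, entry)]
    print(tweet, '>', found)
```

Set comparison ignores word order, which is what "match in any order" asks for. It does not handle plurals on its own.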

This is how I would do it. I don't think the lookup table needs to be that strict, and we can avoid plurals.

import re
lookup_table = ['cat', 'cute kitten', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']
# join the tweets with commas so one findall call scans all of them;
# [\w\s]* cannot cross a comma, so each hit is the full text of one tweet
all_tweets = ','.join(tweets)
for data in lookup_table:
    for word in data.split(" "):
        result = re.findall(r'[\w\s]*' + re.escape(word) + r'[\w\s]*', all_tweets)
        if result:
            print(result)
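
If the plural forms should stay in the tweets but still match, a regex with word boundaries and an optional trailing "s" is one possibility. This is only a sketch under the assumption that plurals are regular ("kitten" / "kittens"); the helper names `build_pattern` and `tweets_containing` are made up for illustration.

```python
import re


def build_pattern(word):
    # \b anchors whole words; "s?" tolerates a simple regular plural
    return re.compile(r'\b' + re.escape(word) + r's?\b', re.IGNORECASE)


def tweets_containing(word, tweets):
    # keep the tweets in which the word (or its simple plural) occurs
    pattern = build_pattern(word)
    return [tweet for tweet in tweets if pattern.search(tweet)]


tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']

for entry in ['cat', 'cute kitten', 'dog litter park']:
    for word in entry.split():
        print(word, '>', tweets_containing(word, tweets))
```

Searching each tweet separately keeps the result tied to the tweet it came from, and the word boundaries prevent "cat" from matching inside "category".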

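Issue 1:

Singular/plural: to get things rolling, I would use inflect, a Python package, to take care of singular and plural forms and so on.

Issue 2:

Split and join: I wrote a small script to demonstrate how you could use it. It is not rigorously tested, but it should get you moving.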

import inflect 
p = inflect.engine()
lookup_table = ['cats', 'cute kittens', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']

for tweet in tweets:
    matched = []
    for lt in lookup_table:
        # p.compare() returns a truthy value when the two words are equal
        # or differ only in number (singular vs. plural), and False otherwise
        if any(p.compare(word, mt) for mt in lt.split() for word in tweet.split()):
            matched.append(lt)
    print(tweet, '>>', matched)
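
The expected output in the question lists exactly one lookup entry per tweet. One way to get that, sketched here without third-party packages, is to score each entry by the fraction of its words found in the tweet and keep the best one. The `singular` helper is a deliberately naive stand-in for inflect (it only strips one trailing "s"), and both function names are hypothetical.

```python
def singular(word):
    # Naive assumption: a longer word ending in a single "s" is a regular plural.
    if len(word) > 3 and word.endswith('s') and not word.endswith('ss'):
        return word[:-1]
    return word


def best_lookup(tweet, lookup_table):
    """Return the lookup entry with the largest share of its words in the tweet."""
    tweet_words = {singular(w) for w in tweet.lower().split()}
    best, best_score = None, 0.0
    for entry in lookup_table:
        entry_words = {singular(w) for w in entry.lower().split()}
        if not entry_words:
            continue  # skip empty entries to avoid dividing by zero
        score = len(entry_words & tweet_words) / len(entry_words)
        if score > best_score:  # on a tie, the earlier entry is kept
            best, best_score = entry, score
    return best


lookup_table = ['cats', 'cute kittens', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']

for tweet in tweets:
    print(tweet, '>', best_lookup(tweet, lookup_table))
```

With this scoring, "that is a cute cat" prefers "cats" (1 of 1 words) over "cute kittens" (1 of 2 words), and the last tweet still maps to "dog litter park" (2 of 3 words) even though "litter" is missing. Whether a partial match like that should count is a design decision the question leaves open.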

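Use the singular forms. You should also tell us the output you actually want.

@KarolyHorvath I'm not sure what you mean by that.

@PM2Ring Sure, I just added the expected output.

Why does the last tweet match? It doesn't contain "litter". If the reason is that not every word in the lookup string has to match, then why doesn't the first tweet match both "cats" and "cute kittens"?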