Regex: print only the items that do not match the regular expression

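I am trying to remove specific URLs from a list of news articles when they match a list of sources to exclude. I only want to print the URLs that do not match the regex, and I want each item in news_articles to be printed only once.

  • How do I print only the URLs that do not match?

  • How do I print each non-matching URL only once?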

    import re
    
    sources_to_exclude = ['cnn.com','france24.com','reuters.com']
    
    news_articles = ['http://www.chicagotribune.com/news/nationworld/ct-south-africa-trump-tweet-20180823-story.html',
             'https://www.theatlantic.com/international/archive/2018/08/trump-rule-of-law-south-africa-farmers/568390',
             'https://www.aljazeera.com/news/2018/08/south-africa-calls-trump-misinformed-land-policy-180823060142595.html',
             'https://www.timeslive.co.za/politics/2018-08-23-trumps-administration-to-monitor-land-expropriation-in-south-africa',
             'https://mg.co.za/article/2018-08-23-south-african-politicians-resist-trumps-falsehoods-about-south-africa',
             'https://www.cnn.com/2018/08/22/africa/south-africa-racist-rant-video/index.html',
             'https://www.reuters.com/article/us-safrica-usa-presidency/south-africa-to-seek-clarification-from-us-embassy-on-trumps-land-reform-tweet-sabc-idUSKCN1L80JI',
             'https://www.thedailybeast.com/trump-bemoans-persecuted-white-farmers-in-south-africa',
             'https://www.france24.com/en/20180823-south-africa-recall-mostert-second-argentina-test']
    
    for result in news_articles:
      for link in sources_to_exclude:
        regex = '((http[s]?|ftp):\/)?\/?([^:\/\s]+)?({})\/([^\/]+)'.format(link)
        match = re.search(r'{}'.format(regex), result, re.IGNORECASE)
        if match:
          print ('Matched regex:  {}'.format(result))
        else:
          # I only want to print items that DID NOT match the regex pattern
          # I also want to print these items once.
          print('Did not matched regex:  {}'.format(result))
    

Your code is almost there. This should work:

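for result in news_articles:
    for link in sources_to_exclude:
        # Raw string keeps the backslashes literal; re.escape makes the dot in the domain a literal dot
        regex = r'((http[s]?|ftp):\/)?\/?([^:\/\s]+)?({})\/([^\/]+)'.format(re.escape(link))
        match = re.search(regex, result, re.IGNORECASE)
        if match is not None:
            # This URL belongs to an excluded source, so stop checking it
            break
    else:
        # Reached only when the inner loop finished without hitting break
        print('Did not match any regex: {}'.format(result))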

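Python supports an `else` clause on `for` loops. The `else` block runs only if the loop finishes normally, that is, it was not stopped by `break`. Here the inner loop breaks as soon as any regex matches, so the `else` block runs (and the link is printed) only when none of the regexes matched. Because the print sits on the outer loop's level, each URL is printed at most once.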
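
To make the `for`/`else` behaviour concrete, here is a minimal sketch that collects the unmatched URLs into a list and compares the result with an equivalent `any()` formulation. The function names and the shortened sample data are hypothetical, and it uses a plain escaped substring search rather than the full URL regex from the question.

```python
import re

# Hypothetical sample data, shortened from the question.
sources_to_exclude = ['cnn.com', 'reuters.com']
news_articles = [
    'https://www.cnn.com/2018/08/22/africa/index.html',
    'https://mg.co.za/article/2018-08-23-example',
]


def unmatched_with_for_else(urls, sources):
    """Collect URLs that match none of the sources, using for/else."""
    kept = []
    for url in urls:
        for source in sources:
            if re.search(re.escape(source), url, re.IGNORECASE):
                break  # a source matched, so the else block is skipped
        else:
            kept.append(url)  # inner loop ended without break
    return kept


def unmatched_with_any(urls, sources):
    """Same filter written with any() instead of for/else."""
    return [
        url for url in urls
        if not any(re.search(re.escape(s), url, re.IGNORECASE) for s in sources)
    ]


print(unmatched_with_for_else(news_articles, sources_to_exclude))
```

Both helpers should return the same list; which one reads better is mostly a matter of taste.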

You can use a list comprehension:

[i for i in news_articles if not re.search('|'.join(sources_to_exclude),i)]

Out[610]: 
['http://www.chicagotribune.com/news/nationworld/ct-south-africa-trump-tweet-20180823-story.html',
 'https://www.theatlantic.com/international/archive/2018/08/trump-rule-of-law-south-africa-farmers/568390',
 'https://www.aljazeera.com/news/2018/08/south-africa-calls-trump-misinformed-land-policy-180823060142595.html',
 'https://www.timeslive.co.za/politics/2018-08-23-trumps-administration-to-monitor-land-expropriation-in-south-africa',
 'https://mg.co.za/article/2018-08-23-south-african-politicians-resist-trumps-falsehoods-about-south-africa',
 'https://www.thedailybeast.com/trump-bemoans-persecuted-white-farmers-in-south-africa']

You can also do the following:

re.sub('^.*('+'|'.join(sources_to_exclude)+').*$', "", "\n".join(news_articles),flags=re.M).split()
Out[612]: 
['http://www.chicagotribune.com/news/nationworld/ct-south-africa-trump-tweet-20180823-story.html',
 'https://www.theatlantic.com/international/archive/2018/08/trump-rule-of-law-south-africa-farmers/568390',
 'https://www.aljazeera.com/news/2018/08/south-africa-calls-trump-misinformed-land-policy-180823060142595.html',
 'https://www.timeslive.co.za/politics/2018-08-23-trumps-administration-to-monitor-land-expropriation-in-south-africa',
 'https://mg.co.za/article/2018-08-23-south-african-politicians-resist-trumps-falsehoods-about-south-africa',
 'https://www.thedailybeast.com/trump-bemoans-persecuted-white-farmers-in-south-africa']
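
One caveat with joining the sources using `'|'` directly: the dot in `cnn.com` is a regex wildcard, and a URL that appears twice in the input would be printed twice. The sketch below is one possible way to handle both, assuming Python 3.7+ (where `dict` preserves insertion order). The names `excluded` and `unique_unmatched` and the repeated URL in the sample list are illustrative and not part of the original answers.

```python
import re

sources_to_exclude = ['cnn.com', 'france24.com', 'reuters.com']

# Hypothetical input with one repeated URL, to show the de-duplication.
news_articles = [
    'https://www.theatlantic.com/international/archive/2018/08/trump-rule-of-law-south-africa-farmers/568390',
    'https://www.cnn.com/2018/08/22/africa/south-africa-racist-rant-video/index.html',
    'https://www.theatlantic.com/international/archive/2018/08/trump-rule-of-law-south-africa-farmers/568390',
    'https://www.thedailybeast.com/trump-bemoans-persecuted-white-farmers-in-south-africa',
]

# re.escape turns the dot in each domain into a literal dot.
excluded = re.compile(
    '|'.join(re.escape(source) for source in sources_to_exclude),
    re.IGNORECASE,
)

# dict.fromkeys drops repeated URLs while keeping first-seen order.
unique_unmatched = [
    url for url in dict.fromkeys(news_articles)
    if not excluded.search(url)
]

for url in unique_unmatched:
    print(url)
```

This is still a substring test, so a host such as `notcnn.com` would also be excluded; comparing parsed hostnames would be stricter if that matters.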
