How to set "any string" as a regular expression in Python

python regex python-2.7

I have this condition:

if(url.startswith("http://country.domain.com/motors/used-cars/") and (url!="http://country.domain.com/motors/used-cars/")):
    if url.startswith("http://country.domain.com/motors/used-cars/?page="):
        return None
    else:
        return url

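It was working, but for some reason the company changed the URL from:

http://country.domain

to a per-city form, because there are many cities. I now have many similar URLs:

http://city1.domain
http://city2.domain
http://city3.domain
http://city4.domain
http://city20.domain

Going back to my condition, I would have to change it to add 20 cities.

My question

Is there a way to match something like this:

http://whateverthenamehere.doman

I think a regular expression in Python is what I need, but I don't know the correct code.
I tried using \s, \s* and \s+, but nothing worked.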
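Could you help?

Use a regular expression to test whether the URL starts with your path, with a variable first domain part, followed by more text. The code and a demo session appear after the comments.

The [a-zA-Z0-9-] character group matches all valid domain name characters. \w is not enough, because it allows underscores (_) but not dashes (-).

The rest of the URL is captured in group 1, so you can inspect it further.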
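To make the difference between \w and the explicit character group concrete, here is a minimal sketch. The sample labels are made up for illustration and do not come from the question.

```python
import re

# Hypothetical subdomain labels, chosen only to show the difference.
labels = ["city42", "new-york", "bad_name"]

word_class = re.compile(r'^\w+$')            # letters, digits and underscore
host_class = re.compile(r'^[a-zA-Z0-9-]+$')  # letters, digits and dash

# Map each label to (matches \w+, matches [a-zA-Z0-9-]+).
results = {}
for label in labels:
    results[label] = (bool(word_class.match(label)),
                      bool(host_class.match(label)))
    print("%-10s \\w+: %-5s [a-zA-Z0-9-]+: %s" % ((label,) + results[label]))
```

Here "new-york" is rejected by \w+ but accepted by the explicit group, while "bad_name" behaves the other way round.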
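Assuming the city names are alphabetic, http://[a-zA-Z]+\.domain might work.
@thg435 No, that doesn't work. The condition always comes out false.
I love these "it doesn't work" comments.
Of course. Did you mean import?
@FarhadAliNoo: Hm? What? *innocent whistle*
But I have two conditions in the first if, and you have one.
@MarcoDinatsoli: I combined them. That is why the regular expression ends with .+
Still not working. For example, http://testes.domain.com/motors/used-cars/jeep/wrangler/2014/6/5/jeep-wrangler-2/?back=dWFlLmR1Yml6emxlLmNvbS9tb3RvcnMvdXNlZC1jYXJzLz9wYWdlPTM%3D&pos=8 still goes to false, but it should go to return url.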
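import re

# Match any subdomain of domain.com, then the used-cars path and at least one
# more character; group 1 captures the rest of the URL. The dots are escaped
# so that they match literal dots only.
match = re.search(r'^http://[a-zA-Z0-9-]+\.domain\.com/motors/used-cars/(.+)', url)
if match:
    # Skip pagination URLs and keep everything else.
    if match.group(1).startswith('?page='):
        return None
    return url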
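Since the snippet above uses return, it is meant to sit inside a function. A self-contained sketch of that might look as follows. The names USED_CARS_RE and filter_used_cars_url are hypothetical, and returning None for non-matching URLs is an assumption based on the original condition falling through without a return.

```python
import re

# Hypothetical name; dots are escaped so they only match literal dots.
USED_CARS_RE = re.compile(
    r'^http://[a-zA-Z0-9-]+\.domain\.com/motors/used-cars/(.+)')


def filter_used_cars_url(url):
    """Return url for used-car pages on any subdomain, otherwise None."""
    match = USED_CARS_RE.search(url)
    if match is None:
        # Different host or path, or nothing after /used-cars/.
        return None
    if match.group(1).startswith('?page='):
        # Pagination URL: skipped, as in the original condition.
        return None
    return url
```

For example, filter_used_cars_url('http://city20.domain.com/motors/used-cars/jeep/') returns the URL unchanged, while the bare listing URL and any '?page=' URL give None.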

>>> import re
>>> # Not enough text in the URL:
... 
>>> re.search(r'^http://[a-zA-Z0-9-]+.domain.com/motors/used-cars/(.+)', 
...           'http://city42.domain.com/motors/used-cars/') is None
True
>>> # Remainder of the URL is captured for inspection:
...
>>> re.search(r'^http://[a-zA-Z0-9-]+.domain.com/motors/used-cars/(.+)',
              'http://city42.domain.com/motors/used-cars/?page=')
<_sre.SRE_Match object at 0x100621558>
>>> re.search(r'^http://[a-zA-Z0-9-]+.domain.com/motors/used-cars/(.+)',
              'http://city42.domain.com/motors/used-cars/?page=').group(1)
'?page='
>>> # specific URL mentioned in the comments:
...
>>> re.search(r'^http://[a-zA-Z0-9-]+.domain.com/motors/used-cars/(.+)',
              'http://testes.domain.com/motors/used-cars/jeep/wrangler/2014/6/5/jeep-wrangler-‌2/?back=dWFlLmR1Yml6emxlLmNvbS9tb3RvcnMvdXNlZC1jYXJzLz9wYWdlPTM%3D&pos=8').group(1)
'jeep/wrangler/2014/6/5/jeep-wrangler-\xe2\x80\x8c\xe2\x80\x8b2/?back=dWFlLmR1Yml6emxlLmNvbS9tb3RvcnMvdXNlZC1jYXJzLz9wYWdlPTM%3D&pos=8'