AttributeError: 'NoneType' object has no attribute 'group' Can't parse (Python)

python regex python-2.7

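When I try to parse "bloomberg" out of self.web_url, I get the error shown in the title. self.web_url is of type unicode, so I thought that might be the cause. However, I don't know how to do the type conversion, if one is even needed, or what else to try.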

self.web_url = "http://www.bloomberg.com"
start = "http:/www."
end = ".com"
print type(self.web_url)
web_name = re.search('%s(.*)%s' % (start, end), self.web_url).group(1)

You are missing a / in start:

start = 'http://www.'

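Also note that . has a special meaning in a regular expression: it is a regex token that matches any single character, not a literal dot. To match a literal dot you need to escape it, i.e. \.

So you are better off with: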

start = r"http://www\."
end = r"\.com"
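As a minimal sketch of the same fix, the escaping can be left to re.escape instead of being written by hand. The helper name extract_between is hypothetical and not part of the original answer; it also returns None rather than raising when nothing matches.

```python
import re


def extract_between(text, start, end):
    """Return the text between the literal markers start and end, or None."""
    # re.escape turns regex metacharacters such as "." into literals,
    # so the markers can be written as plain strings.
    pattern = "%s(.*)%s" % (re.escape(start), re.escape(end))
    match = re.search(pattern, text)
    return match.group(1) if match else None


web_name = extract_between("http://www.bloomberg.com", "http://www.", ".com")
print(web_name)  # bloomberg
```

With the original start value "http:/www." the helper returns None, because the single slash still prevents a match.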

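The error occurs because there is no match: re.search returns None, and None has no group attribute. Your pattern is wrong because it matches a single / whereas http: is followed by two. You need to either fix the pattern as heemayl suggested, or use an alternative based on urlparse: take the netloc part and extract what lies between its first and last dot (using find and rfind, or a regex). The code is given after the regex notes below.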
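The failure described at the start of this answer can be reproduced in isolation. The sketch below is only an illustration (the variable names are made up for it): it runs the question's pattern, shows that the match object is None, and then guards the corrected pattern before calling group.

```python
import re

url = "http://www.bloomberg.com"

# The question's pattern has a single "/" after "http:", so it cannot match.
match = re.search("http:/www.(.*).com", url)
print(match)  # None

try:
    match.group(1)
except AttributeError as exc:
    # Calling .group on None is what raises the error from the title.
    error_message = str(exc)
    print(error_message)

# Corrected pattern, with a check before touching the match object.
fixed = re.search(r"http://www\.(.*)\.com", url)
web_name = fixed.group(1) if fixed else None
print(web_name)  # bloomberg
```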

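Regex 1, \A[^.]*\.(.*)\.[^.]*\Z, matches the start of the string (\A), then zero or more characters other than a dot ([^.]*), then a dot (\.), then captures any zero or more characters other than a newline into group 1 ((.*)), then matches a dot and zero or more non-dot characters up to the very end of the string (\Z).

Regex 2, \.(.*)\., matches from the first dot to the last dot and captures whatever lies between them into group 1.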
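To make the difference between the two patterns concrete, here is a small sketch that runs both on a few host names. The wrapper function names and the sample hosts are assumptions chosen for illustration. Note the host with only one dot: neither pattern matches it, so re.sub hands back the input unchanged and re.search returns None.

```python
import re

REGEX_1 = r"\A[^.]*\.(.*)\.[^.]*\Z"
REGEX_2 = r"\.(.*)\."


def middle_with_regex_1(host):
    # re.sub returns the input unchanged when the pattern does not match.
    return re.sub(REGEX_1, r"\1", host)


def middle_with_regex_2(host):
    # re.search returns None when the pattern does not match.
    match = re.search(REGEX_2, host)
    return match.group(1) if match else None


for host in ("www.bloomberg.com", "news.bloomberg.co.uk", "bloomberg.com"):
    print("%s -> %r / %r" % (host, middle_with_regex_1(host), middle_with_regex_2(host)))
```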

What about a single dot?
@WiktorStribiżew Which one?

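try:
    from urlparse import urlparse  # Python 2.7
except ImportError:
    from urllib.parse import urlparse  # Python 3
import re

# Split the URL and keep only the host part (netloc), e.g. "www.bloomberg.com".
path = urlparse("http://www.bloomberg.com")
# Slice between the first and the last dot of the host.
print(path.netloc[path.netloc.find(".")+1:path.netloc.rfind(".")]) # => bloomberg
# or a regex:
print(re.sub(r"\A[^.]*\.(.*)\.[^.]*\Z", r"\1", path.netloc)) # => bloomberg
# or Regex 2:
mObj = re.search(r"\.(.*)\.", path.netloc)
if mObj:
    print(mObj.group(1)) # => bloomberg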
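The find/rfind variant can also be wrapped in a function that copes with hosts containing fewer than two dots. This is a sketch only: site_name is a hypothetical helper, not something from the answer, and returning None for such hosts is one possible choice.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7


def site_name(url):
    """Return the part of the host between its first and last dot, or None."""
    host = urlparse(url).netloc
    first, last = host.find("."), host.rfind(".")
    # With fewer than two dots there is nothing "between" them.
    if first == -1 or first == last:
        return None
    return host[first + 1:last]


print(site_name("http://www.bloomberg.com"))  # bloomberg
print(site_name("http://bloomberg.com"))      # None
```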