
Regex AttributeError based on a re-compiled pattern


I'm new to Python, and I'm pretty sure this is just a blonde moment, but it has been driving me crazy for days.

I keep getting, at line 106: "elif conName.search(line): AttributeError: 'list' object has no attribute 'search'"

If I replace the pattern on line 54 with the one from line 50, lines 106-113 run fine, but I get the same error at line 114.

##  This should be line 19
html_doc = """
        <title>Flickr: username</title>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
        <meta property="og:title" content="username" />
        <meta property="og:type" content="flickr_photos:profile" />
        <meta property="og:url" content="http://www.flickr.com/people/Something/" />
        <meta property="og:site_name" content="Flickr" />
        <meta property="og:image" content="http://farm79.staticflickr.com/1111/buddyicons/99999999@N99.jpg?1234567890#99999999@N99" />

                <li>
                <a href="/groups/theportraitgroup/">The Portrait Group</a>
                <span class="text-list-item">
        1,939,830 photos,&nbsp;125,874 members    
                </span>
        </li>
                <li>
                <a href="/groups/412762@N20/">Seagulls Gone Wild</a>
                        <span class="text-list-item">
                2,266 photos,&nbsp;464 members
                </span>
                                                        </li> """

from urllib.request import urlopen
from bs4 import BeautifulSoup
import fileinput
import re
                                ##  This should be line 46
## Strips for basic group data
Tab      = re.compile("(\t){1,}")                                                    # strip tabs
ID       = re.compile("^.*/\">")                                    # Group ID, could be ID or Href
Href     = re.compile("(\s)*<a href=\"/groups/") # Strips to beginning of ID
GName    = re.compile("/\">(<b>)*")                                    # Strips from end of Href to GName

## Persons contact info
conName  = re.compile("(\s)*<meta property=\"og\:title\" content=\"")        # Contact Name 
##conName = re.compile("(\s)*<a href=\"/groups/")
conID    = re.compile("(\s)*<meta property=\"og\:image.*\#")                    # Gets conName's @N ID
conRef   = re.compile("(\s)*<meta property=\"og\:url.*com/people/")

Amp   = re.compile("&amp;")
Qt    = re.compile("&quot;")                                                
Gt    = re.compile("&gt;")
Lt    = re.compile("&lt;")


exfile = 1        ## 0 = use internal data, 1 = use external file

InFile = html_doc

if exfile:
        InFile = open('\Python\test\Group\group 50 min.ttxt', 'r', encoding = "utf-8", errors = "backslashreplace")
        closein = 1        ## Only close input file if it was opened
else:
        closein = 0

OutFile = open('C:\Python\test\Group\Output.ttxt', 'w', encoding = "utf-8", errors = "backslashreplace")
cOutFile = open('C:\Python\test\Group\ContactOutput.ttxt', 'w', encoding = "utf-8", errors = "backslashreplace")

i = 1    ## counter for debugging

                                ##  This should be line 80
for line in InFile:
##    print('{}'.format(i), end = ', ') ## this is just a debugging line, to see where the program errors out
##    i += 1
        if Href.search(line):
                ln = line
                ln = re.sub(Href, "", ln)
                gID, Name = ln.split("/\">")
                Name = Name[:-5]    ## this removes the "\n" at EOL as well
                if "@N" in gID:
                        rH = ""
                else:        
                        rH = gID
                        gID = ""

##        sLn = '{3}\t{0}\t{1}\t{2}\n'.format(Name, gID, rH, conName)
                sLn = '{0}\t{1}\t{2}\n'.format(Name, gID, rH, conName)
                        ##  Replace HTML codes
                sLn = re.sub(Gt, ">", sLn)
                sLn = re.sub(Lt, "<", sLn)
                sLn = re.sub(Qt, "\"", sLn)
                sLn = re.sub(Amp, "&", sLn)

                OutFile.write(sLn)
                        ##  This should be line 104
                #################################################    
        elif conName.search(line):
                ln = line
                ln = re.sub(conName, "", ln)
                conName = ln.split("\" />")
        elif conID.search(line) is not None:
                ln = line
                ln = re.sub(conID, "", ln)
                conID = ln.split("\" />")
        elif conRef.search(line) is not None:
                ln = line
                ln = re.sub(conRef, "", ln)
                conRef = ln.split("\" />")
        else:
                pass

        sLn = '{0}\t{1}\t{2}\n'.format(conID, conRef, conName)
        cOutFile.write(sLn)        ## I know, this will make a massive file with duplicated data, but deal w/ it later
                #################################################

if closein:
        InFile.close()
OutFile.close()
cOutFile.close()
When I comment out lines 105-123, the rest of the code works fine.

When I comment out lines 106-109, lines 110-113 run fine, but I get the same error at line 114.


> I keep getting, at line 106: "elif conName.search(line): AttributeError: 'list' object has no attribute 'search'"

This means that at that point `conName` is a list. And indeed, this line assigns a list to `conName`:

    conName = ln.split("\" />")

After the first time `conName.search()` returns a successful match, you replace `conName` with a list, so for the following lines in the file (the next loop iterations) `conName.search` raises the error. The same rebinding happens to `conID` and `conRef`, which is why the error just moves to the next branch (line 114) when you comment one out.
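The rebinding can be avoided by keeping the compiled pattern in one variable and the extracted value in another. A minimal sketch (the names `conName_re` and `contact_name` are illustrative, not from the original code):

```python
import re

# Keep the compiled pattern and the extracted value in separate
# variables, so the pattern is never rebound to a list.
conName_re = re.compile(r'(\s)*<meta property="og:title" content="')

contact_name = ""
lines = [
    '        <meta property="og:title" content="username" />\n',
    '        <meta property="og:site_name" content="Flickr" />\n',
]
for line in lines:
    if conName_re.search(line):          # still a compiled pattern on every iteration
        stripped = conName_re.sub("", line)
        contact_name = stripped.split('" />')[0]

print(contact_name)  # -> username
```

On later iterations `conName_re.search(line)` keeps working, because only `contact_name` was reassigned.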


    I see you have `from bs4 import BeautifulSoup`. Why not use it to parse the HTML snippet instead of regexes?

    Thanks! That worked perfectly and saved a lot of hair-pulling! I've played with BS a bit, but since I know neither Python nor BS, I figured it was best to learn one thing at a time! ;)
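For reference, a minimal sketch of the BeautifulSoup approach the comment suggests, run against a trimmed version of the snippet from the question (tag and attribute choices mirror that snippet):

```python
from bs4 import BeautifulSoup

html_doc = """
<meta property="og:title" content="username" />
<meta property="og:url" content="http://www.flickr.com/people/Something/" />
<li><a href="/groups/theportraitgroup/">The Portrait Group</a></li>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# og:* metadata comes out without any regex gymnastics
title = soup.find("meta", property="og:title")["content"]

# group links: every <a> whose href starts with /groups/
groups = [(a["href"], a.get_text())
          for a in soup.find_all("a")
          if a["href"].startswith("/groups/")]

print(title)   # username
print(groups)  # [('/groups/theportraitgroup/', 'The Portrait Group')]
```

This also sidesteps the HTML-entity replacements (`&amp;`, `&quot;`, ...), since BeautifulSoup decodes them automatically.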