
Python requests.get() loop returns nothing


When I try to scrape multiple pages of this website, nothing is returned. I normally check that all the lists I build are of equal length, but every list comes back with len = 0.

I have used similar code to scrape other websites, so why doesn't this code work correctly?

I have already tried a few suggested solutions without success: the requests.Session() approach, and the .json approach suggested elsewhere.

Thanks in advance for any help.
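For context on the requests.Session() approach the asker mentions: a common reason a site returns empty content is that the request lacks browser-like headers. A minimal, hedged sketch of setting up a session with a User-Agent header (the UA string here is a placeholder, and whether this helps depends on the site):

```python
# Sketch: a requests.Session with a browser-like User-Agent header.
# Some sites serve empty or different content to requests without one.
import requests

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (compatible; example-scraper/1.0)"  # placeholder UA
})

def fetch(url):
    """Return the page text, or None if the request failed."""
    resp = session.get(url, timeout=10)
    if not resp.ok:
        return None
    resp.encoding = resp.apparent_encoding
    return resp.text
```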

I made a few simplifying changes. The main changes that were needed are:

  • ghana_tbody = soup.find_all('table', class_='canResults')
  • can2 = info  # not info.get_text()
  • I only tested against page 112; life is too short.

    import requests
    from bs4 import BeautifulSoup
    from random import randint
    from time import sleep
    
    can = []
    pty_n = []
    cv1 = []
    cvs1 = []
    vot1 = []
    
    START_PAGE = 112
    END_PAGE = 112
    
    for page in range(START_PAGE, END_PAGE + 1):
        # Build the URL from the loop variable instead of hard-coding ID=112,
        # and don't shadow the loop variable with the response object.
        resp = requests.get(f"https://www.ghanaweb.com/GhanaHomePage/election2012/parliament.constituency.php?ID={page}&res=pm")
        resp.encoding = resp.apparent_encoding
        if not resp:
            pass  # skip pages that fail to load
        else:
            soup = BeautifulSoup(resp.text, 'html.parser')
            ghana_tbody = soup.find_all('table', class_='canResults')
            sleep(randint(2,10))
            for container in ghana_tbody:
    
                #### CANDIDATES ####
                candidate = container.find_all('div', class_='can par')
                for data in candidate:
                    cand = data.find('h4')
                    for info in cand:
                        can2 = info # not info.get_text()
                        can.append(can2)
    
                #### PARTY NAMES ####
                partyn = container.find_all('h5')
                for data in partyn:
                    partyn2 = data.get_text()
                    pty_n.append(partyn2)
    
    
                #### CANDIDATE VOTES ####
                votec = container.find_all('td', class_='votes')
                for data in votec:
                    votec2 = data.get_text()
                    cv1.append(votec2)
    
                #### CANDIDATE VOTE SHARE ####
                cansh = container.find_all('td', class_='percent')
                for data in cansh:
                    cansh2 = data.get_text()
                    cvs1.append(cansh2)
    
            #### TOTAL VOTES ####
            tfoot = soup.find_all('tr', class_='total')
            for footer in tfoot:
                fvote = footer.find_all('td', class_='votes')
                for data in fvote:
                    fvote2 = data.get_text()
                    fvoteindiv = [fvote2]
                    fvotelist = fvoteindiv * (len(pty_n) - len(vot1))
                    vot1.extend(fvotelist)
    
    print('can = ', can)
    print('pty_n = ', pty_n)
    print('cv1 = ', cv1)
    print('cvs1 = ', cvs1)
    print('vot1 = ', vot1)
    
Prints:

    can =  ['Kwadwo Baah Agyemang', 'Daniel Osei', 'Anyang - Kusi Samuel', 'Mary Awusi']
    pty_n =  ['NPP', 'NDC', 'IND', 'IND']
    cv1 =  ['14,966', '9,709', '8,648', '969', '34292']
    cvs1 =  ['43.64', '28.31', '25.22', '2.83', '\xa0']
    vot1 =  ['34292', '34292', '34292', '34292']
    

Make sure to first change the start and end pages to 100 and 350, respectively.
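One thing worth noting, since the asker checks that all lists are of equal length: in the printed output above, cv1 and cvs1 each carry one extra trailing entry taken from the totals row. A small sketch (reusing the printed values) of trimming to the candidate count before combining the lists into rows:

```python
# Sketch: trim the vote lists to the number of candidates, since the
# scraped cv1/cvs1 include a trailing totals-row entry.
can = ['Kwadwo Baah Agyemang', 'Daniel Osei', 'Anyang - Kusi Samuel', 'Mary Awusi']
pty_n = ['NPP', 'NDC', 'IND', 'IND']
cv1 = ['14,966', '9,709', '8,648', '969', '34292']
cvs1 = ['43.64', '28.31', '25.22', '2.83', '\xa0']
vot1 = ['34292', '34292', '34292', '34292']

n = len(can)  # number of candidates on the page
rows = list(zip(can, pty_n, cv1[:n], cvs1[:n], vot1))
print(rows[0])  # ('Kwadwo Baah Agyemang', 'NPP', '14,966', '43.64', '34292')
```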

You need to fix the indentation first; as posted it is invalid. And what is the point of calling sleep? – Booboo

@Booboo Thanks; the indentation is correct in my code, just not on the site. Fixed now. I call sleep to pause briefly before each loop iteration, to avoid overloading the server I am scraping from and getting blocked in the process.

If you look at https://www.ghanaweb.com/GhanaHomePage/election2012/parliament.constituency.php?ID=100&res=pm, I don't see any data. If you do View Source on the page and search for can (the class name), you won't find it. – Booboo

@Booboo That's true; but pages like https://www.ghanaweb.com/GhanaHomePage/election2012/parliament.constituency.php?ID=112&res=pm do have a class named can par. Is there a reason why no information is scraped from any page in the range 100-350?
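Following up on the observation in the comments that some constituency pages contain no results markup at all: a hedged sketch of a guard that skips such pages before parsing further. The helper name has_results is my own, and the inline HTML strings are stand-ins for fetched pages:

```python
# Sketch: skip pages that lack the results table before doing any further parsing.
from bs4 import BeautifulSoup

def has_results(html):
    """Return True if the page contains the markup the scraper relies on."""
    soup = BeautifulSoup(html, 'html.parser')
    return (soup.find('table', class_='canResults') is not None
            and soup.find('div', class_='can par') is not None)

# Inline HTML standing in for fetched pages:
empty_page = "<html><body><p>No data</p></body></html>"
data_page = ("<html><body><table class='canResults'>"
             "<tr><td><div class='can par'><h4>Name</h4></div></td></tr>"
             "</table></body></html>")
print(has_results(empty_page))  # False
print(has_results(data_page))   # True
```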