
Trying to add items to a list in Python


I am trying to collect links from a website using BeautifulSoup:

from bs4 import BeautifulSoup
import requests

address="http://transcripts.cnn.com/TRANSCRIPTS/2018.04.29.html"
page = requests.get(address)
soup = BeautifulSoup(page.content, 'html.parser')

articles =[]
for links in soup.find_all('div', {'class':'cnnSectBulletItems'}):
    for link in soup.find_all('a'):
        article = link.get('href')
        articles.append(article)
        print(article)

There are two problems:

  • There are duplicate links
  • The print statement shows that the code is finding links, but the articles list ends up empty

Does anyone know what is going on?
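The duplicates come from the inner loop calling `find_all('a')` on `soup` (the whole page) instead of on the current `div`, so every link on the page is appended once per bullet-item div. A minimal sketch with a made-up two-div HTML snippet shows the difference:

```python
from bs4 import BeautifulSoup

# Hypothetical miniature page: two bullet-item divs, one link each.
html = """
<div class="cnnSectBulletItems"><a href="/a.html">A</a></div>
<div class="cnnSectBulletItems"><a href="/b.html">B</a></div>
"""
soup = BeautifulSoup(html, 'html.parser')

buggy = []
for links in soup.find_all('div', {'class': 'cnnSectBulletItems'}):
    for link in soup.find_all('a'):      # searches the WHOLE page each time
        buggy.append(link.get('href'))

fixed = []
for links in soup.find_all('div', {'class': 'cnnSectBulletItems'}):
    for link in links.find_all('a'):     # searches only inside this div
        fixed.append(link.get('href'))

print(buggy)  # ['/a.html', '/b.html', '/a.html', '/b.html'] -- every link repeated per div
print(fixed)  # ['/a.html', '/b.html']
```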

    Try:

    from bs4 import BeautifulSoup
    import requests

    page = requests.get("http://transcripts.cnn.com/TRANSCRIPTS/2018.04.29.html")
    soup = BeautifulSoup(page.content, 'html.parser')

    articles = []
    for links in soup.find_all('div', {'class':'cnnSectBulletItems'}):
        for link in links.find_all('a'):    # --> fetch values from `links` instead of `soup`
            print(link.get('href'))
            articles.append(link.get('href'))
    print(articles)
    
    Output:

    /TRANSCRIPTS/1804/29/cnr.21.html
    /TRANSCRIPTS/1804/29/cnr.22.html
    /TRANSCRIPTS/1804/29/cnr.03.html
    /TRANSCRIPTS/1804/29/rs.01.html
    /TRANSCRIPTS/1804/29/ndaysun.02.html
    /TRANSCRIPTS/1804/29/sotu.01.html
    ['/TRANSCRIPTS/1804/29/cnr.21.html', '/TRANSCRIPTS/1804/29/cnr.22.html', '/TRANSCRIPTS/1804/29/cnr.03.html', '/TRANSCRIPTS/1804/29/rs.01.html', '/TRANSCRIPTS/1804/29/ndaysun.02.html', '/TRANSCRIPTS/1804/29/sotu.01.html']
    
    You can use a set (an unordered collection with no duplicate elements) to remove the duplicate links:

    for links in soup.find_all('div', {'class':'cnnSectBulletItems'}):
        links = set(links.find_all('a'))
        for link in links:
            print(link.get('href')) 
    
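Note that a plain `set()` does not preserve document order. If you want a deduplicated `articles` list that keeps the links in page order, one common pattern is a set used only as a membership guard. A small sketch, using a made-up HTML snippet with a repeated link:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: one bullet-item div containing a duplicate href.
html = """
<div class="cnnSectBulletItems">
  <a href="/x.html">X</a><a href="/y.html">Y</a><a href="/x.html">X again</a>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

articles = []
seen = set()
for links in soup.find_all('div', {'class': 'cnnSectBulletItems'}):
    for link in links.find_all('a'):
        href = link.get('href')
        if href not in seen:   # set membership test is O(1)
            seen.add(href)
            articles.append(href)

print(articles)  # ['/x.html', '/y.html'] -- duplicates dropped, order kept
```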

    Comments:

      • Please edit your question and include the output as text, not as a link to an image.
      • I see, I should have been iterating over the inner element. But why is the articles variable empty?
      • Is it still empty?
      • Yes, it prints fine with no duplicates, but the list is still empty.
      • Yes, the new code works! Thanks :). I guess the variable's scope does not carry over into the shell, since the list is empty when I type print(articles) there?
      • To upvote, you need to click the up arrow above the -1.