Python: Why is the value "External_links" instead of the items scraped from the website?
Tags: python, web-scraping, beautifulsoup, urllib
Below is my code, but why does brand end up holding "External_links" instead of the list of items I extracted?
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
my_url = 'https://en.wikipedia.org/wiki/Harry_Potter'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html,"html.parser")
headline = page_soup.findAll("span",{"class":"mw-headline"})
for item in headline:
    brand = item["id"]  # Outputs "External_links"
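In the for loop you iterate over every headline on the page and assign each headline's id to the variable brand, overwriting the previous value on every pass. Once the loop finishes, brand holds only the last headline ("External_links").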
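If you modify your code to print each headline's value inside the loop, you will see that you are in fact getting the values you want: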
>>> for item in headline:
...     print(item["id"])
...
Plot
Early_years
Voldemort_returns
Supplementary_works
Harry_Potter_and_the_Cursed_Child
In-universe_books
Pottermore_website
Structure_and_genre
Themes
Origins
Publishing_history
Translations
Completion_of_the_series
Cover_art
Achievements
Cultural_impact
Commercial_success
Awards,_honours,_and_recognition
Reception
Literary_criticism
Social_impact
Controversies
Adaptations
Films
Spin-off_prequels
Games
Audiobooks
Stage_production
Attractions
The_Wizarding_World_of_Harry_Potter
The_Making_of_Harry_Potter
References
Further_reading
External_links
Your brand variable needs to be a list. For example, the code could look like this:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
from pprint import pprint
my_url = 'https://en.wikipedia.org/wiki/Harry_Potter'
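with uReq(my_url) as uClient:
    page_html = uClient.read()
# Parse as HTML, using the same parser as in the question
page_soup = soup(page_html, "html.parser")
brand = []
for item in page_soup.find_all('span', {'class': 'mw-headline'}):
    # Collect every headline id instead of overwriting a single variable
    brand.append(item["id"])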
pprint(brand)
Prints:
['Plot',
'Early_years',
'Voldemort_returns',
'Supplementary_works',
'Harry_Potter_and_the_Cursed_Child',
'In-universe_books',
'Pottermore_website',
'Structure_and_genre',
'Themes',
'Origins',
'Publishing_history',
'Translations',
'Completion_of_the_series',
'Cover_art',
'Achievements',
'Cultural_impact',
'Commercial_success',
'Awards,_honours,_and_recognition',
'Reception',
'Literary_criticism',
'Social_impact',
'Controversies',
'Adaptations',
'Films',
'Spin-off_prequels',
'Games',
'Audiobooks',
'Stage_production',
'Attractions',
'The_Wizarding_World_of_Harry_Potter',
'The_Making_of_Harry_Potter',
'References',
'Further_reading',
'External_links']
The same result using a list comprehension:
import requests
from bs4 import BeautifulSoup
from pprint import pprint
url = 'https://en.wikipedia.org/wiki/Harry_Potter'
soup = BeautifulSoup(requests.get(url).text, "lxml")
items = [item.get('id') for item in soup.find_all('span',class_='mw-headline')]
pprint(items)
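But I need to assign it to a variable so that I can export it to CSV. Is that possible?
You can create a list, append each headline to it, and then write the contents of the list to a file.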
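A minimal sketch of that last suggestion, using the standard library's csv module. The helper name write_headlines_csv, the file name headlines.csv and the column name headline_id are all assumptions made for illustration; the short brand list here stands in for the list built by the scraping code above.

```python
import csv


def write_headlines_csv(headlines, path):
    """Write one headline id per row under a single header column."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["headline_id"])  # hypothetical column name
        for headline in headlines:
            writer.writerow([headline])


# A few ids copied from the output above; in the real script,
# pass the list produced by the scraping loop instead.
brand = ["Plot", "Early_years", "Awards,_honours,_and_recognition", "External_links"]
write_headlines_csv(brand, "headlines.csv")
```

Some ids contain commas (for example Awards,_honours,_and_recognition). csv.writer quotes such fields automatically, so they still land in a single column when the file is read back.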