Python not iterating over a list in web scraping

python python-3.x web-scraping

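Using the link below, I am trying to create two lists: one of countries and one of currencies. However, I am stuck at a point where it only gives me the first country's name and does not iterate over the full list of countries. Any help on how to fix this would be appreciated. Thanks in advance.

Here is my attempt: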

from bs4 import BeautifulSoup
import urllib.request

url = "http://www.worldatlas.com/aatlas/infopage/currency.htm"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 '
                         'Safari/537.36'}

req = urllib.request.Request(url, headers=headers)
resp = urllib.request.urlopen(req)
html = resp.read()

soup = BeautifulSoup(html, "html.parser")
attr = {"class" : "miscTxt"}

countries = soup.find_all("div", attrs=attr)
countries_list = [tr.td.string for tr in countries]

for country in countries_list:
    print(country)

Try this script. It should give you the country names and the corresponding currencies. You don't need to send headers for this site.

from bs4 import BeautifulSoup
import urllib.request

url = "http://www.worldatlas.com/aatlas/infopage/currency.htm"
resp = urllib.request.urlopen(urllib.request.Request(url)).read()
soup = BeautifulSoup(resp, "lxml")

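for item in soup.select("table tr"):
    # Rows without any <td> (for example header rows) fall back to an empty string.
    try:
        country = item.select("td")[0].text.strip()
    except IndexError:
        country = ""
    # The currency sits in the cell right after the country cell.
    try:
        currency = item.select("td")[0].find_next_sibling().text.strip()
    except (IndexError, AttributeError):
        # AttributeError: find_next_sibling() returned None (no second cell).
        currency = ""
    print(country, currency)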

Partial output:

Afghanistan afghani
Algeria dinar
Andorra euro
Argentina peso
Australia dollar
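
The loop above only prints each pair, while the question asks for two lists. A minimal sketch of collecting them instead, assuming each data row holds the country in its first cell and the currency in its second. The sample markup and the extract_lists helper are hypothetical stand-ins for the real page, which may be structured differently:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th>Currency</th></tr>
  <tr><td>Afghanistan</td><td>afghani</td></tr>
  <tr><td>Algeria</td><td>dinar</td></tr>
</table>
"""


def extract_lists(html):
    """Return (countries, currencies) from rows with at least two <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    countries, currencies = [], []
    for row in soup.select("table tr"):
        cells = row.select("td")
        if len(cells) < 2:
            continue  # skip header rows and rows missing a currency cell
        countries.append(cells[0].get_text(strip=True))
        currencies.append(cells[1].get_text(strip=True))
    return countries, currencies


countries_list, currency_list = extract_lists(SAMPLE_HTML)
print(countries_list)
print(currency_list)
```

Checking the number of cells up front avoids the try/except blocks and keeps the two lists the same length.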

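You can also build a list of (country, currency) tuples with a single list comprehension, and then split those tuples into two lists with countries_list, currency_list = map(list, zip(*temp_list)).

Full code: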

from bs4 import BeautifulSoup
import urllib.request

req = urllib.request.Request("http://www.worldatlas.com/aatlas/infopage/currency.htm")

soup = BeautifulSoup(urllib.request.urlopen(req).read(), "html.parser")

countries = soup.find_all("div", attrs = {"class" : "miscTxt"})

temp_list = [
    (t[0].text.strip(), t[1].text.strip()) 
    for t in (t.find_all('td') for t in countries[0].find_all('tr'))
    if t
]

countries_list, currency_list = map(list,zip(*temp_list))

print(countries_list)
print(currency_list)
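
The zip(*temp_list) step is the part that splits the pairs into two lists. A small sketch of it in isolation, using hard-coded sample pairs (hypothetical data, not scraped) in place of the real temp_list:

```python
# Hypothetical sample pairs standing in for the scraped (country, currency) tuples.
temp_list = [("Afghanistan", "afghani"), ("Algeria", "dinar"), ("Andorra", "euro")]

# zip(*temp_list) groups all first items together and all second items together;
# map(list, ...) turns the two resulting tuples into lists.
countries_list, currency_list = map(list, zip(*temp_list))

print(countries_list)  # ['Afghanistan', 'Algeria', 'Andorra']
print(currency_list)   # ['afghani', 'dinar', 'euro']
```

Note that the unpacking raises ValueError when temp_list is empty, so it may be worth checking that the scrape returned rows before this line.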

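Did you print countries_list to check whether it contains more than one entry?

Yes, I printed it. It only prints the first country in the list.

I just checked your countries_list, and it only contains Afghanistan. The loop is not the issue; the problem is [tr.td.string for tr in countries]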
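
A minimal sketch of why that comprehension produces a single entry, assuming the page wraps the whole table in one div with class miscTxt (the sample markup below is a hypothetical simplification). find_all returns the matching divs, so the comprehension runs once per div rather than once per row, and .td only reaches the first <td> inside that div:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: one div.miscTxt wrapping the whole table.
SAMPLE_HTML = """
<div class="miscTxt">
  <table>
    <tr><td>Afghanistan</td><td>afghani</td></tr>
    <tr><td>Algeria</td><td>dinar</td></tr>
    <tr><td>Andorra</td><td>euro</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
divs = soup.find_all("div", attrs={"class": "miscTxt"})

# One div means one iteration, and div.td is only the first <td> in that div.
first_only = [div.td.string for div in divs]

# Iterating over the rows inside each div visits every country.
all_countries = [tr.td.string for div in divs for tr in div.find_all("tr")]

print(len(first_only), len(all_countries))
```

Iterating over the rows with find_all("tr") is what the full code above does through countries[0].find_all('tr').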
