Python: how to scrape the wiki tables of multiple companies
I am trying to scrape the Wikipedia tables for several companies, such as Samsung and Alibaba, but I can't get it to work. Here is my code:
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

csvFile = open('Information.csv', 'wt+')
writer = csv.writer(csvFile)
lst = ['Samsung', 'Facebook', 'Google', 'Tata_Consultancy_Services', 'Wipro', 'IBM', 'Alibaba_Group', 'Baidu', 'Yahoo!', 'Oracle_Corporation']
for a in lst:
    html = urlopen("https://en.wikipedia.org/wiki/a")
    bs = BeautifulSoup(html, 'html.parser')
    table = bs.findAll('table')
    for tr in table:
        rows = tr.findAll('tr')
        for row in rows:
            csvRow = []
            for cell in row.findAll(['td', 'th']):
                csvRow.append(cell.get_text())
            print(csvRow)
            writer.writerow(csvRow)
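You are passing a as part of the string literal itself, not as a reference to an item in the list, so every iteration requests the same page, /wiki/a. Here is the corrected code: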
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

csvFile = open('Information.csv', 'wt+')
writer = csv.writer(csvFile)
lst = ['Samsung', 'Facebook', 'Google', 'Tata_Consultancy_Services', 'Wipro', 'IBM', 'Alibaba_Group', 'Baidu', 'Yahoo!', 'Oracle_Corporation']
for a in lst:
    # Substitute the current company name into the URL
    html = urlopen("https://en.wikipedia.org/wiki/{}".format(a))
    bs = BeautifulSoup(html, 'html.parser')
    # Every <table> element on the page
    table = bs.findAll('table')
    for tr in table:
        rows = tr.findAll('tr')
        for row in rows:
            # One CSV row per table row, header and data cells alike
            csvRow = []
            for cell in row.findAll(['td', 'th']):
                csvRow.append(cell.get_text())
            print(csvRow)
            writer.writerow(csvRow)
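One side note that the answer does not raise: the script opens Information.csv and never closes it. A minimal sketch of one way to handle that is below. The helper name write_rows is made up for illustration, and stripping whitespace from each cell is my own assumption about what is wanted, not something the question asks for.

```python
import csv


def write_rows(path, rows):
    """Write rows to a CSV file and close it reliably (hypothetical helper)."""
    # The with block closes the file even if an exception is raised.
    # newline="" is what the csv module documentation recommends for files
    # handed to csv.writer, so no extra blank lines appear on Windows.
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        for row in rows:
            # Assumption: surrounding whitespace in a cell is not wanted.
            writer.writerow([cell.strip() for cell in row])
```

In the loop above you would collect each csvRow into a list and pass that list to write_rows once the scraping is finished.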
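html = urlopen("https://en.wikipedia.org/wiki/a")
is the problem.
You are looping through lst to get each company's URL, but the string literal passed to urlopen never uses the loop variable, so the company name is never inserted.
The fix is to replace html = urlopen("https://en.wikipedia.org/wiki/a") with any one of the following:
html = urlopen("https://en.wikipedia.org/wiki/" + a)
html = urlopen(f"https://en.wikipedia.org/wiki/{a}")  # requires Python 3.6+
html = urlopen("https://en.wikipedia.org/wiki/{}".format(a))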
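If you want to convince yourself without hitting the network, here is a small sketch that builds the URL each of the three ways and compares the results against the original literal. The names BASE, build_urls, literal_url and quoted_url are made up for illustration. The urllib.parse.quote step is an extra that none of the answers mention; it percent-encodes characters such as the ! in Yahoo!, which may or may not matter for your case.

```python
from urllib.parse import quote

BASE = "https://en.wikipedia.org/wiki/"  # same prefix as in the question


def build_urls(name):
    """Return the URL built three equivalent ways (hypothetical helper)."""
    by_concat = BASE + name
    by_fstring = f"{BASE}{name}"  # f-strings need Python 3.6+
    by_format = "https://en.wikipedia.org/wiki/{}".format(name)
    return by_concat, by_fstring, by_format


def literal_url(name):
    """The original bug: the argument is ignored, the URL is always /wiki/a."""
    return "https://en.wikipedia.org/wiki/a"


def quoted_url(name):
    """Optional extra: percent-encode the page name before appending it."""
    return BASE + quote(name)


if __name__ == "__main__":
    for company in ["Samsung", "Alibaba_Group", "Yahoo!"]:
        print(build_urls(company), literal_url(company), quoted_url(company))
```

All three forms in build_urls give the same string, so which one you pick is a matter of taste and Python version.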