
Python: How do I scrape the wiki tables of multiple companies?


I am trying to scrape the Wikipedia tables of several companies (Samsung, Alibaba, and so on), but I cannot get it to work. Here is my code:

import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

csvFile = open('Information.csv', 'wt+')
writer = csv.writer(csvFile)
lst=['Samsung','Facebook','Google','Tata_Consultancy_Services','Wipro','IBM','Alibaba_Group','Baidu','Yahoo!','Oracle_Corporation']
for a in lst:
    html = urlopen("https://en.wikipedia.org/wiki/a")
    bs = BeautifulSoup(html, 'html.parser')
    table = bs.findAll('table')
    for tr in table:
        rows = tr.findAll('tr')
        for row in rows:
            csvRow = [] 
            for cell in row.findAll(['td', 'th']):
                csvRow.append(cell.get_text())

            print(csvRow)
            writer.writerow(csvRow)

You are passing "a" as a literal string itself, rather than as a reference to the item in the list. Here is the corrected code:

import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

csvFile = open('Information.csv', 'wt+')
writer = csv.writer(csvFile)
lst=['Samsung','Facebook','Google','Tata_Consultancy_Services','Wipro','IBM','Alibaba_Group','Baidu','Yahoo!','Oracle_Corporation']
for a in lst:
    html = urlopen("https://en.wikipedia.org/wiki/{}".format(a))
    bs = BeautifulSoup(html, 'html.parser')
    table = bs.findAll('table')
    for tr in table:
        rows = tr.findAll('tr')
        for row in rows:
            csvRow = [] 
            for cell in row.findAll(['td', 'th']):
                csvRow.append(cell.get_text())

            print(csvRow)
            writer.writerow(csvRow)
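One further point the answer does not raise (an assumption on my part): the script never calls csvFile.close(), so buffered rows can be lost if the process exits early. A minimal sketch using a with block, with hypothetical rows standing in for the scraped cells:

```python
import csv

# Hypothetical rows standing in for the scraped table cells.
rows = [['Type', 'Public'], ['Industry', 'Electronics']]

# The `with` block closes (and flushes) the file automatically;
# newline='' avoids blank lines between rows on Windows, per the csv docs.
with open('Information.csv', 'w', newline='') as csvFile:
    writer = csv.writer(csvFile)
    for row in rows:
        writer.writerow(row)
```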
html = urlopen("https://en.wikipedia.org/wiki/a")
is the problem.

You are looping through lst to get each company's URL, but a string literal passed to urlopen will never substitute the loop variable, so none of the pages are fetched.

The fix is to replace
html = urlopen("https://en.wikipedia.org/wiki/a")
with any one of the following:

  • html = urlopen("https://en.wikipedia.org/wiki/" + a)
  • html = urlopen(f"https://en.wikipedia.org/wiki/{a}")  (requires Python 3.6+)
  • html = urlopen("https://en.wikipedia.org/wiki/{}".format(a))
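All three options build the same URL; a quick check (assuming Python 3.6+ for the f-string):

```python
a = 'Samsung'

url1 = "https://en.wikipedia.org/wiki/" + a          # string concatenation
url2 = f"https://en.wikipedia.org/wiki/{a}"          # f-string (Python 3.6+)
url3 = "https://en.wikipedia.org/wiki/{}".format(a)  # str.format

assert url1 == url2 == url3 == "https://en.wikipedia.org/wiki/Samsung"
```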

What is your specific question? Just a small tip: perhaps you should use the Wikipedia API instead of scraping their website directly.
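For reference, a sketch of how such a request could be built with the MediaWiki action API (action=parse is a standard API module; actually fetching the result still needs a network call, which is omitted here):

```python
from urllib.parse import urlencode

def build_parse_url(page):
    """Build a MediaWiki action-API URL that returns the parsed page as JSON."""
    params = {'action': 'parse', 'page': page, 'format': 'json'}
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)

print(build_parse_url('Samsung'))
# https://en.wikipedia.org/w/api.php?action=parse&page=Samsung&format=json
```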