
Python BeautifulSoup can't extract a table from wiki


When I inspect the table, I get


However, I get an empty list. Any ideas?

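The table class "wikitable sortable jquery tablesorter" does not appear while you browse the site until you sort a column. The extra class is added client-side in the browser, so it is not in the HTML that requests downloads. By using the table class "wikitable sortable" instead, I was able to get exactly one table: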

import requests
from bs4 import BeautifulSoup

res = requests.get("https://en.wikipedia.org/wiki/Comparison_of_Intel_processors")
soup = BeautifulSoup(res.content, "html.parser")
# Match the class string exactly as it appears in the downloaded HTML
tables = soup.find_all("table", class_="wikitable sortable")
print(len(tables))
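The empty list can be reproduced offline. This is a minimal sketch: SERVED_HTML is a hypothetical, trimmed stand-in for the page that requests downloads (no network call is made), and it assumes the served table carries only the classes "wikitable sortable".

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed stand-in for the HTML that requests downloads:
# the table carries only the server-side classes.
SERVED_HTML = """
<table class="wikitable sortable">
  <tr><th>Processor</th><th>Clock Rate</th></tr>
  <tr><td>4004</td><td>740 kHz</td></tr>
</table>
"""

soup = BeautifulSoup(SERVED_HTML, "html.parser")

# The class string seen in the browser's inspector is not in the served HTML,
# so an exact match on it finds nothing.
from_inspector = soup.find_all("table", class_="wikitable sortable jquery tablesorter")

# Matching the class string as served finds the table.
as_served = soup.find_all("table", class_="wikitable sortable")

print(len(from_inspector), len(as_served))  # 0 1
```

When class_ is given a string containing spaces, BeautifulSoup compares it against the whole class attribute value, so every class in the string has to be present in the served markup, in that order.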
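Notes:

  • In your example, I used class_= instead of a dictionary.
  • I specified a parser, html.parser, in the BeautifulSoup constructor so that the code behaves the same across environments, as the printed warning suggests.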
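Both notes can be checked on a small sample. A minimal sketch, assuming a hypothetical two-table snippet rather than the live page; the CSS-selector line is an alternative not used in the answer, included for comparison.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (not fetched from Wikipedia).
HTML = """
<table class="wikitable sortable"><tr><td>a</td></tr></table>
<table class="wikitable"><tr><td>b</td></tr></table>
"""

# Naming the parser explicitly avoids the "No parser was explicitly specified"
# warning and keeps the behaviour the same across environments.
soup = BeautifulSoup(HTML, "html.parser")

# The class_ keyword and an attrs dictionary express the same filter.
by_keyword = soup.find_all("table", class_="wikitable sortable")
by_dict = soup.find_all("table", attrs={"class": "wikitable sortable"})

# A CSS selector matches individual classes, so it does not depend on
# class order and tolerates extra classes on the element.
by_css = soup.select("table.wikitable.sortable")

# A single class name matches any table that has that class among others.
any_wikitable = soup.find_all("table", class_="wikitable")

print(len(by_keyword), len(by_dict), len(by_css), len(any_wikitable))  # 1 1 1 2
```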

Try the following. It fetches the tabular data from that page:

import requests
from bs4 import BeautifulSoup

res = requests.get("https://en.wikipedia.org/wiki/Comparison_of_Intel_processors")
soup = BeautifulSoup(res.text, 'lxml')  # if you run into problems with "lxml", try "html.parser" instead
# find() returns the first table that has the "wikitable" class
table = soup.find("table", class_="wikitable")
for items in table.find_all("tr")[:-1]:  # every row except the last one
    # Collapse runs of whitespace inside each header/data cell into single spaces
    data = [' '.join(item.text.split()) for item in items.find_all(['th','td'])]
    print(data)
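The row-extraction loop can be tried without a network request. A sketch under assumptions: the HTML and the helper name table_rows are hypothetical, and the final repeated header row merely stands in for whatever row the answer's [:-1] skips on the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample table; cell text contains line breaks and extra spaces,
# as scraped cells often do.
HTML = """
<table class="wikitable">
  <tr><th>Processor</th><th>Clock
      Rate</th></tr>
  <tr><td>8008</td><td>200 kHz -   800 kHz</td></tr>
  <tr><td>8080</td><td>2 MHz - 3.125 MHz</td></tr>
  <tr><th>Processor</th><th>Clock Rate</th></tr>
</table>
"""

def table_rows(table):
    """Return each row as a list of whitespace-normalised cell strings."""
    rows = []
    # [:-1] drops the last row, mirroring the answer's loop.
    for tr in table.find_all("tr")[:-1]:
        # text.split() breaks on any whitespace; " ".join rejoins with single spaces.
        rows.append([" ".join(cell.text.split()) for cell in tr.find_all(["th", "td"])])
    return rows

soup = BeautifulSoup(HTML, "html.parser")
rows = table_rows(soup.find("table", class_="wikitable"))
for row in rows:
    print(row)
# ['Processor', 'Clock Rate']
# ['8008', '200 kHz - 800 kHz']
# ['8080', '2 MHz - 3.125 MHz']
```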
Partial output:

['Processor', 'Series Nomenclature', 'Code Name', 'Production Date', 'Supported Features (Instruction Set)', 'Clock Rate', 'Socket', 'Fabrication', 'TDP', 'Number of Cores', 'Bus Speed', 'L1 Cache', 'L2 Cache', 'L3 Cache', 'Overclock Capable']
['4004', '', '', 'Nov. 15,1971', '', '740 kHz', 'DIP', '10-micron', '', '1 740 kHz', 'N/A', 'N/A', 'N/A']
['8008', 'N/A', 'N/A', 'April 1972', 'N/A', '200 kHz - 800 kHz', 'DIP', '10-micron', '', '1', '200 kHz', 'N/A', 'N/A', 'N/A', '']
['8080', 'N/A', 'N/A', 'April 1974', 'N/A', '2 MHz - 3.125 MHz', 'DIP', '6-micron', '', '1', '2 MHz', 'N/A', 'N/A', 'N/A', '']
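The partial output also shows that a row can be shorter than the header: the 4004 row has 13 values against 15 column names (the cause is not shown here; merged cells are one possibility). A hedged sketch of pairing names with values so that missing columns show up as None instead of being silently dropped; row_to_record is a hypothetical helper, and the two lists are copied from the output above.

```python
from itertools import zip_longest

# Header and first data row copied from the partial output above.
header = ['Processor', 'Series Nomenclature', 'Code Name', 'Production Date',
          'Supported Features (Instruction Set)', 'Clock Rate', 'Socket',
          'Fabrication', 'TDP', 'Number of Cores', 'Bus Speed', 'L1 Cache',
          'L2 Cache', 'L3 Cache', 'Overclock Capable']
row_4004 = ['4004', '', '', 'Nov. 15,1971', '', '740 kHz', 'DIP', '10-micron',
            '', '1 740 kHz', 'N/A', 'N/A', 'N/A']

def row_to_record(header, row):
    """Map header names to cell values; columns the row lacks become None.

    Plain zip() would silently drop the unmatched columns instead.
    """
    return dict(zip_longest(header, row))

record = row_to_record(header, row_4004)
print(len(header), len(row_4004))                        # 15 13
print(record['L3 Cache'], record['Overclock Capable'])   # None None
```

Because values are paired by position, a shortfall only tells you the row does not line up with the header; which column is really missing still has to be checked against the page.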

Thanks, this helped solve a problem I was having. Do you have an updated Stanford link? The one currently posted returns a 404.