Python: How to use BeautifulSoup to get the content under a specific column of a table from Wikipedia


python html excel parsing


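I need to get the href links that the content under a specific column of a Wikipedia table points to. The page is "". The page contains several tables with the class "wikitable". For each row, I need the link that the content under the Title column points to. I would like to copy these links into an Excel sheet.
I don't know the exact code for searching under a particular column, and so far I get "'NoneType' object is not callable". I'm using bs4. I wanted to extract at least part of the table so that I could narrow it down to the href links under the Title column, but I ended up with this error. The code is as follows: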

from urllib.request import urlopen
from bs4 import BeautifulSoup

soup = BeautifulSoup(urlopen('http://en.wikipedia.org/wiki/List_of_Telugu_films_of_2015').read())
for row in soup('table', {'class': 'wikitable'})[1].tbody('tr'):
    tds = row('td')
    print(tds[0].string, tds[0].string)


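I would appreciate your guidance. Does anyone know how to do this?
It turns out the NoneType error may be related to how the table is filtered. The corrected code is as follows: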


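from urllib.request import urlopen  # Python 3 equivalent of urllib2.urlopen
from bs4 import BeautifulSoup, SoupStrainer

content = urlopen("http://en.wikipedia.org/wiki/List_of_Telugu_films_of_2015").read()
# Parse only the <table class="wikitable"> elements instead of the whole page
filter_tag = SoupStrainer("table", {"class": "wikitable"})
soup = BeautifulSoup(content, "html.parser", parse_only=filter_tag)
links = []
# find_all(align="center") matches every tag that carries align="center"
for sp in soup.find_all(align="center"):
    a_tag = sp('a')  # all <a> tags inside this element
    if a_tag:
        links.append(a_tag[0].get('href'))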
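
The question also asks for the links under one named column and for output that Excel can open. A minimal sketch of that idea follows, assuming the first row of each table holds the header cells and that no cell uses rowspan or colspan (real Wikipedia tables often do, which would shift the column index). The sample HTML, the helper names `links_under_column` and `to_csv`, and the CSV layout are all hypothetical and stand in for the real page.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for one "wikitable"; the real page markup may differ.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Opening</th><th>Title</th><th>Director</th></tr>
  <tr><td>2 Jan</td><td><i><a href="/wiki/Film_A">Film A</a></i></td><td>Someone</td></tr>
  <tr><td>9 Jan</td><td>Film B</td><td><a href="/wiki/Person_B">Person B</a></td></tr>
  <tr><td>16 Jan</td><td><a href="/wiki/Film_C">Film C</a></td><td>Other</td></tr>
</table>
"""


def links_under_column(html, header_text):
    """Return (link text, href) pairs found under the column named header_text."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for table in soup.find_all("table", class_="wikitable"):
        # Searching the table directly for <tr> avoids relying on table.tbody,
        # which is None when the parsed markup contains no <tbody> tag.
        rows = table.find_all("tr")
        if not rows:
            continue
        headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
        if header_text not in headers:
            continue
        col = headers.index(header_text)
        for row in rows[1:]:
            cells = row.find_all(["td", "th"])
            if col >= len(cells):
                continue  # short row, e.g. a spacer or merged cells
            a_tag = cells[col].find("a", href=True)
            if a_tag is not None:
                results.append((a_tag.get_text(strip=True), a_tag["href"]))
    return results


def to_csv(rows):
    """Serialize the pairs as CSV text, which Excel can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["title", "href"])
    writer.writerows(rows)
    return buffer.getvalue()


title_links = links_under_column(SAMPLE_HTML, "Title")
print(title_links)
print(to_csv(title_links))
```

To run this against the real page, the downloaded `content` could be passed in place of `SAMPLE_HTML`, and the CSV text written to a file. Rows whose Title cell has no link are skipped, and links in other columns are ignored.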