Python BeautifulSoup: indexing 'td' tags by [0] and [2] works, but [1] does not

python python-2.7 web-scraping

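I'm working on a project with BeautifulSoup in which I index table columns. I'm trying to get at the second column. Indexing with 0 (the first column) and 2 (the third column) works, but 1 results in an IndexError.

The code is as follows: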

from bs4 import BeautifulSoup
import requests
import sys
r  = requests.get("http://evamsharma.finosus.com/beatles/index.html")

data = r.text

soup = BeautifulSoup(data)
counter = 0  
for table in soup.find_all('table'):
    for row in soup.find_all('tr'):
        '''
        try:
            td = row.find_all('td')[0]
        except IndexError:
            continue
        for link in td.find_all(["a","p"]):
            title = str(link.contents)
            title = list(title)
            i = 0
            while i <= 2:
                del title[0]
                i += 1
            i = 0
            while i <= 1:
                del title[-1]
                i += 1
            title = ''.join(title)
            print(title)
        '''
        try:
            tdyear = row.find_all('td')[1] #This is the faulty index
        except IndexError:
            print("whoops darn!")
            continue
        for link in tdyear.find_all(["a","p"]):
            year = str(link.contents)
            print(year)
            year = list(year)
            year = ''.join(year)
            print(year)

I'm not sure whether this is exactly what you need, but here is code that gets the year:
from bs4 import BeautifulSoup
import requests
import sys

r  = requests.get("http://evamsharma.finosus.com/beatles/index.html")

data = r.text

soup = BeautifulSoup(data)
counter = 0
for table in soup.find_all('table'):
    for row in soup.find_all('tr'):
        # try:
        #     td = row.find_all('td')[0]
        # except IndexError:
        #     continue
        # for link in td.find_all(["a","p"]):
        #     title = str(link.contents)
        #     title = list(title)
        #     i = 0
        #     while i <= 2:
        #         del title[0]
        #         i += 1
        #     i = 0
        #     while i <= 1:
        #         del title[-1]
        #         i += 1
        #     title = ''.join(title)
        #     print(title)

        try:
            tdyear = row.find_all('td')[1] #This is the faulty index            
        except IndexError:
            print("whoops darn!")
            continue
        year = ''.join(tdyear.contents)
        print(year)
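
As a hedged variation on the code above, the sketch below scopes the row search to each table (table.find_all('tr') rather than soup.find_all('tr')) and reads the cell text with get_text(), which also copes with cells that contain nested tags. The HTML snippet, its column layout, and the helper name second_column_text are assumptions made for illustration; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; the actual markup may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Year</th><th>Label</th></tr>
  <tr><td><a href="#">Please Please Me</a></td><td>1963</td><td><p>Parlophone</p></td></tr>
  <tr><td><a href="#">Abbey Road</a></td><td>1969</td><td><p>Apple</p></td></tr>
</table>
"""


def second_column_text(html):
    """Return the text of the second 'td' in every row that has one."""
    soup = BeautifulSoup(html, "html.parser")
    years = []
    for table in soup.find_all("table"):
        # Search within this table only, so rows are not repeated per table.
        for row in table.find_all("tr"):
            cells = row.find_all("td")
            if len(cells) < 2:
                # Rows built only from 'th' cells have no 'td' to index.
                continue
            years.append(cells[1].get_text(strip=True))
    return years


print(second_column_text(SAMPLE_HTML))
```

Checking the number of cells before indexing is one way to avoid the try/except around the index; whether that reads better than catching IndexError is a matter of taste.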

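There are many 'tr' elements under the 'table'. Your code collects all of the 'tr' elements, iterates over them, and takes the second 'td' from each one. Note that the first 'tr' (the header row) has no 'td' cells; it has some 'th' cells instead, so for the first row find_all('td') returns an empty list. That is why accessing it at index 1 raises an IndexError.

You can skip the header row by changing the inner for loop to:

    for row in soup.find_all('tr')[1:]:

Edit:
The 'td' tags for the year contain no 'a' or 'p' tags, so tdyear.find_all(["a","p"]) should return an empty list. You can try printing tdyear.text instead.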
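
The points in this answer can be checked on a small table. The following is only a sketch: the HTML is a hypothetical table shaped like the one described here, not the real page, and the variable names header_cells, header_failed, and nested are made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical table with a 'th' header row; not the real page.
html = """
<table>
  <tr><th>Title</th><th>Year</th></tr>
  <tr><td><a href="#">Help!</a></td><td>1965</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr")

# The header row contains only 'th' cells, so there is no 'td' to index.
header_cells = rows[0].find_all("td")
print(len(header_cells))

try:
    header_cells[1]
    header_failed = False
except IndexError:
    header_failed = True
print(header_failed)

# Slicing off the first row skips the header entirely.
for row in rows[1:]:
    tdyear = row.find_all("td")[1]
    # The year cell holds plain text, so searching it for 'a' or 'p' finds nothing.
    nested = tdyear.find_all(["a", "p"])
    year = tdyear.text
    print(len(nested), year)
```

If the real page wraps its years in other tags, tdyear.text should still return the combined text of the cell.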
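That worked, in that it stopped the IndexError. But now nothing is printed. I also tried replacing the code that gets the year with @ton1c's simpler version, and that printed nothing either.

When I run this, nothing is printed. I don't know why…

Hopefully it works. Which versions of BeautifulSoup and Python are you using?

BeautifulSoup 4 and Python 2.7.

I get output on both Linux and Windows. Please check that you copied the code correctly, or post the full versions of BeautifulSoup and Python.

OK, it works… now I'm confused. I had made a change based on the other answer. Why would that break the script? Also, thanks a lot!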