Convert a table column scraped with BeautifulSoup into a list


string python-3.x


Scraping a column from Wikipedia with BeautifulSoup returns only the last row, whereas I want all of the column's values collected in a list:

from urllib.request import urlopen
from bs4 import BeautifulSoup

site = "https://en.wikipedia.org/wiki/Agriculture_in_India"
html = urlopen(site)
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {'class': 'wikitable sortable'})

for row in table.find_all("tr")[1:]:
    col = row.find_all("td")
    if len(col) > 0:
        com = str(col[1].string.strip("\n"))

        list(com)
com

Out: 'ZTS'

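So it only shows the string from the last row. I expected a list in which each row's value is a separate string element, so that I can assign the list to a new variable: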

"Rice", "Buffalo milk", "Cow milk", "Wheat"

Can anyone help me?

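Your approach will not work because you never "add" anything to com. Each loop iteration reassigns com, so only the value from the last row survives, and the result of list(com) is discarded.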
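The difference between reassigning and appending can be seen in a minimal sketch that needs no scraping. The cell values below are hypothetical stand-ins for the table column, and the variable names are illustrative only.

```python
# Hypothetical cell values standing in for the scraped column
cells = ["Rice", "Buffalo milk", "Cow milk", "Wheat"]

# Reassigning: every pass replaces the previous value
last_only = None
for value in cells:
    last_only = value

# Appending: every pass adds one element to the list
collected = []
for value in cells:
    collected.append(value)

# list() on a string returns a new list of its characters;
# it does not change the string, and the result is lost unless stored
chars = list("ZTS")

print(last_only)
print(collected)
print(chars)
```

After the loops, last_only holds only the final value, while collected holds all four.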

One way to achieve what you want:

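from urllib.request import urlopen
from bs4 import BeautifulSoup

site = "https://en.wikipedia.org/wiki/Agriculture_in_India"
html = urlopen(site)
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {'class': 'wikitable sortable'})

com = []  # collects one value per table row
for row in table.find_all("tr")[1:]:  # skip the header row
    col = row.find_all("td")
    if len(col) > 1:  # make sure the second cell exists
        temp = col[1].contents[0]  # first child of the second cell
        try:
            # The child is itself a tag: take its first child instead
            to_append = temp.contents[0]
        except (AttributeError, IndexError):
            # The child is already plain text (or an empty tag): keep it
            to_append = temp
        com.append(to_append)

print(com)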

This will give you what you need.

Explanation

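col[1].contents[0] gives the first child of the tag.

.contents provides the list of a tag's children. Here the cell has a single child, so index 0 selects it.

In some cases, the content inside the td tag is itself another tag (such as a link) rather than plain text, so a second .contents[0] is needed to reach the text.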
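The behaviour of .contents can be checked on a small inline table instead of the live page. This is a sketch under assumptions: the HTML below is a hypothetical stand-in for two rows of the Wikipedia table, and get_text(strip=True) is shown as an alternative that was not part of the answer above.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical stand-in for two rows of the Wikipedia table
sample_html = """
<table>
  <tr><td>1</td><td>Buffalo milk</td></tr>
  <tr><td>2</td><td><a href="/wiki/Rice">Rice</a></td></tr>
</table>
"""
soup = BeautifulSoup(sample_html, "html.parser")
plain_cell, linked_cell = (row.find_all("td")[1] for row in soup.find_all("tr"))

# .contents is the list of direct children of a tag
print(plain_cell.contents)   # a single text node
print(linked_cell.contents)  # a single <a> tag

# A text node is already the value; a nested tag needs one more step
first = linked_cell.contents[0]
inner = first.contents[0] if isinstance(first, Tag) else first

# get_text() flattens either kind of cell to its text in one call
values = [cell.get_text(strip=True) for cell in (plain_cell, linked_cell)]
print(values)
```

Using get_text() avoids the try/except entirely, at the cost of joining all nested text in a cell into one string, which may or may not be what you want for cells with several children.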

Thanks for the insight. This is exactly what I was looking for.