Python: Reading only the outer table's rows with BeautifulSoup


The rows of the inner table are being read as well:

table_grid_1 = soup.find("table", {"id": "GridView1"})
rows = table_grid_1.find("tbody").find_all("tr")

i.e. `print "length " + str(len(rows))` prints 5, but I want to read only the tr elements of the outer table, so the size should be 3.

How can I read only the rows of the outer table?

You can try:

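[x.string for x in soup.select('table > tbody > tr > td') if x not in soup.select('table > tbody > tr > table > tbody > tr > td')]

Result: only the contents of the outer table's td cells. Note: if an outer td and an inner td are equal, it returns an empty list.

You can also achieve this with recursive=False, which makes find_all look only at direct children. With it, the call returns 3.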

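This solution requires considerable overhead, because it performs a set subtraction, which requires fully building both the whole set and the set being subtracted first. The recursive=False argument, by contrast, only considers direct children, so its overhead is lower.

It returns 3 because of the 2 td cells of the outer table plus the 1 td that holds the inner table with all of its tr data.

The OP asked: "`str(len(rows))` prints 5, but I want to read only the tr of the outer table, so the size should be 3." My code does exactly that. It returns 3 because the outer table has 3 elements, each stored as a row, and they can be indexed from rows[0] to rows[2].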
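If you prefer a CSS selector, one way to sidestep the subtraction is to anchor a chain of child combinators at the outer table's id. This is a sketch on made-up markup, not the code from either answer; only the id GridView1 comes from the question. The inner cell "A" deliberately repeats the text of an outer cell.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an outer table whose third row holds an inner table.
HTML = (
    '<table id="GridView1"><tbody>'
    "<tr><td>A</td></tr>"
    "<tr><td>B</td></tr>"
    "<tr><td><table><tbody>"
    "<tr><td>A</td></tr>"
    "<tr><td>C</td></tr>"
    "</tbody></table></td></tr>"
    "</tbody></table>"
)

soup = BeautifulSoup(HTML, "html.parser")

# ">" matches direct children only; "#GridView1" pins the chain to the outer table,
# so rows and cells of the inner table cannot match.
outer_rows = soup.select("#GridView1 > tbody > tr")
outer_cells = soup.select("#GridView1 > tbody > tr > td")

print(len(outer_rows))
print([str(td.string) for td in outer_cells[:2]])
```

Since no cells are compared with each other here, repeated cell contents do not matter.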
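from bs4 import BeautifulSoup

# html is assumed to hold the page source, with <tbody> present in the markup
soup = BeautifulSoup(html, "html.parser")
table_grid_1 = soup.find("table", {"id": "GridView1"})
# recursive=False restricts the search to direct children of <tbody>
rows = table_grid_1.find("tbody").find_all("tr", recursive=False)
print(len(rows))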
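
To see the difference on a concrete document, here is a minimal self-contained sketch. The markup is made up for illustration (only the id GridView1 comes from the question), and it assumes the built-in html.parser, which does not insert <tbody> on its own, so the sample spells it out.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: 3 outer rows, the last one wrapping a 2-row inner table.
HTML = """
<table id="GridView1">
  <tbody>
    <tr><td>outer 1</td></tr>
    <tr><td>outer 2</td></tr>
    <tr>
      <td>
        <table>
          <tbody>
            <tr><td>inner 1</td></tr>
            <tr><td>inner 2</td></tr>
          </tbody>
        </table>
      </td>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
outer_body = soup.find("table", {"id": "GridView1"}).find("tbody")

# Default search walks all descendants, so the inner table's rows are included.
all_rows = outer_body.find_all("tr")

# recursive=False looks only at the direct children of the outer <tbody>.
outer_rows = outer_body.find_all("tr", recursive=False)

print(len(all_rows), len(outer_rows))
```

On this sample the two counts differ by exactly the number of inner rows, which matches the 5 versus 3 described in the question.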