Python 3.x: How do I parse a table with BeautifulSoup4 and print it elegantly?

Tags: python-3.x, beautifulsoup, python-requests

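Gather the text data into a list of rows, each holding the text of its cells. Transpose it, so that everything belonging to one column ends up in one row. Build a list with the length of the longest item in each (original) column. Use these widths to pad each cell while printing the rows. The code from the question, which this answer starts from: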

import requests
from bs4 import BeautifulSoup

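# 'Page.html' is the asker's placeholder; requests.get() needs a full URL including the scheme
req = requests.get('Page.html')
soup = BeautifulSoup(req.content, 'html.parser')
tables = soup.find_all('table')
table = tables[0]
# Dumps the text of all cells without any column alignment
print(table.text)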
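
The four steps described above (gather, transpose, measure, pad) can be sketched on plain lists, without any HTML parsing. This is only an illustration: `rows`, `column_widths` and `format_rows` are names made up for this sketch, not part of the answer's code.

```python
# Plain-list sketch of the steps above; `rows` is made-up sample data.
rows = [
    ["Store #", "City Name", "Orders"],
    ["1", "Phoenix", "70"],
    ["2", "Columbus", "74"],
]

def column_widths(rows):
    """Return the length of the longest cell in each column."""
    # zip(*rows) transposes: each tuple holds the cells of one column.
    # Note that zip() stops at the shortest row.
    return [max(len(cell) for cell in column) for column in zip(*rows)]

def format_rows(rows, sep=" "):
    """Pad every cell to its column width and join the cells with `sep`."""
    widths = column_widths(rows)
    return [sep.join(cell.ljust(width) for cell, width in zip(row, widths))
            for row in rows]

for line in format_rows(rows):
    print(line)
```

Padding with `str.ljust` is equivalent to the `"{:Ns}"` format spec for strings; either works here.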
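Printed this way, the table looks about as elegant as you can manage on a console. (To add vertical lines, just join the cells of each row with a | instead of a space.)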
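
A minimal sketch of that change, assuming the cells are already plain strings; `join_cells` and the sample `rows` are hypothetical names used only here. The only difference from the space-separated version is the separator passed to `join`.

```python
# Sample data made up for this sketch.
rows = [["1", "Phoenix", "70"], ["3", "New York", "112"]]
# Longest cell per column, found by transposing with zip(*rows).
widths = [max(len(cell) for cell in column) for column in zip(*rows)]

def join_cells(row, widths, sep=" | "):
    """Pad each cell to its column width, then join with a vertical bar."""
    return sep.join(cell.ljust(width) for cell, width in zip(row, widths))

lines = [join_cells(row, widths) for row in rows]
print("\n".join(lines))
```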

I inlined the table data because I don't have access to your Page.html, but getting at the table data does not seem to be the problem here.


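Oh, let's add lines all around, just because I can; that bordered variant of print_table_nice is the last listing below. First, the plain version with the table data inlined: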

from bs4 import BeautifulSoup

content = '''
<table class="gridtable">
<tbody>
<tr>
<th>Store #</th><th>City Name</th><th>Orders</th></tr>
<tr><td>1</td><td style="text-align:left">Phoenix</td><td>70</td></tr>
<tr><td>2</td><td style="text-align:left">Columbus</td><td>74</td></tr>
<tr><td>3</td><td style="text-align:left">New York</td><td>112</td></tr>
<tr><td></td><td>TOTAL</td><td>256</td></tr></tbody>
</table>
'''

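def print_table_nice(table):
    # One list of cell texts per <tr>; header (th) and data (td) cells are treated alike.
    cells = [[cell.text for cell in row.find_all(['td','th'])] for row in table.find_all('tr')]
    # Transpose so each inner list holds one column (zip stops at the shortest row).
    transposed = list(map(list, zip(*cells)))
    # Longest item per column, kept as a string so it can be spliced into the format spec.
    widths = [str(max([len(str(item)) for item in items])) for items in transposed]
    for row in cells:
        # "{:Ns}" left-aligns the text and pads it with spaces to N characters.
        print(' '.join(("{:"+width+"s}").format(item) for width,item in zip(widths,row)))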

soup = BeautifulSoup(content, 'html.parser')
tables = soup.find_all('table')
table = tables[0]
print_table_nice(table)
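The bordered variant (the last listing below) is an interesting complication, because the th row has to be separated from the td rows. It won't work as-is for multi-line rows, though. Its output: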

+--------+-----------+-------+
|Store # | City Name | Orders|
+--------+-----------+-------+
|1       | Phoenix   | 70    |
|2       | Columbus  | 74    |
|3       | New York  | 112   |
|        | TOTAL     | 256   |
+--------+-----------+-------+
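
The answer leaves multi-line rows unsolved. One possible way to handle cells whose text contains line breaks is sketched below; it is an assumption of mine rather than part of the answer, and `format_multiline` and `sample` are made-up names. Each cell is split into lines, the column width is taken from the longest single line, and shorter cells are padded with empty lines.

```python
from itertools import zip_longest

def format_multiline(rows):
    """Format rows whose cells may contain newlines; returns the printable lines."""
    # Split every cell into its lines; an empty cell still yields one empty line.
    split_rows = [[cell.split("\n") for cell in row] for row in rows]
    # Column width = longest single line found anywhere in that column.
    widths = [max(len(line) for cell in column for line in cell)
              for column in zip(*split_rows)]
    out = []
    for row in split_rows:
        # zip_longest pads shorter cells with "" so all cells span the same number of lines.
        for parts in zip_longest(*row, fillvalue=""):
            out.append("|" + " | ".join(part.ljust(width)
                                        for part, width in zip(parts, widths)) + "|")
    return out

# Made-up sample: the city name in the first row spans two lines.
sample = [["3", "New\nYork", "112"], ["", "TOTAL", "256"]]
print("\n".join(format_multiline(sample)))
```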
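@JohnS You can use [link]. Just modify the print_table_nice function to return a nested list and insert each item.
@JohnS Yes, here: you will find an example of an SQL query with a nested list, and you can find the documentation.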
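
What the comment suggests might look roughly like this. The library it linked to is not recoverable from the page, so the standard-library `sqlite3` module is used here purely as an assumption; `table_rows` stands in for the nested list a modified print_table_nice would return, and the table and column names are invented for the sketch.

```python
import sqlite3

# Hypothetical nested list, as a modified print_table_nice might return it:
# the first row is the header, the rest are data rows.
table_rows = [
    ["Store #", "City Name", "Orders"],
    ["1", "Phoenix", "70"],
    ["2", "Columbus", "74"],
    ["3", "New York", "112"],
]

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("CREATE TABLE store_orders (store TEXT, city TEXT, orders INTEGER)")
# executemany runs the INSERT once per inner list; the "?" placeholders
# keep the values out of the SQL string itself.
conn.executemany("INSERT INTO store_orders VALUES (?, ?, ?)", table_rows[1:])
conn.commit()

total = conn.execute("SELECT SUM(orders) FROM store_orders").fetchone()[0]
print(total)
```

Skipping the header row with `table_rows[1:]` keeps the column titles out of the data; the TOTAL row from the page would normally be left out as well, since the database can compute it.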
The bordered variant of print_table_nice, with the header kept apart from the data rows:

def print_table_nice(table):
    # Header cells and data rows are collected separately so a rule can be drawn between them.
    header = [cell.text for cell in table.select('tr th')]
    cells = [[cell.text for cell in row.select('td')] for row in table.select('tr') if row.select('td')]
    # Transpose header + data so each inner list holds one column, then take the longest item.
    transposed = list(map(list, zip(*([header] + cells))))
    widths = [max(len(item) for item in items) for items in transposed]
    # The rule is as wide as the padded cells plus the ' | ' separators.
    border = '+' + '-+-'.join('-' * width for width in widths) + '+'

    def format_row(row):
        # Left-align each cell and pad it to its column width.
        return '|' + ' | '.join(item.ljust(width) for width, item in zip(widths, row)) + '|'

    print(border)
    print(format_row(header))
    print(border)
    for row in cells:
        print(format_row(row))
    print(border)