Python: Formatting a table correctly after scraping it with BeautifulSoup

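I am new to Python.

I have been trying to scrape a table from a web page. The target table is titled "Utilization by Body System".

I was able to capture the table with BeautifulSoup; however, the formatting of the scraped data is driving me crazy, and I cannot find a way to fix it.

My code: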

import re
import urllib.request

import bs4

source = urllib.request.urlopen('http://www.phc4.org/reports/utilization/inpatient/CountyReport20192C001.htm').read()
soup = bs4.BeautifulSoup(source, 'lxml')
# Find the county utilization table by MDC.
# Locate the text node holding the table title, then climb to its parent <table>.
table_mdc = soup.find(string=re.compile("Utilization by Body System")).find_parent('table')
# print(table_mdc)
# Construct the table: print every cell's text
for row in table_mdc.find_all('tr'):
    for cell in row.find_all('td'):
        print(cell.text)
# Write every cell's text to a file
with open('utilization.txt', 'w') as r:
    for row in table_mdc.find_all('tr'):
        for cell in row.find_all('td'):
            r.write(cell.text)

For example, the scraped data prints as:

Utilization by Body System 
MDC Description
Total Cases
Number
Percent
Total Charges
% of Charges
Avg. Charge
Total Days
% of Total Days
Avg. LOS

Total

 
2,594
 
 
100.0%
 
 
$101,757,824
 
 
100.0%
 
 
$39,228
 
 
11,972
 
 
100.0%
 
 
4.6

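Both the printed output and the txt file contain many extra newlines. The ideal txt file should look like this:

(without "Total Cases" in the header)

How can I overcome these problems?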

import pandas as pd

# read_html returns a list of DataFrames; attrs selects the table by its id
df = pd.read_html(
    "http://www.phc4.org/reports/utilization/inpatient/CountyReport20192C001.htm",
    attrs={"id": "dgBodySystem"}, header=0)[0]
print(df)
df.to_csv("data.csv", index=False)

Output:


FYI to all web scrapers: the past tense of "scrape" is "scraped", not "scrapped".
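
For readers who would rather stay with BeautifulSoup than switch to pandas, the extra newlines in the question appear to come from whitespace stored inside each cell, which `cell.text` returns untouched. Below is a minimal sketch of one way to clean it up. It assumes a small inline `SAMPLE_HTML` table that only imitates the padded cells shown in the question (it is not the real page), and `table_to_rows` is a hypothetical helper name. Each cell is stripped with `get_text(strip=True)` and the cells of one row are written as one CSV line.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: cells padded with newlines and
# non-breaking spaces, similar to the output shown in the question.
SAMPLE_HTML = """
<table id="dgBodySystem">
  <tr><td>MDC Description</td><td>Number</td><td>Percent</td></tr>
  <tr><td>
Total
</td><td>&nbsp;
2,594
&nbsp;</td><td>&nbsp;
100.0%
&nbsp;</td></tr>
</table>
"""


def table_to_rows(table):
    """Return one list of cleaned cell strings per <tr>, skipping empty rows."""
    rows = []
    for tr in table.find_all("tr"):
        # strip=True trims surrounding whitespace (including non-breaking
        # spaces) from every text fragment and drops fragments that are empty.
        cells = [cell.get_text(" ", strip=True) for cell in tr.find_all(["th", "td"])]
        if any(cells):
            rows.append(cells)
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = table_to_rows(soup.find("table", id="dgBodySystem"))

# Write one line per table row; csv.writer quotes values such as "2,594".
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

To write to disk instead, the `io.StringIO()` buffer can be swapped for `open('utilization.txt', 'w', newline='')`. Against the real page, `table_mdc` from the question's code could be passed to `table_to_rows` directly, though the header rows there may need separate handling.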