Is there a simple way in Python to go from a <pre> tag to a DataFrame?

python html dataframe web-scraping

I'm trying to load the contents of a pre tag into a pandas DataFrame, but I can't get it to work. This is what I have so far:

import requests
import pandas
from bs4 import BeautifulSoup

# url
url = 'http://weather.uwyo.edu/cgi-bin/sounding?region=samer&TYPE=TEXT%3ALIST&YEAR=2019&MONTH=09&FROM=2712&TO=2712&STNM=80222'
peticion = requests.get(url)
soup = BeautifulSoup(peticion.content, "html.parser")

# get only the pre content I want
pre = soup.select("pre")[0]

# write the content in a text file
with open('sound', 'w') as f:
    f.write(pre.text)

# read it back (read_csv splits on commas by default, and these lines contain none)
df = pandas.read_csv('sound')
df

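What I get is an unstructured DataFrame. Since I have to do this for several URLs, I would rather pass the data on directly after the line that selects the pre element, without writing a file.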
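A minimal sketch of skipping the temporary file, assuming the table uses 7-character columns (the width the answer below slices at; the real page would need n_cols=11). io.StringIO wraps the text so pandas can read it like a file, and pandas.read_fwf parses fixed-width columns instead of splitting on commas. The helper name pre_text_to_df and the sample text are hypothetical; the sample values are invented, not real sounding data. With the question's code you would pass pre.text instead of SAMPLE_PRE.

```python
import io

import pandas as pd

# Invented sample in the same fixed-width layout (3 columns instead of 11); not real data
SAMPLE_PRE = "\n".join([
    "",
    "-" * 21,
    "   PRES   HGHT   TEMP",
    "    hPa      m      C",
    "-" * 21,
    " 1000.0    112   20.0",
    "  925.0    800   15.5",
    "  850.0   1500",
    "",
])


def pre_text_to_df(text, n_cols, width=7):
    """Hypothetical helper: parse fixed-width <pre> text in memory, no file written."""
    # Keep non-blank lines that are not dashed separator lines
    lines = [ln for ln in text.splitlines() if ln.strip() and "--" not in ln]
    # lines[0] is the header row, lines[1] the units row; drop the units row
    body = "\n".join([lines[0]] + lines[2:])
    # StringIO lets read_fwf treat the string as a file; every column is `width` chars wide
    return pd.read_fwf(io.StringIO(body), widths=[width] * n_cols)


df = pre_text_to_df(SAMPLE_PRE, n_cols=3)
print(df)
```

Missing trailing values (the last sample row has no TEMP) come back as NaN.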

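It's fixed-width text, so you need to build the rows by splitting on "\n" and then cut each row at fixed widths to get the columns. You could use csv to save some overhead, but since you want a DataFrame: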

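import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

r = requests.get('http://weather.uwyo.edu/cgi-bin/sounding?region=samer&TYPE=TEXT%3ALIST&YEAR=2019&MONTH=09&FROM=2712&TO=2712&STNM=80222')
soup = bs(r.content, 'lxml')  # needs the lxml package; 'html.parser' also works
pre = soup.select_one('pre').text  # text of the first <pre> element
results = []

# [1:-1] drops the first and last pieces (empty when the text starts and ends with a newline)
for line in pre.split('\n')[1:-1]:
    if '--' not in line:  # skip the dashed separator lines
        # cut the line into 7-character fixed-width columns
        row = [line[i:i+7].strip() for i in range(0, len(line), 7)]
        results.append(row)

df = pd.DataFrame(results)
print(df)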
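The answer's DataFrame keeps the header and units lines as ordinary rows and stores every value as a string. A possible follow-up, sketched under the same 7-character-width assumption: use the first row as column names, drop the units row, pad short rows, and convert the cells with pandas.to_numeric. The helper name pre_to_dataframe and the sample text are hypothetical (the values are invented), and the loop over several URLs is left as comments because urls is a list you would supply.

```python
import pandas as pd

# Invented sample in the same fixed-width layout (3 columns instead of 11); not real data
SAMPLE_PRE = "\n".join([
    "",
    "-" * 21,
    "   PRES   HGHT   TEMP",
    "    hPa      m      C",
    "-" * 21,
    " 1000.0    112   20.0",
    "  925.0    800   15.5",
    "  850.0   1500",
    "",
])


def pre_to_dataframe(text, width=7):
    """Hypothetical helper: fixed-width <pre> text -> numeric DataFrame with named columns."""
    rows = []
    for line in text.splitlines():
        # Skip blank lines and dashed separators, as the answer does
        if not line.strip() or "--" in line:
            continue
        rows.append([line[i:i + width].strip() for i in range(0, len(line), width)])
    # rows[0] holds the column names, rows[1] the units (hPa, m, C, ...)
    header, data = rows[0], rows[2:]
    # Pad short rows (missing trailing values) and trim long ones to the header length
    n = len(header)
    data = [(r + [""] * n)[:n] for r in data]
    df = pd.DataFrame(data, columns=header)
    # Empty cells become NaN; everything else is parsed as a number
    return df.apply(pd.to_numeric, errors="coerce")


# Usage for several pages (network code left as comments; urls is a list you supply):
# frames = []
# for url in urls:
#     soup = BeautifulSoup(requests.get(url).content, "html.parser")
#     frames.append(pre_to_dataframe(soup.select_one("pre").text))

df = pre_to_dataframe(SAMPLE_PRE)
print(df)
```

Padding before building the frame keeps every row the same length as the header, so rows with missing trailing values do not shift or break the column alignment.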