Python: putting specific text into a pandas DataFrame with BeautifulSoup web scraping

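I'm really struggling to extract a couple of data points from a web page into a DataFrame.

I'm interested in extracting the values 38.31 and -0.06.

If I use
name_y = soup.find(id='current')
I get the following: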

<div id="current">
<b>Current<span class="currentTitle">
S&amp;P 500 PE Ratio</span>:</b>
38.31


<span class="neg">

-0.06
(-0.16%)

</span>
<div id="timestamp">


10:39 AM EST, Fri Mar 5

</div>
</div>
Expected final result:

print(df)
      PE         Change
0     38.31      -0.06
Full code:

import requests
from bs4 import BeautifulSoup
import re

import pandas as pd

url = 'https://www.multpl.com/s-p-500-pe-ratio'

res = requests.get(url)
html = res.text

soup = BeautifulSoup(html, 'html.parser')

# name_y = soup.find(id='current')

df = pd.Series(soup.find(id='current').b.next_sibling)

print(df)

Output:

0    \n38.37\n\n\n\n\n\n
dtype: object
You can use .strip() to remove the \n characters:

soup.find(id='current').b.next_sibling.strip()
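A minimal offline check, assuming the #current markup shown in the question (trimmed here, with the timestamp div left out), shows what .b.next_sibling returns before and after .strip():

```python
from bs4 import BeautifulSoup

# Trimmed copy of the #current markup from the question
# (assumption: the live page keeps this layout)
html = """
<div id="current">
<b>Current<span class="currentTitle">
S&amp;P 500 PE Ratio</span>:</b>
38.31


<span class="neg">

-0.06
(-0.16%)

</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# The NavigableString that directly follows the closing </b> tag
raw = soup.find(id="current").b.next_sibling

print(repr(str(raw)))     # still wrapped in newlines
print(repr(raw.strip()))  # '38.31'
```

The raw text node carries the surrounding newlines, which is why pd.Series(...) in the question prints them; stripping leaves only the number.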
Full code:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.multpl.com/s-p-500-pe-ratio'

res = requests.get(url)
html = res.text

soup = BeautifulSoup(html, 'html.parser')
current = soup.find(id='current')

# The change is in a <span> with class "pos" or "neg" (depending on its sign);
# keep only the part before "(" to drop the percentage
change = current.select_one('.pos, .neg').get_text().split("(")[0].strip()
# The PE value is the bare text node directly after the <b> label
pe = current.b.next_sibling.strip()

df = pd.DataFrame([{'PE': pe, 'Change': change}])
print(df)
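The DataFrame built this way holds both values as strings. A sketch that wraps the same two lookups in a hypothetical helper, parse_current, and converts the columns to floats can be exercised offline. It uses a trimmed template of the question's markup; the positive-change case uses made-up values and assumes the page switches the span class to "pos", as the .pos, .neg selector implies:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Trimmed #current markup modelled on the question; {cls}, {change} and {pct} vary per case
TEMPLATE = """<div id="current">
<b>Current<span class="currentTitle">
S&amp;P 500 PE Ratio</span>:</b>
38.31
<span class="{cls}">
{change}
({pct})
</span>
</div>"""


def parse_current(html: str) -> pd.DataFrame:
    """Hypothetical helper: pull PE and Change out of the #current block as floats."""
    current = BeautifulSoup(html, "html.parser").find(id="current")
    # Same lookups as the answer: the sign-dependent span, then the text node after <b>
    change = current.select_one(".pos, .neg").get_text().split("(")[0].strip()
    pe = current.b.next_sibling.strip()
    df = pd.DataFrame([{"PE": pe, "Change": change}])
    # The scraped values are strings; convert them so they can be used in arithmetic
    return df.astype(float)


neg = parse_current(TEMPLATE.format(cls="neg", change="-0.06", pct="-0.16%"))
# Made-up positive case, assuming the span class becomes "pos"
pos = parse_current(TEMPLATE.format(cls="pos", change="0.12", pct="0.31%"))
print(neg)
print(pos)
```

Converting once at the end keeps the parsing logic identical to the answer and leaves numeric columns ready for comparisons or plotting.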

You can check whether this code works for you:

from bs4 import BeautifulSoup as soup
import requests

url = 'https://www.multpl.com/s-p-500-pe-ratio'

res = requests.get(url)
html = res.text

c=soup(html, "html.parser")
name_y = c.find(id='current')
x = name_y.text
y = name_y.find("span", class_=["pos", "neg"])  # the class is "pos" or "neg" depending on the sign of the change
print(x.split()[5])
print(y.text.strip().split()[0])
Output:

38.09
-0.27
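Because the span's class follows the sign of the change, a lookup that accepts either class keeps this approach working on days the change is positive. A small offline check, assuming positive changes use class "pos" (as the first answer's .pos, .neg selector implies) and using made-up values for that case, with a hypothetical helper named extract:

```python
from bs4 import BeautifulSoup


def extract(html):
    """Hypothetical helper mirroring this answer, with a class-agnostic span lookup."""
    name_y = BeautifulSoup(html, "html.parser").find(id="current")
    # A list passed as class_ matches a span whose class is any of the listed values
    y = name_y.find("span", class_=["pos", "neg"])
    # Positional: relies on the label reading "Current S&P 500 PE Ratio:" (5 tokens)
    pe = name_y.text.split()[5]
    change = y.text.strip().split()[0]
    return pe, change


snippet = """<div id="current">
<b>Current<span class="currentTitle">
S&amp;P 500 PE Ratio</span>:</b>
38.31
<span class="{cls}">
{change}
({pct})
</span>
</div>"""

print(extract(snippet.format(cls="neg", change="-0.06", pct="-0.16%")))  # ('38.31', '-0.06')
print(extract(snippet.format(cls="pos", change="0.12", pct="0.31%")))   # ('38.31', '0.12')
```

The split()[5] index still depends on the label text staying the same; if the site rewords it, the PE lookup via .b.next_sibling is less brittle.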

Thank you for an excellent and clear answer, much appreciated :) @QHarr Good catch, thanks. OK, this is really impressive, special thanks to @QHarr! You've helped me many times before and I really appreciate it. My dirty solution was a try/except statement, but this is a much cleaner and better way :)