Python: putting specific text into a pandas DataFrame when web scraping with BeautifulSoup

python pandas web-scraping

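I'm really struggling to extract a few data points from a web page into a DataFrame.

I'm interested in extracting the values 38.31 and -0.06.

If I use

name_y = soup.find(id='current')

I get the following result: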

<div id="current">
<b>Current<span class="currentTitle">
S&amp;P 500 PE Ratio</span>:</b>
38.31


<span class="neg">

-0.06
(-0.16%)

</span>
<div id="timestamp">


10:39 AM EST, Fri Mar 5

</div>
</div>

Expected final result:

print(df)
      PE         Change
0     38.31      -0.06

What my code currently prints:

0    \n38.37\n\n\n\n\n\n
dtype: object

Full code:

import requests
from bs4 import BeautifulSoup
import re

import pandas as pd

url = 'https://www.multpl.com/s-p-500-pe-ratio'

res = requests.get(url)
html = res.text

soup = BeautifulSoup(html, 'html.parser' )

# name_y = soup.find(id='current')

df = pd.Series(soup.find(id='current').b.next_sibling)

print(df)

You can use .strip() to remove the \n characters:

soup.find(id='current').b.next_sibling.strip()
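To see why .strip() is needed, here is a minimal offline sketch. It assumes the page markup still matches the snippet pasted in the question; the `html` string below is a trimmed copy of that snippet, not a live download.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup shown in the question (assumption: the live
# page still has this shape). No network request is made here.
html = """
<div id="current">
<b>Current<span class="currentTitle">
S&amp;P 500 PE Ratio</span>:</b>
38.31


<span class="neg">

-0.06
(-0.16%)

</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The node right after </b> is a bare text node (a NavigableString),
# so it carries the surrounding newlines along with the number.
raw = soup.find(id="current").b.next_sibling
print(repr(raw))          # the number padded with newline characters
print(repr(raw.strip()))  # '38.31'
```

Because a NavigableString behaves like a normal Python string, .strip() works on it directly.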

Full code:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.multpl.com/s-p-500-pe-ratio'

res = requests.get(url)
html = res.text

soup = BeautifulSoup(html, 'html.parser')

# Look up the container once and reuse it.
current = soup.find(id='current')

# The change sits in a span with class "pos" or "neg"; keep the text before "(".
change = current.select_one('.pos, .neg').get_text().split("(")[0].strip()
# The PE value is the bare text node right after the <b> label.
pe = current.b.next_sibling.strip()

df = pd.DataFrame([{'PE': pe, 'Change': change}])
print(df)
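Note that the code above stores both values as strings. If numeric columns are wanted, one possible follow-up is to cast the frame to float. The sketch below runs offline against a copy of the markup from the question; the helper name `current_to_frame` and the `sample` string are illustrative assumptions, not part of the original answer.

```python
import pandas as pd
from bs4 import BeautifulSoup


def current_to_frame(html: str) -> pd.DataFrame:
    """Parse the #current block and return a one-row DataFrame of floats.

    Hypothetical helper: assumes the markup matches the question's snippet.
    """
    current = BeautifulSoup(html, "html.parser").find(id="current")
    # Bare text node after the <b> label holds the PE value.
    pe = current.b.next_sibling.strip()
    # The span has class "pos" or "neg"; drop the "(…%)" part.
    change = current.select_one(".pos, .neg").get_text().split("(")[0].strip()
    df = pd.DataFrame([{"PE": pe, "Change": change}])
    # Convert the scraped strings to floating point numbers.
    return df.astype(float)


# Copy of the markup from the question, used instead of a live request.
sample = """
<div id="current">
<b>Current<span class="currentTitle">
S&amp;P 500 PE Ratio</span>:</b>
38.31
<span class="neg">
-0.06
(-0.16%)
</span>
<div id="timestamp">
10:39 AM EST, Fri Mar 5
</div>
</div>
"""

df = current_to_frame(sample)
print(df)
print(df.dtypes)
```

With the live page, the same helper could be called as current_to_frame(requests.get(url).text), assuming the site layout has not changed.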

You can check whether this code works:

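from bs4 import BeautifulSoup as soup
import requests

url = 'https://www.multpl.com/s-p-500-pe-ratio'

res = requests.get(url)
html = res.text

c = soup(html, "html.parser")
name_y = c.find(id='current')
x = name_y.text
# Matches only a span with class "neg", i.e. a negative change.
y = name_y.find("span", {"class": "neg"})
# Whitespace-split tokens: Current, S&P, 500, PE, Ratio:, <value>, ...
print(x.split()[5])
# The first token inside the span is the absolute change.
print(y.text.strip().split()[0])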

Output:

38.09
-0.27
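Picking the value by token position (index 5) depends on the exact label wording. As an alternative sketch, and not the answerer's method, a regular expression can pull the decimal numbers out of the block's text regardless of where they sit. It assumes the first two decimals in the block are the PE value and the change, which holds for the markup pasted in the question; `sample` is a copy of that markup.

```python
import re

from bs4 import BeautifulSoup

# Copy of the markup from the question, used instead of a live request.
sample = """
<div id="current">
<b>Current<span class="currentTitle">
S&amp;P 500 PE Ratio</span>:</b>
38.31
<span class="neg">
-0.06
(-0.16%)
</span>
<div id="timestamp">
10:39 AM EST, Fri Mar 5
</div>
</div>
"""

block = BeautifulSoup(sample, "html.parser").find(id="current")
text = block.get_text(" ")

# Signed decimals only: "500" and "10:39" have no decimal point, so they
# are skipped; the percentage (-0.16) is matched but comes third.
numbers = re.findall(r"[-+]?\d+\.\d+", text)

# Assumption: the first two matches are the PE value and the change.
pe, change = float(numbers[0]), float(numbers[1])
print(pe)
print(change)
```

This variant does not depend on the span's class, so it should behave the same whether the change is positive or negative, as long as the order of the numbers in the block stays the same.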

Thank you very much for an excellent and clear answer, much appreciated :)

@QHarr Good spot, thanks.

OK, this is really impressive, special thanks to @QHarr! You've helped me many times before, and I really appreciate it. My dirty solution was to use a try/except statement, but this is a much cleaner and better way :)