Python: Putting specific text into a pandas DataFrame, web scraping with BeautifulSoup
Tags: python, pandas, web-scraping, beautifulsoup
I'm really struggling to extract a few data points from a web page into a DataFrame.
I'm interested in extracting the values 38.31 and -0.06.
If I use name_y = soup.find(id='current'), I get the following result:
<div id="current">
<b>Current<span class="currentTitle">
S&P 500 PE Ratio</span>:</b>
38.31
<span class="neg">
-0.06
(-0.16%)
</span>
<div id="timestamp">
10:39 AM EST, Fri Mar 5
</div>
</div>
Expected final result:
print(df)
PE Change
0 38.31 -0.06
What my current code prints instead:
0    \n38.37\n\n\n\n\n\n
dtype: object
Full code:
import requests
from bs4 import BeautifulSoup
import re
import pandas as pd
url = 'https://www.multpl.com/s-p-500-pe-ratio'
res = requests.get(url)
html = res.text
soup = BeautifulSoup(html, 'html.parser' )
# name_y = soup.find(id='current')
df = pd.Series(soup.find(id='current').b.next_sibling)
print(df)
You can use .strip() to remove the \n characters:
soup.find(id='current').b.next_sibling.strip()
Full code:
import requests
from bs4 import BeautifulSoup
import re
import pandas as pd
url = 'https://www.multpl.com/s-p-500-pe-ratio'
res = requests.get(url)
html = res.text
soup = BeautifulSoup(html, 'html.parser')
change = soup.find(id='current').select_one('.pos, .neg').get_text().split("(")[0].strip()
pe = soup.find(id='current').b.next_sibling.strip()
df = pd.DataFrame([({'PE': pe, 'Change': change})])
print(df)
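The same extraction can be tried offline against the HTML fragment quoted in the question, which avoids depending on the live page. This is only a sketch: the `html` string is copied from the question, and the final `astype(float)` step is an addition not present in the answer, included on the assumption that numeric columns are wanted rather than strings.

```python
from bs4 import BeautifulSoup
import pandas as pd

# HTML fragment copied from the question (not fetched from the live site).
html = """
<div id="current">
<b>Current<span class="currentTitle">
S&amp;P 500 PE Ratio</span>:</b>
38.31
<span class="neg">
-0.06
(-0.16%)
</span>
<div id="timestamp">
10:39 AM EST, Fri Mar 5
</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
current = soup.find(id="current")

# The text node right after <b>...</b> holds the PE value, padded with newlines.
pe = current.b.next_sibling.strip()

# The change lives in a span whose class is either "pos" or "neg";
# keep only the part before the parenthesised percentage.
change = current.select_one(".pos, .neg").get_text().split("(")[0].strip()

# Assumption: numeric columns are preferred, so cast the strings to float.
df = pd.DataFrame([{"PE": pe, "Change": change}]).astype(float)
print(df)
```

Casting to float is optional; without it both columns stay as object (string) dtype, which matches what the answer's code produces.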
You can check whether this code works:
from bs4 import BeautifulSoup as soup
import requests
url = 'https://www.multpl.com/s-p-500-pe-ratio'
res = requests.get(url)
html = res.text
c=soup(html, "html.parser")
name_y = c.find(id='current')
x = name_y.text
y = name_y.find("span", {"class":"neg"})
print(x.split()[5])
print(y.text.strip().split()[0])
Output:
38.09
-0.27
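One caveat with looking up only `{"class": "neg"}`: on a day when the change is positive, the span presumably carries the class `pos` instead (the other answer's selector `.pos, .neg` suggests this), so `find` would return `None` and `.text` would raise an error. A minimal sketch of a lookup that accepts either class is shown below. The helper name `extract_change` and both sample HTML strings are hypothetical, made up purely for illustration.

```python
from bs4 import BeautifulSoup


def extract_change(html):
    """Return the change value as text, or None if it cannot be found."""
    soup = BeautifulSoup(html, "html.parser")
    current = soup.find(id="current")
    if current is None:
        return None
    # Passing a list to class_ matches a span with either class.
    span = current.find("span", class_=["pos", "neg"])
    if span is None:
        return None
    # Drop the parenthesised percentage and surrounding whitespace.
    return span.get_text().split("(")[0].strip()


# Hypothetical sample markup for an up day and a down day.
up_day = '<div id="current"><b>Current</b> 38.40 <span class="pos">0.09 (0.23%)</span></div>'
down_day = '<div id="current"><b>Current</b> 38.31 <span class="neg">-0.06 (-0.16%)</span></div>'

print(extract_change(up_day))
print(extract_change(down_day))
```

Returning `None` for a missing element keeps the caller from needing a try/except around the lookup.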
Comments:
Thank you so much for an excellent and clear answer, much appreciated :)
@QHarr Well spotted, thanks.
OK, this is really impressive, special thanks @QHarr! You have helped me many times before and I really appreciate it. My dirty solution was to use a try/except statement, but this is a cleaner and better way :)