Python: how to skip some text when scraping with Beautiful Soup?
I'm trying to scrape some data from this website (Yahoo Finance's company profile page) and build a DataFrame from it. However, I want to remove the text "Description" from the rows of the Description column. How can I get rid of it? The code is as follows:
```python
import requests
from bs4 import BeautifulSoup  # the original imported it as `bs` but then called BeautifulSoup(...)
import pandas as pd

records = []
tickers = ['FSLR']
url = 'https://finance.yahoo.com/quote/{}/profile?p={}'

for s in tickers:
    soup = BeautifulSoup(requests.get(url.format(s, s)).content, 'html.parser')
    records.append({
        'symbol': s,
        'Name': soup.h1.text,
        # :-soup-contains() is soupsieve's current name for the deprecated :contains()
        'Sector': soup.select_one('span:-soup-contains("Sector(s)") + span').text,
        'Industry': soup.select_one('span:-soup-contains("Industry") + span').text,
        # .text on the whole <section> also includes its "Description" heading
        'Description': soup.find('section', {'class': 'quote-sub-section Mt(30px)'}).text,
    })

df = pd.DataFrame(records)
print(df.head())
```
Output:

```
symbol Name Sector Industry
0 FSLR First Solar, Inc. (FSLR) Technology Solar
Description
First Solar, Inc. provides photovol...
```
I'd like the output to be the same, except that the Description column should contain only the description text, without the leading word "Description".
The content of the Description field sits inside the `<p>` tag under the section you are currently capturing:
```html
<section class="quote-sub-section Mt(30px)" data-reactid="216">
  <h2 class="Fz(m) Lh(1) Fw(b) Mt(0) Mb(18px)" data-reactid="217">…</h2>
  <p class="Mt(15px) Lh(1.6)" data-reactid="219">…</p>
</section>
…
```
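To see why the heading leaks into the column, here is a small offline sketch that parses a copy of the snippet above instead of the live page. The `<h2>` text "Description" is inferred from the question's output, and the paragraph text is a made-up stand-in for the elided "…"; both are assumptions. Calling `get_text()` on the `<section>` concatenates every text node beneath it, heading included, while calling it on the `<p>` child returns only the paragraph.

```python
from bs4 import BeautifulSoup

# Structure copied from the snippet above; the <h2> and <p> text are
# hypothetical stand-ins for the "…" in the real page.
html = """
<section class="quote-sub-section Mt(30px)">
  <h2 class="Fz(m) Lh(1) Fw(b) Mt(0) Mb(18px)">Description</h2>
  <p class="Mt(15px) Lh(1.6)">First Solar, Inc. provides photovoltaic solar energy solutions.</p>
</section>
"""

soup = BeautifulSoup(html, "html.parser")
# Matching on the full class string works as long as the order is identical.
section = soup.find("section", {"class": "quote-sub-section Mt(30px)"})

# Text of the whole section: the <h2> heading is joined onto the paragraph.
whole = section.get_text(strip=True)
# Text of the <p> child only: just the description paragraph.
only_p = section.find("p").get_text(strip=True)

print(repr(whole))   # 'DescriptionFirst Solar, Inc. provides photovoltaic solar energy solutions.'
print(repr(only_p))  # 'First Solar, Inc. provides photovoltaic solar energy solutions.'
```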
So you can get just the text of that `<p>` element with:
```python
description = soup.find('section', {'class': 'quote-sub-section Mt(30px)'}).find('p').text
```
Doing this drops the string "Description".

Thanks for your help!
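If some pages lack that section, the one-liner above raises `AttributeError` on `None`. The sketch below wraps the same `.find('p')` idea in a hypothetical helper, `description_from_profile`, that returns `None` instead, and then shows a pandas alternative for a DataFrame that was already built from the full `section.text`. It runs on a made-up HTML string rather than a live request, so the sample text and helper name are assumptions, not part of the original answer.

```python
import pandas as pd
from bs4 import BeautifulSoup

SECTION_CLASS = "quote-sub-section Mt(30px)"  # class string quoted in the answer


def description_from_profile(html):
    """Hypothetical helper: return only the <p> text of the description section, or None."""
    soup = BeautifulSoup(html, "html.parser")
    section = soup.find("section", {"class": SECTION_CLASS})
    if section is None:
        return None  # section missing, e.g. the page layout changed
    p = section.find("p")
    return p.get_text(strip=True) if p is not None else None


# Hypothetical stand-in for requests.get(...).content; the text is made up.
sample_html = """
<section class="quote-sub-section Mt(30px)">
  <h2>Description</h2>
  <p>First Solar, Inc. provides photovoltaic solar energy solutions.</p>
</section>
"""

records = [{"symbol": "FSLR", "Description": description_from_profile(sample_html)}]
df = pd.DataFrame(records)

# Alternative: clean a column that was already filled from section.text by
# removing a leading "Description" and any whitespace that follows it.
raw = pd.DataFrame(
    {"Description": ["Description\nFirst Solar, Inc. provides photovoltaic solar energy solutions."]}
)
raw["Description"] = raw["Description"].str.replace(r"^\s*Description\s*", "", regex=True)

print(df)
print(raw)
```

Taking the `<p>` at scrape time is the more precise fix; the regex cleanup is a fallback for data you already have, and it would also strip a genuine description that happened to begin with the word "Description".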