Python: How do I skip some text when using Beautiful Soup?

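I am trying to scrape some data from this website and build a DataFrame. However, I want to remove the text "Description" from the value in the Description column. How can I remove it?

The code is as follows: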

import requests
from bs4 import BeautifulSoup
import pandas as pd


records = []
tickers = ['FSLR']

url = 'https://finance.yahoo.com/quote/{}/profile?p={}'

for s in tickers:
    # Download and parse the profile page for each ticker
    soup = BeautifulSoup(requests.get(url.format(s, s)).content, 'html.parser')

    records.append({
        'symbol': s,
        'Name': soup.h1.text,
        'Sector': soup.select_one('span:contains("Sector(s)") + span').text,
        'Industry': soup.select_one('span:contains("Industry") + span').text,
        # .text on the whole section also picks up the "Description" heading
        'Description': soup.find('section', {'class': 'quote-sub-section Mt(30px)'}).text
    })

df = pd.DataFrame(records)
df.head()
        symbol  Name                      Sector        Industry
0       FSLR    First Solar, Inc. (FSLR)  Technology    Solar

Description
First Solar, Inc. provides photovol...
I would like the output to look like this:

        symbol  Name                      Sector        Industry
0       FSLR    First Solar, Inc. (FSLR)  Technology    Solar

Description
First Solar, Inc. provides photovol...

The content of the description field is inside the <p> tag under the <section> you are currently capturing:

<section class="quote-sub-section Mt(30px)" data-reactid="216">
    <h2 class="Fz(m) Lh(1) Fw(b) Mt(0) Mb(18px)" data-reactid="217">…</h2>
    <p class="Mt(15px) Lh(1.6)" data-reactid="219">…</p>
</section>

So you can get it with the following:

description = soup.find('section', {'class': 'quote-sub-section Mt(30px)'}).find('p').text


Doing this drops the string "Description", because that text belongs to the <h2> heading and not to the <p>.
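
To see the difference without hitting the network, here is a minimal sketch that runs against an inline copy of the markup above. The SAMPLE_HTML string, its paragraph text, and the extract_description helper are made up for illustration and are not part of the original code; the real page may also have changed since this was written.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the profile page markup shown above.
SAMPLE_HTML = """
<section class="quote-sub-section Mt(30px)">
    <h2 class="Fz(m) Lh(1) Fw(b) Mt(0) Mb(18px)">Description</h2>
    <p class="Mt(15px) Lh(1.6)">First Solar, Inc. provides photovoltaic solar energy solutions.</p>
</section>
"""


def extract_description(soup):
    """Return the text of the <p> inside the description section, or None."""
    section = soup.find("section", {"class": "quote-sub-section Mt(30px)"})
    if section is None:
        return None
    paragraph = section.find("p")
    return paragraph.get_text(strip=True) if paragraph else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Text of the whole section: the <h2> heading and the <p> body run together.
section = soup.find("section", {"class": "quote-sub-section Mt(30px)"})
whole_section_text = section.get_text(strip=True)

# Text of the <p> only: the heading is left out.
description = extract_description(soup)

print(whole_section_text)
print(description)
```

The helper returns None instead of raising when the section or paragraph is missing, which can be handy if some tickers have no profile text.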

Thank you for your help.