Python: removing odd indentation when getting a description with Beautiful Soup


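I have a bs4 program that collects descriptions for links. It first checks whether the page has a meta description tag; if not, it takes the description from a <p> tag.

Here is the code: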

from bs4 import BeautifulSoup
import requests

def find_title(url):
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')

    with open('descrip.txt', 'a', encoding='utf-8') as f:
        description = soup.find('meta', attrs={'name':'og:description'}) or soup.find('meta', attrs={'property':'description'}) or soup.find('meta', attrs={'name':'description'})

        if description:
            desc = description["content"]
        else:
            desc = soup.find_all('p')[0].getText()
            lengths = len(desc)
            index = 0

            while lengths == 1:
                index = index + 1
                desc = soup.find_all('p')[index].getText()
                lengths = len(desc)

                if lengths > 300:
                    desc = soup.find_all('p')[index].getText()[0:300]

                elif lengths < 300:
                    desc = soup.find_all('p')[index].getText()[0:lengths]

        print(desc)
        f.write(desc + '\n')

find_title('https://en.wikipedia.org/wiki/Portal:The_arts')
find_title('https://en.wikipedia.org/wiki/Portal:Biography')
find_title('https://en.wikipedia.org/wiki/Portal:Geography')
find_title('https://en.wikipedia.org/wiki/November_15')
find_title('https://en.wikipedia.org/wiki/November_16')
find_title('https://en.wikipedia.org/wiki/Wikipedia:Selected_anniversaries/November')
find_title('https://lists.wikimedia.org/mailman/listinfo/daily-article-l')
find_title('https://en.wikipedia.org/wiki/List_of_days_of_the_year')
find_title('https://en.wikipedia.org/wiki/File:Proclama%C3%A7%C3%A3o_da_Rep%C3%BAblica_by_Benedito_Calixto_1893.jpg')
find_title('https://en.wikipedia.org/wiki/First_Brazilian_Republic')
find_title('https://en.wikipedia.org/wiki/Empire_of_Brazil')
find_title('https://en.wikipedia.org/wiki/Pedro_II_of_Brazil')
find_title('https://en.wikipedia.org/wiki/Benedito_Calixto')
find_title('https://en.wikipedia.org/wiki/Rio_de_Janeiro')
find_title('https://en.wikipedia.org/wiki/Deodoro_da_Fonseca')

Is there a way to fix this?

Pass strip=True to getText() (note: it is an alias of get_text()), and pass a space as the separator. For example:

get_text(strip=True, separator=' ')
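To see the effect of these two arguments, here is a minimal sketch on a made-up HTML fragment (the markup and the variable names are assumptions for illustration, not taken from the pages in the question). It also shows one caveat: strip=True trims each text node separately, so line breaks and runs of spaces inside a single text node survive. Collapsing those with split() and join() is an extra step, not part of the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with the kind of indentation found in real pages.
html = """
<p>
    The <b>arts</b> are a
    wide range of   human practices.
</p>
"""

soup = BeautifulSoup(html, "html.parser")
p = soup.find("p")

# Default call: leading/trailing newlines and indentation are kept.
raw = p.get_text()

# Each text node is stripped, then the pieces are joined with one space.
clean = p.get_text(strip=True, separator=" ")

# Optional extra step: collapse whitespace left inside a single text node.
collapsed = " ".join(clean.split())

print(repr(raw))
print(repr(clean))
print(repr(collapsed))
```

In the question's function, the same call would replace each getText() on the paragraph, for example soup.find_all('p')[index].get_text(strip=True, separator=' ').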