Parsing <br> tags with BeautifulSoup

I am scraping a website, and the tag structure is:

<div class="content">
    <p>
        "C Space"
        <br>
        "802 white avenue"
        <br>
        "xyz 123"
        <br>
        "Lima"
    </p>
</div>
I get the following output: C Space802 white avenuexyz 123Lima

whereas I want the output to be: C Space 802 white avenue xyz 123 Lima

How can I add the extra whitespace when extracting the text that follows each successive <br> tag?

Thanks

You can use split and join here:

>>> ' '.join(templist.get_text().split())
'"C Space" "802 white avenue" "xyz 123" "Lima"'
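Why this works can be seen on a plain string, without BeautifulSoup at all. The sketch below is only an illustration: raw_text is a hypothetical stand-in for what get_text() returns for the <p> element (the exact indentation is an assumption), and compact_text is a hypothetical variant with no whitespace around the <br> tags.

```python
# Hypothetical stand-in for the string get_text() returns for the <p> element:
# the text nodes are separated by newlines and indentation from the markup.
raw_text = (
    '\n        "C Space"\n        '
    '\n        "802 white avenue"\n        '
    '\n        "xyz 123"\n        '
    '\n        "Lima"\n    '
)

# split() with no arguments splits on any run of whitespace and drops empty
# pieces, so joining with a single space normalizes the spacing.
normalized = " ".join(raw_text.split())
print(normalized)

# Limitation: if the markup has no whitespace around <br>, the text nodes are
# already glued together and split/join cannot separate them again.
compact_text = "C Space802 white avenuexyz 123Lima"
unchanged = " ".join(compact_text.split())
print(unchanged)
```

The approach therefore depends on the source HTML having line breaks or spaces between the text and the <br> tags.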

You can use the strip and separator arguments of get_text():

In [4]: elm = soup.select_one(".content")

In [5]: print(elm.get_text(strip=True, separator=" "))
"C Space" "802 white avenue" "xyz 123" "Lima"
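For reference, here is a self-contained sketch of the same call. The HTML string is an assumption modelled on the question, written without whitespace around the <br> tags and without the literal quote characters, so the difference between the default get_text() and the separator version is visible. It uses the built-in "html.parser" backend, which is also an assumption; any installed parser should behave the same for this input.

```python
from bs4 import BeautifulSoup

# Assumed markup, modelled on the question: no whitespace around <br>.
html = (
    '<div class="content">'
    "<p>C Space<br>802 white avenue<br>xyz 123<br>Lima</p>"
    "</div>"
)

soup = BeautifulSoup(html, "html.parser")
elm = soup.select_one(".content")

# Default: text nodes are concatenated with nothing between them.
joined = elm.get_text()
print(joined)

# separator=" " puts a space between text nodes; strip=True trims each
# node and skips nodes that are empty after trimming.
spaced = elm.get_text(separator=" ", strip=True)
print(spaced)

# stripped_strings yields the same trimmed pieces one by one, which helps
# if each line of the address is needed separately.
parts = list(elm.stripped_strings)
print(parts)
```

Unlike split and join, this does not rely on whitespace being present in the markup, because the separator is inserted between text nodes rather than recovered from the text.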