Python: Using Beautiful Soup to get data from a section without a class


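I'm still a beginner, learning Python and Beautiful Soup. I'd like to know how to get the text out of an HTML section that has no class.

This is the HTML snippet I'm working with: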

<section class="userbody">
    <script type="text/javascript"></script>
    <figure class="iw">
        <div id="ci">
            <img id="iwi" title="image 2" alt="" src="http://images.craigslist.org/00C0C_daJm4U9yU5B_600x450.jpg" style="min-width: inherit; min-height: 450px;"></img>
        </div>
        <div id="thumbs"></div>
    </figure>
    <div class="mapAndAttrs">
        <div class="mapbox">
            <div id="map" class="leaflet-container leaflet-fade-anim" data-longitude="-84.072447" data-latitude="33.908534" tabindex="0">
                <div class="leaflet-map-pane" style="transform: translate(0px, 0px);"></div>
                <div class="leaflet-control-container">
                    <div class="leaflet-top leaflet-left"></div>
                    <div class="leaflet-top leaflet-right"></div>
                    <div class="leaflet-bottom leaflet-left"></div>
                    <div class="leaflet-bottom leaflet-right">
                        <div class="leaflet-control-attribution leaflet-control"></div>
                    </div>
                </div>
            </div>
            <div class="mapaddress">

                Some Address

            </div>
        </div>
        <div class="attributes"></div>
    </div>
    <section id="postingbody">
            some posting info
            <br></br>
             more posting info
             <br></br>
    </section>
    <section class="cltags"></section>
    <div class="postinginfos"></div>
</section>
findAll() doesn't seem to work for tags that have no class. This is what I tried:

     for post in soup.findall("section", { "id" : "postingbody" }):
       postText = ''.join(post.findAll(text=True))

How do I get the text inside the section with id="postingbody"?

Assuming s is the HTML string, you can do the following:

from bs4 import BeautifulSoup

soup = BeautifulSoup(s)
print soup.find(attrs={'id' : 'postingbody'})
Output:

<section id="postingbody">
            some posting info
            <br/>
             more posting info
             <br/>
</section>
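
As a side note on why the loop in the question fails: the problem is most likely the method name rather than the missing class. Beautiful Soup spells the method findAll (find_all in bs4), and an attribute it does not recognize, such as soup.findall, is treated as a search for a <findall> tag, which returns None. The sketch below assumes Python 3 with bs4 installed and the built-in html.parser; the shortened html string is a stand-in for the snippet in the question, not the original page.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the HTML in the question (assumption: same id).
html = """
<section class="userbody">
    <div class="mapaddress">Some Address</div>
    <section id="postingbody">
        some posting info
        <br>
        more posting info
        <br>
    </section>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

# soup.findall is not a method: unknown attributes are looked up as tag
# names, so this behaves like soup.find("findall") and yields None.
not_a_method = soup.findall

# With the correct method name, filtering on id works even though the
# tag has no class attribute.
posts = soup.find_all("section", {"id": "postingbody"})
for post in posts:
    # string=True collects every text node below the tag
    # (older code uses the equivalent text=True).
    post_text = "".join(post.find_all(string=True))
    print(post_text)
```

Calling soup.findall(...) therefore raises a TypeError, because None is not callable, which is easy to misread as the filter not matching.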


In addition to Games Brainiac's answer: to get the text, just append .text to the result.

So:

print soup.find(attrs={'id' : 'postingbody'}).text

If you are using BeautifulSoup 4, do it like this:

element = soup.find(id="postingbody")

Thanks @SergioP. You guys knew where I was going :) Thank you all. This community is awesome!!
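
The .text result keeps the indentation and blank lines of the original markup. If cleaner text is wanted, one option is get_text() with strip=True, or the stripped_strings generator. A minimal sketch, assuming Python 3, bs4 with the built-in html.parser, and a shortened stand-in for the question's HTML:

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the section in the question (assumption).
html = """
<section id="postingbody">
    some posting info
    <br>
    more posting info
    <br>
</section>
"""

soup = BeautifulSoup(html, "html.parser")
element = soup.find(id="postingbody")

# .text returns all text nodes joined as-is, whitespace included.
raw_text = element.text

# strip=True trims each text node and drops the empty ones;
# separator controls what is placed between the remaining pieces.
clean_text = element.get_text(separator="\n", strip=True)

# stripped_strings yields the same trimmed pieces one at a time.
lines = list(element.stripped_strings)

print(clean_text)
```

Whether to join with a newline or a space depends on how the text will be used; the list form is convenient when each line is processed separately.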