Web scraping: is there a selector (in Python) that can select untagged elements?
Tags: web-scraping, css-selectors, python-3.6


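I am trying to use Python to scrape several pages whose markup resembles the HTML shown below. I tried basic CSS selectors in Python to get the text, but I could not work it out. Mainly I want to know whether there is a selector I can pass to BeautifulSoup's select method that picks out what sits inside the <div> but not inside the <table>. I tried selecting it without really understanding how it works, and that did not work.

I know very little about HTML, so I apologize for any mistakes or confusion in the code sample.

It might be easier to remove the child table tag.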

<div id="some id" class="some class">
    <table id="some other id" class="a different class">...</table>
    I want this text,
    <br>
    this text,
    <br>
    along with this text
</div>

The solution is actually quite simple. After some experimenting, I found that the following code gets the text out of the HTML above.

from bs4 import BeautifulSoup as bs

html = '''
<div id="some id" class="some class">
    <table id="some other id" class="a different class">not this</table>


        I want this text,


    <br>

        this text,


    <br>


        along with this text


    </div>
'''

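# Parse with the lxml parser (the lxml package must be installed).
soup = bs(html, 'lxml')
# The ids contain spaces, so match them with attribute selectors instead of '#id'.
# extract() removes the <table> from the tree, leaving only the div's loose text.
soup.select_one('[id="some other id"]').extract()
print(soup.select_one('[id="some id"]').text)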
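The `.text` output above still carries all the line breaks and indentation from the page. As a rough sketch of one way to tidy that up, the version below removes the table and then joins the remaining pieces with `get_text`. The built-in `html.parser`, `decompose()` in place of `extract()`, and the variable names `div`, `table` and `text` are my own choices here, not part of the original answer.

```python
from bs4 import BeautifulSoup

html = '''
<div id="some id" class="some class">
    <table id="some other id" class="a different class">not this</table>
    I want this text,
    <br>
    this text,
    <br>
    along with this text
</div>
'''

# html.parser ships with Python, so no extra parser package is needed (assumption:
# the pages being scraped parse acceptably with it).
soup = BeautifulSoup(html, "html.parser")
div = soup.select_one('[id="some id"]')

# Drop the child table if it is present; decompose() removes it and discards it.
table = div.select_one('[id="some other id"]')
if table is not None:
    table.decompose()

# strip=True trims every text node and skips empty ones; " " joins what is left.
text = div.get_text(" ", strip=True)
print(text)
```

Note that this modifies the parsed tree, so anything else that needs the table should read it before the `decompose()` call.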

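Python's list function splits the selected HTML into separate "chunks": everything under the <table> tag, the first bit of text, a <br> tag, the next bit of text, another <br> tag, and the last bit of text. Because we only need the chunks that contain text, we add the "-1", "3" and "-5" elements of the text list to the trueText list.

The result is a list, trueText, containing the wanted text from the HTML above.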
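The chunk idea can be sketched roughly as follows. This is my own reconstruction, not the answerer's code: CSS selectors only match elements, so the loose text is reached through the div's children instead. Because the exact positions in the list shift with the whitespace between tags, this sketch keeps chunks by type rather than by fixed index. `chunks` and `true_text` are illustrative names, and `html.parser` is an assumed parser choice.

```python
from bs4 import BeautifulSoup, NavigableString

html = '''
<div id="some id" class="some class">
    <table id="some other id" class="a different class">not this</table>
    I want this text,
    <br>
    this text,
    <br>
    along with this text
</div>
'''

soup = BeautifulSoup(html, "html.parser")
div = soup.select_one('[id="some id"]')

# list() turns the div's direct children into chunks: tags and loose text alike.
chunks = list(div.children)

# Keep only the text chunks that are not pure whitespace. The text inside
# <table> is not a direct child of the div, so it never shows up here.
true_text = [c.strip() for c in chunks
             if isinstance(c, NavigableString) and c.strip()]
print(true_text)
```

Filtering by type leaves the tree untouched, so the table is still available afterwards if it is needed.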
