Python: extracting a nested div from HTML code using BeautifulSoup and lxml

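I have the following HTML code (shown below), and I'm trying to extract and print the line highlighted in the image, "some text". "some text" is the text of the first div with class=chat-message nested inside the div with id=chat-messages. In other words, I'm trying to extract the text of the first child div of the div with id=chat-messages; all of its child divs have a similar structure. I tried: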

import requests
from bs4 import BeautifulSoup

url = "the url this is used for"
r = requests.get(url)

soup = BeautifulSoup(r.content, 'lxml')
g_data = soup.find('div',{'class':'chat-message-content selectable'})
print(g_data.text)

This gives me an error:

AttributeError: 'NoneType' object has no attribute 'text'

It looks as if g_data is empty (None). What am I doing wrong? Thanks.

HTML code:

<html>
<head>
    <title>
    </title>
</head>

<body>
    <div id="main">
        <div data-reactroot="" id="app">
            <div class="top-bar-authenticated" id="top-bar">
            </div>


            <div class="closed" id="navigation-bar">
            </div>


            <div id="right-sidebar">
                <div id="chat">
                    <div id="chat-head">
                    </div>


                    <div id="chat-title">
                    </div>


                    <div id="chat-messages">
                        <div class="chat-message">
                            <div class="chat-message-avatar" style="background-image: url(&quot;https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/65/657dcec97cc00bc378629930ecae1776c0d981e0.jpg&quot;);">
                            </div>
                            <a class="chat-message-username clickable">
                            <div class="iron-color">
                                aloe
                            </div></a>

                            <div class="chat-message-content selectable">
                                <!-- react-text: 2532 -->some text<!-- /react-text -->
                            </div>
                        </div>


                        <div class="chat-message">
                            <div class="chat-message-avatar" style="background-image: url(&quot;https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/65/657dcec97cc00bc378629930ecae1776c0d981e0.jpg&quot;);">
                            </div>
                            <a class="chat-message-username clickable">
                            <div class="iron-color">
                                aloe
                            </div></a>

                            <div class="chat-message-content selectable">
                                <!-- react-text: 2533 -->some other text<!-- /react-text -->
                            </div>
                        </div>


                        <div class="chat-message">
                        </div>


                        <div class="chat-message">
                        </div>


                        <div class="chat-message">
                        </div>


                        <div class="chat-message">
                        </div>

If you want to search for tags that match two or more CSS classes, you should use a CSS selector:

g_data = soup.select('div.chat-message-content.selectable')
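As a minimal sketch of how that selector behaves, the snippet below runs it against a trimmed-down copy of the markup from the question. The shortened HTML string and the use of the built-in "html.parser" (chosen only to avoid an extra dependency) are assumptions made for illustration; the question itself uses 'lxml'. It also shows select_one with a child combinator, which may be a convenient way to get only the first message.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the markup from the question (two messages only).
html = """
<div id="chat-messages">
  <div class="chat-message">
    <div class="chat-message-content selectable">
      <!-- react-text: 2532 -->some text<!-- /react-text -->
    </div>
  </div>
  <div class="chat-message">
    <div class="chat-message-content selectable">
      <!-- react-text: 2533 -->some other text<!-- /react-text -->
    </div>
  </div>
</div>
"""

# Assumption: "html.parser" is used here so the sketch needs nothing beyond bs4.
soup = BeautifulSoup(html, "html.parser")

# select() returns a list of every element that carries both classes.
all_contents = soup.select("div.chat-message-content.selectable")
texts = [tag.get_text(strip=True) for tag in all_contents]

# select_one() returns the first match or None; the child combinator (>)
# restricts the search to direct children of #chat-messages.
first = soup.select_one(
    "#chat-messages > div.chat-message > div.chat-message-content.selectable"
)
first_text = first.get_text(strip=True) if first is not None else None

print(texts)
print(first_text)
```

Note that select() always returns a list, so .text has to be read from the individual elements rather than from the result itself.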

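Reading your comments on the question, I see that you are trying to parse a website that loads its content with JavaScript, which is why requests does not work for you. You should use Selenium together with a webdriver, e.g. the Chrome driver. Something like the following code: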

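from bs4 import BeautifulSoup
from selenium import webdriver

# Let a real browser load the page so the JavaScript-generated chat exists.
driver = webdriver.Chrome()
driver.get("https://www.csgoarena.com/home")

# Parse the rendered DOM rather than the raw HTTP response.
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()

# find_all returns a list of every matching div.
g_data = soup.find_all('div', {'class': 'chat-message-content selectable'})
print(g_data)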

Since you want the .text of every selected element:

>>> for match in g_data:
...     print(match.text)
...


not everytime :D
I understand :)
 NuuZy csgoarena.com but he won GA's only when it were long 
Yea I always saw him
Everyday
caught
(...)
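The AttributeError in the question arises because find() returns None when nothing matches, which is what happens when the chat has not been rendered yet. The sketch below illustrates that difference offline. The two HTML strings are hypothetical stand-ins (one for a response body before any JavaScript has run, one for the rendered page), and first_message_text is a helper name invented for this example.

```python
from bs4 import BeautifulSoup


def first_message_text(html):
    """Return the first chat message's text, or None if no message exists."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches any div whose class list contains this class.
    tag = soup.find("div", class_="chat-message-content")
    # find() returns None when nothing matches, so guard before using the tag.
    return tag.get_text(strip=True) if tag is not None else None


# Hypothetical response body before JavaScript has run: an empty container.
static_shell = '<div id="main"><div id="app"></div></div>'

# Hypothetical markup after a browser has rendered the chat.
rendered = (
    '<div id="chat-messages"><div class="chat-message">'
    '<div class="chat-message-content selectable">some text</div>'
    '</div></div>'
)

print(first_message_text(static_shell))
print(first_message_text(rendered))
```

Checking for None before reading .text turns the crash into a clear signal that the expected element is simply not in the HTML that was fetched.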

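@akashkarothiya Then how can I extract "some text"?

Just tried it on this URL: and it worked! Maybe the problem occurred when you made the request? What happens if you print r.content?

You're right, that's weird. It prints only part of the markup, not the content of the main section. Do you know why?

No idea. I can see one hypothesis: your HTML doesn't seem to be complete; a lot of it is missing. Maybe try to complete it?

Although this works for the given HTML example, the lxml parser supports multiple classes, so that is not the problem.

You helped me a lot. Thank you for your time!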