Python: some text is missing when reading HTML with urllib

python html

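I am using the read_html function shown below to read HTML from the website e-liquid-recipes.com.

Each row in the list of concentrates is a div with the class name "recline". It lists various details about the flavor concentrate, such as its name, percentage, and so on.

Extracting one recline div with BeautifulSoup gives the HTML snippet reproduced below (the same markup appears in the raw HTML text returned by read_html).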


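Note that the rdrops, rgrams, and rpercent divs are missing the expected text (each contains only a newline). Why might that be?

The plain HTML does not contain this data. These fields are populated by JavaScript that runs after the HTML has loaded, and urllib only downloads the document; it never executes scripts.

Thanks Olvin, I hadn't considered that. I think I have found the script in question. It looks like I can calculate the values I need from runit anyway, so this may not be a blocker.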
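The idea of deriving the missing values from runit could look roughly like the sketch below. The site's actual script is not shown in this thread, so everything here is an assumption made for illustration: that runit is a volume in ml, the helper name derive_fields, the drops-per-ml factor, the density, and the total batch volume are all placeholders rather than the site's real formula.

```python
def derive_fields(runit_ml, total_ml, drops_per_ml=20.0, density_g_per_ml=1.0):
    """Derive drops, grams and percent from a volume in ml.

    Every conversion factor here is an illustrative assumption, not the
    formula used by the site's JavaScript.
    """
    if total_ml <= 0:
        raise ValueError("total_ml must be positive")
    return {
        "rdrops": runit_ml * drops_per_ml,
        "rgrams": runit_ml * density_g_per_ml,
        "rpercent": 100.0 * runit_ml / total_ml,
    }


# float() accepts the scraped text directly, surrounding whitespace included.
values = derive_fields(float("\n  0.30\n "), total_ml=10.0)
```

To match the numbers the browser shows, the placeholder factors would have to be replaced with whatever the page's own script uses.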

import http.cookiejar
import urllib.request


def read_html(url):
    # Create a custom opener with User-agent header which allows cookies.
    cookiejar = http.cookiejar.LWPCookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookiejar))
    opener.addheaders = [
        ('User-agent', 'Mozilla/5.0'), 
        ('Content-Type', 'text/html; charset=utf-8')
    ]

    # Make the opener the (global) default opener (urlopen will use it).
    urllib.request.install_opener(opener)

    # Open URL and read response.
    response = urllib.request.urlopen(url)
    return response.read()
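As a possible variation on the function above, the sketch below keeps the opener local instead of installing it as the global default, and decodes the response bytes to text. The names make_opener and read_html_text and the 10-second timeout are assumptions for illustration, not part of the original question. Like the original, it downloads only the static document and does not run any JavaScript.

```python
import http.cookiejar
import urllib.request


def make_opener(user_agent="Mozilla/5.0"):
    """Build a cookie-aware opener without installing it globally."""
    cookiejar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookiejar)
    )
    # Replace the default 'Python-urllib/x.y' User-agent header.
    opener.addheaders = [("User-agent", user_agent)]
    return opener


def read_html_text(url, opener=None, timeout=10):
    """Fetch url and return decoded text (hypothetical helper)."""
    opener = opener or make_opener()
    with opener.open(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling opener.open directly means other code in the same process that uses urllib.request.urlopen is not affected by these headers or cookies.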

<div class="recline highlight flmis prmis">
 <div class="rlab" id="rfl1">
  <a href="https://e-liquid-recipes.com/flavor/8591">
   Acetyl Pyrazine 5% (
   <abbr title="The Flavor/Perfumer's Apprentice">
    TPA
   </abbr>
   )
  </a>
 </div>
 <div class="runit" id="flu1">
  0.30
 </div>
 <div class="rdrops" id="fld1">
 </div>
 <div class="rgrams" id="flg1">
 </div>
 <div class="rpercent" id="flp1">
 </div>
 <br/>
</div>
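One way to confirm which fields are empty in the static markup is to parse the snippet above and collect the text of each div. The sketch below uses the standard-library html.parser instead of BeautifulSoup so that it runs without extra installs. The class FieldText, the helper field_text, and the shortened SNIPPET are illustrative names and data based on the markup above.

```python
from html.parser import HTMLParser

# Shortened copy of the div shown above.
SNIPPET = """
<div class="recline highlight flmis prmis">
 <div class="runit" id="flu1">
  0.30
 </div>
 <div class="rdrops" id="fld1">
 </div>
 <div class="rgrams" id="flg1">
 </div>
 <div class="rpercent" id="flp1">
 </div>
 <br/>
</div>
"""


class FieldText(HTMLParser):
    """Collect the stripped text found directly inside selected div classes."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.stack = []  # one entry per open <div>: matched class or None
        self.text = {name: "" for name in wanted}

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            match = next((c for c in classes if c in self.wanted), None)
            self.stack.append(match)

    def handle_endtag(self, tag):
        if tag == "div" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost open div if it is one we track.
        if self.stack and self.stack[-1]:
            self.text[self.stack[-1]] += data.strip()


def field_text(html, wanted):
    parser = FieldText(wanted)
    parser.feed(html)
    return parser.text


fields = field_text(SNIPPET, ["runit", "rdrops", "rgrams", "rpercent"])
empty = [name for name, text in fields.items() if not text]
```

Here only runit carries text in the downloaded HTML; the other three divs are empty until the page's script fills them in a browser.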