Python: Why doesn't Beautiful Soup return links with their query strings?


I'm trying to get the href of an anchor tag with BeautifulSoup, but it fails to retrieve the query string! Here is the HTML snippet:

<td class="titleColumn">
  143.
  <a href="/title/tt1302006/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=e31d89dd-322d-4646-8962-327b42fe94b1&pf_rd_r=0GYQY7SGFV9AK9CV3019&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=top&ref_=chttp_tt_143"
  title="Martin Scorsese (dir.), Robert De Niro, Al Pacino" >The Irishman</a>
    <span class="secondaryInfo">(2019)</span>
</td>

The links in soup don't have the query string either.
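One way to confirm whether an extracted href really lacks a query string is to parse it with the standard library. This is a minimal sketch, not part of the original question: the helper name `query_params` is hypothetical, the first href is a shortened copy of the one in the snippet above, and the second is an assumed example of the kind of value the question says comes back.

```python
from urllib.parse import urlsplit, parse_qs


def query_params(href):
    """Return the query parameters of href as a dict of lists (empty if none)."""
    return parse_qs(urlsplit(href).query)


# Shortened version of the href shown in the HTML snippet.
with_query = "/title/tt1302006/?pf_rd_t=15506&pf_rd_i=top&ref_=chttp_tt_143"
# Assumed example of an href that arrived without its query string.
without_query = "/title/tt1302006/"

print(query_params(with_query))
print(query_params(without_query))
```

If the second form is what `movie.find('a')['href']` prints, the query string was most likely absent from the HTML the server sent, rather than dropped by the parser.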

Web servers sometimes respond differently when the request does not define a user agent.

Here is how to set the User-Agent header in your code:

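import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

add = "https://www.imdb.com/chart/top/?sort=us,desc&mode=simple&page=1"
# "user-agent" is a placeholder; replace it with a real browser User-Agent string
r = requests.get(add, headers={'User-Agent': "user-agent"})
soup = BeautifulSoup(r.text, "html.parser")  # name the parser explicitly
for movie in soup.find_all("td", {"class": "titleColumn"}):
    # resolve the relative href against the page URL instead of concatenating
    print(urljoin(add, movie.find('a')['href']))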
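A side note on building the printed URL: appending a root-relative href to the page URL by plain string concatenation yields a malformed address, because the page URL already ends in its own query string. The sketch below compares concatenation with `urllib.parse.urljoin`; the href is a shortened copy of the one in the question, and the variable names are illustrative only.

```python
from urllib.parse import urljoin

page_url = "https://www.imdb.com/chart/top/?sort=us,desc&mode=simple&page=1"
# Shortened version of the href from the question's HTML snippet.
href = "/title/tt1302006/?ref_=chttp_tt_143"

# Plain concatenation glues the href onto the end of the page's query string.
concatenated = page_url + href
# urljoin resolves the href against the page URL, as a browser would.
resolved = urljoin(page_url, href)

print(concatenated)
print(resolved)
```

Here `resolved` is `https://www.imdb.com/title/tt1302006/?ref_=chttp_tt_143`, while `concatenated` still carries the chart page's path and query in front of the title path.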

Try this:

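import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# a hyphen is not allowed in a Python identifier, so use user_agent
user_agent = 'valid user agent'  # enter a valid user agent here
add = "https://www.imdb.com/chart/top/?sort=us,desc&mode=simple&page=1"
r = requests.get(add, headers={'User-Agent': user_agent})
soup = BeautifulSoup(r.content, "html.parser")  # name the parser explicitly
for movie in soup.find_all("td", {"class": "titleColumn"}):
    # resolve the relative href against the page URL instead of concatenating
    print(urljoin(add, movie.find('a')['href']))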

Note that I used r.content instead of r.text. I have also found that sending a valid user agent is sometimes very helpful in cases like this.

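Are you sure the query string is in the HTML and not added later by JavaScript?

That's not the problem, but thanks for your attention.

First, if the result is HTML or XML (among others), use r.content instead of r.text. Second, what exactly do you mean by "it doesn't retrieve the query string"? Does it just return a string that also happens to be wrong? What does "the links in soup don't have the query string" mean?

This has to do with the request. I tried cURL, changed the user agent, and used urllib, and the raw HTML does not show the query variables, just as he says.

Try replacing the "user-agent" string with another string, e.g. "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0".

This is the correct explanation, but you have to use a user agent that a real browser sends. Copy the value of navigator.userAgent from the browser console.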
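The comments also mention trying urllib. As a rough sketch of the same idea with the standard library only, the request below carries a browser-like User-Agent header. The helper name `build_request` is hypothetical, the User-Agent value is the Firefox string suggested in the comments, and nothing is sent over the network here; whether the server then includes the query strings is not verified by this example.

```python
from urllib.request import Request


def build_request(url, user_agent):
    """Build (but do not send) a request that carries a custom User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


# User-Agent string suggested in the comments; copying navigator.userAgent
# from your own browser console would be the more current choice.
ua = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0"
req = build_request("https://www.imdb.com/chart/top/", ua)

# Request stores header names capitalized, hence "User-agent" in the lookup.
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would actually send it; the body of the response could then be handed to BeautifulSoup as in the answers above.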
