Python: Trouble getting the URL from an href with BeautifulSoup - Fatal编程技术网

I'm just starting to learn web scraping in Python with BeautifulSoup. I've run into a problem I'm not sure how to solve, so here is a snippet of my code:

from bs4 import BeautifulSoup
import requests

start_url = "https://www1.interactivebrokers.com/en/index.php?f=2222&exch=nasdaq&showcategories=STK#productbuffer"

# Download the HTML from start_url:
downloaded_html = requests.get(start_url)

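# Parse the HTML with BeautifulSoup and create a soup object
# (naming the parser explicitly avoids bs4's "no parser was explicitly specified" warning)
soup = BeautifulSoup(downloaded_html.text, "html.parser")
# Select the table body that holds the data (the third match on the page):
rawTable = soup.select('table.table.table-striped.table-bordered tbody')[2]
# Collect every <a class="linkexternal"> link inside that table
url = rawTable.find_all('a',{'class':'linkexternal'})
print(url[0])
print(url[0].get('href'))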
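The first print shows the first row after the table header, which contains company information (you can see it at the link). The second gets the href attribute, which points to a pop-up page with more information. I'll paste it here: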

javascript:NewWindow('https://contract.ibkr.info/index.php?action=Details&site=GEN&conid=48811132','Details','600','custom','front',

The actual URL, when I click it manually, looks like this:

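Is there a BeautifulSoup command that can get this for me? Or is there another Python module I can combine with BeautifulSoup to capture the pop-up window's URL? I'd rather not use a regular expression for this.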

Thanks in advance for your help.
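If you want to avoid regular expressions entirely, one standard-library option is `ast.literal_eval`: the argument list of the `NewWindow(...)` call has the same shape as a Python tuple of string literals, so it can be parsed as one. This is a sketch under assumptions: it assumes the href starts with `javascript:NewWindow` and that every argument is a plain quoted string. The function name `popup_args` and the complete href below are hypothetical, since the href quoted in the question is cut off.

```python
import ast

def popup_args(href):
    """Parse the arguments of a javascript:NewWindow(...) href into a tuple.

    Assumes every argument is a simple quoted string literal, which
    ast.literal_eval can read as a Python tuple. Returns None for other hrefs.
    """
    prefix = "javascript:NewWindow"
    if not href.startswith(prefix):
        return None
    # Keep only the "(...)" argument list and drop a trailing ";" if present.
    args = href[len(prefix):].strip().rstrip(";").strip()
    value = ast.literal_eval(args)
    # A single argument such as ('x') evaluates to a plain string, not a tuple.
    return value if isinstance(value, tuple) else (value,)

# Hypothetical complete href, modeled on the truncated one in the question.
href = ("javascript:NewWindow('https://contract.ibkr.info/index.php"
        "?action=Details&site=GEN&conid=48811132','Details','600','custom','front');")
print(popup_args(href)[0])
```

If the arguments ever contain JavaScript expressions rather than plain strings, `ast.literal_eval` raises an exception (`ValueError` or `SyntaxError`), so the split and regex approaches are the safer fallback in that case.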

Split the href on the single-quote character and take the second piece, e.g.:

print(url[0].get('href').split("'")[1])

Output:

https://contract.ibkr.info/index.php?action=Details&site=GEN&conid=48811132
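The split works because the URL is the first single-quoted argument of `NewWindow(...)`, so it sits between the first and second `'`. A slightly more defensive sketch (the helper name `url_from_href` is hypothetical) returns `None` for hrefs that don't contain a complete quoted pair instead of raising `IndexError`:

```python
def url_from_href(href):
    """Return the text between the first two single quotes in href, or None.

    Same idea as href.split("'")[1], but safe for hrefs without quotes.
    """
    parts = href.split("'")
    # Fewer than 3 pieces means there is no complete '...' pair.
    if len(parts) < 3:
        return None
    return parts[1]

href = ("javascript:NewWindow('https://contract.ibkr.info/index.php"
        "?action=Details&site=GEN&conid=48811132','Details','600','custom','front');")
print(url_from_href(href))
```

With the question's code, `[url_from_href(a.get('href', '')) for a in url]` would collect the pop-up URL for every `linkexternal` link in the table.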

Under the hood, almost every package that extracts text patterns uses regex, so I'd suggest just using a regex:


https?:[^\s,'"[\]();]+
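A sketch of the regex route with Python's `re` module. The character class below excludes `'` and `"` so that the quote closing the URL inside `NewWindow('...')` is not captured along with it. `find_url` is a hypothetical helper name.

```python
import re

# Match "http:" or "https:" followed by any run of characters that are not
# whitespace, commas, quotes, brackets, parentheses or semicolons.
URL_RE = re.compile(r"""https?:[^\s,'"\[\]();]+""")

def find_url(text):
    """Return the first http(s) URL found in text, or None."""
    match = URL_RE.search(text)
    return match.group(0) if match else None

href = ("javascript:NewWindow('https://contract.ibkr.info/index.php"
        "?action=Details&site=GEN&conid=48811132','Details','600','custom','front');")
print(find_url(href))
```

`URL_RE.findall(text)` returns every match instead of only the first, if an href could ever contain more than one URL.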

Thanks a lot, @buran!