Identifying redirect links in Selenium Python

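My code currently parses a website and checks whether any link on it points back to another website entered by the user. However, some websites contain links that redirect to the user-entered website indirectly.

For example, if I am looking for McDonald's, on Yelp the link to the McDonald's website is

while my program is looking for www.mcdonalds.com

Another example is a bit.ly link that redirects indirectly to the website.

Here is my code for reference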

from datetime import datetime
from urllib.parse import urlparse

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# driver, URL, strip() and tooLong() are defined elsewhere in the program (not shown)

def search(web):
    # Open the link in a new tab
    site = web.get_attribute("href")
    driver.execute_script("window.open('');")
    driver.switch_to.window(driver.window_handles[-1])
    driver.get(site)

    # Wait until the webpage has loaded
    try:
        start = datetime.now()
        element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME, "body")))

        # Get the scheme and host of the page the browser ended up on
        parsed_uri = urlparse(driver.current_url)
        domain = strip('{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri))
        # If it's taking too long, say it didn't load
        if tooLong(start, datetime.now()):
            driver.execute_script("window.close('');")
            driver.switch_to.window(driver.window_handles[0])
            return site + " could not load"
        # If it's the same website, return nothing to add to the list
        if domain == URL:
            driver.execute_script("window.close('');")
            driver.switch_to.window(driver.window_handles[0])
            return ''
        # Else, search all links for the desired website
        else:
            elems = driver.find_elements(By.TAG_NAME, 'a')
            for i in elems:
                newURI = urlparse(i.get_attribute("href"))
                check = strip('{uri.scheme}://{uri.netloc}/'.format(uri=newURI))
                print(check)
                # If it links back to the website, return nothing to add to the list
                if check == URL:
                    driver.execute_script("window.close('');")
                    driver.switch_to.window(driver.window_handles[0])
                    return ''
                # If it's taking too long, say it couldn't load
                if tooLong(start, datetime.now()):
                    driver.execute_script("window.close('');")
                    driver.switch_to.window(driver.window_handles[0])
                    return site + " could not load"
        # If nothing is found, return the name of the website
        driver.execute_script("window.close('');")
        driver.switch_to.window(driver.window_handles[0])
        return site
    # If the website doesn't load, flag it
    except TimeoutException as ex:
        driver.execute_script("window.close('');")
        driver.switch_to.window(driver.window_handles[0])
        return site + " could not load"

Thanks for your help

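So what is the problem with your existing code?

The problem is that it doesn't work for redirect links such as bit.ly.

But your code is a bit hard to understand at first glance. Do bit.ly URLs redirect automatically as well? Also, if you are browsing to these URLs, you need to load each one and check what the browser's new URL is.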
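
One way to act on the last comment, as a minimal sketch: load each link, read the browser's final address from `driver.current_url`, and compare hostnames rather than full strings. The helper names `normalize_host`, `resolve_final_url` and `redirects_to` are hypothetical and appear neither in the asker's code nor in Selenium; only `driver.get` and `driver.current_url` are Selenium calls. Treating `www.example.com` and `example.com` as the same site is an assumption as well.

```python
from urllib.parse import urlparse


def normalize_host(url):
    """Return the lowercased hostname of url without a leading 'www.'."""
    # urlparse only fills in the hostname when a scheme or '//' is present,
    # so bare inputs such as 'www.mcdonalds.com' get a '//' prefix first.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    host = (urlparse(url).hostname or "").lower()
    # Assumption: a leading 'www.' does not change which site is meant.
    return host[4:] if host.startswith("www.") else host


def resolve_final_url(driver, href):
    """Load href in the given driver and return the URL it ended up on."""
    driver.get(href)
    return driver.current_url


def redirects_to(driver, href, target):
    """True if href, after any redirects, lands on the target site."""
    final_url = resolve_final_url(driver, href)
    return normalize_host(final_url) == normalize_host(target)
```

With something like this, the loop over the `a` elements could call `redirects_to(driver, i.get_attribute("href"), URL)` for each link that does not match directly. Loading every link is slow, and a redirect triggered by delayed JavaScript may not have happened yet when `driver.get` returns, so an explicit wait may still be needed.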