Java: How to find broken links across a website by providing its URL, e.g. 'www.hammacher.com'


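I am using the code below to find broken links on a web page. But what should I do if I want to check the whole website, including its internal links? Any suggestions would be appreciated. Thanks, everyone.

Steps to check for broken links on a web page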

    List<WebElement> links = driver.findElements(By.tagName("a"));
    Iterator<WebElement> it = links.iterator();

    while (it.hasNext()) {
        url = it.next().getAttribute("href");
        System.out.println(url);

        if (url == null || url.isEmpty()) {
            System.out.println("URL is either not configured for anchor tag or it is empty");
            continue;
        }

        if (!url.startsWith(homePage)) {
            System.out.println("URL belongs to another domain, skipping it.");
            continue;
        }

        try {
            huc = (HttpURLConnection) (new URL(url).openConnection());
            huc.setRequestMethod("HEAD");
            huc.connect();
            respCode = huc.getResponseCode();

            if (respCode >= 400) {
                System.out.println(url + " is a broken link");
            } else {
                System.out.println(url + " is a valid link");
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

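Your approach is sound. To check the status of a link after retrieving the href attribute from an <a> tag, you can write a function that accepts the href as an argument and prints the relevant status, as follows:

  • Function to check the link status: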

    private void CheckingLink(String linkURL)
    {
        try {
            URL url = new URL(linkURL);
            HttpURLConnection httpUrlConnect = (HttpURLConnection) url.openConnection();
            httpUrlConnect.setConnectTimeout(5000);
            httpUrlConnect.connect();

            // Read the status code once and reuse it in the checks below.
            int responseCode = httpUrlConnect.getResponseCode();

            if (responseCode == HttpURLConnection.HTTP_OK                    // 200
                    || responseCode == HttpURLConnection.HTTP_INTERNAL_ERROR // 500
                    || responseCode == HttpURLConnection.HTTP_PAYMENT_REQUIRED) // 402
            {
                System.out.println(linkURL + " - " + httpUrlConnect.getResponseMessage());
            }
            if (responseCode == HttpURLConnection.HTTP_NOT_FOUND)            // 404
            {
                System.out.println(
                        linkURL + " - " + httpUrlConnect.getResponseMessage() + " - " + HttpURLConnection.HTTP_NOT_FOUND);
            }
        } catch (IOException e)
        {
            // Null, malformed, or unreachable URLs end up here; only the exception message is printed.
            System.out.println(e.getMessage());
        }
    }
    
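Where exactly are you stuck? Instead of browsing every page of the site, you could ask for the sitemap URL.

At the moment the code does not pick up internal URLs, such as the product pages on the site "www.hammacher.com". Product pages are internal links. The navigation would be Electronics tab -> New Arrivals -> then the products... Please advise where I should change the code.

  • Calling the CheckingLink() function:
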
    List<WebElement> elements = driver.findElements(By.tagName("a"));
    System.out.println("Number of WebElements on this page : " + elements.size());
    for (int i = 0; i < elements.size(); i++)
    {
        WebElement ele = elements.get(i);
        String url = ele.getAttribute("href");
        CheckingLink(url);
    }
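
The console output of this run includes lines such as null, no protocol: and unknown protocol: javascript. These most likely come from anchors whose href is missing, empty, or not an HTTP address. A minimal sketch of a pre-check that skips such values before CheckingLink() is called might look like the following. The class name HrefFilterSketch and the method isCheckable are assumptions made for illustration and are not part of the answer's code.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper (not from the answer): decides whether an href is worth an HTTP check. */
final class HrefFilterSketch {

    /**
     * Returns true only for absolute http/https URLs with a host.
     * Null, empty, javascript:, mailto: and relative values are rejected.
     */
    static boolean isCheckable(String href) {
        if (href == null || href.trim().isEmpty()) {
            return false;
        }
        try {
            URI uri = new URI(href.trim());
            String scheme = uri.getScheme();
            boolean isHttp = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return isHttp && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // Values that java.net.URI cannot parse (for example, raw spaces) are skipped.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = { "https://in.yahoo.com/", "javascript:void(0)", "", null, "/relative/path" };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isCheckable(sample));
        }
    }
}
```

Inside the loop, the call could then be guarded with if (HrefFilterSketch.isCheckable(url)) { CheckingLink(url); }. Note that java.net.URI is stricter than a browser, so a few links that work in practice may be skipped by this check.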
  • Console output:

Number of WebElements on this page : 105
https://in.yahoo.com/ - OK
https://mail.yahoo.com/?.intl=in&.lang=en-IN&.partner=none&.src=fp - OK
https://in.news.yahoo.com/ - OK
https://cricket.yahoo.com/ - OK
https://in.finance.yahoo.com/ - OK
https://in.style.yahoo.com/tagged/celebrity - OK
https://in.style.yahoo.com/tagged/movies - OK
https://in.style.yahoo.com/ - OK
https://in.mobile.yahoo.com/ - OK
https://in.yahoo.com/everything/ - OK
https://in.answers.yahoo.com/ - OK
https://in.groups.yahoo.com/ - OK
https://in.messenger.yahoo.com/ - OK
https://in.news.yahoo.com/weather - OK
https://in.yahoo.com/everything/world - OK
https://in.yahoo.com/ - OK
https://login.yahoo.com/config/login?.src=fpctx&.intl=in&.lang=en-IN&.done=https%3A%2F%2Fin.yahoo.com - OK
https://mail.yahoo.com/?.intl=in&.lang=en-IN&.partner=none&.src=fp - OK
https://login.yahoo.com/config/login?.src=fpctx&.intl=in&.lang=en-IN&.done=https%3A%2F%2Fin.yahoo.com - OK
https://in.yahoo.com/?p=us#mega-bottombar-mail - OK
https://in.yahoo.com/?p=us#Main - OK
https://in.yahoo.com/?p=us#Aside - OK
https://mail.yahoo.com/?.intl=in&.lang=en-IN&.partner=none&.src=fp - OK
https://cricket.yahoo.com/ - OK
https://in.news.yahoo.com/ - OK
https://in.finance.yahoo.com/ - OK
https://in.style.yahoo.com/ - OK
https://in.style.yahoo.com/tagged/movies - OK
https://in.style.yahoo.com/tagged/celebrity - OK
http://in.travelinspirations.yahoo.com/ - OK
https://in.yahoo.com/everything/ - OK
https://in.news.yahoo.com/video/32-episode-1-095405056.html - OK
https://cricket.yahoo.net/scores/india-vs-afghanistan-oneoff-test-14th-june-2018-inaf06142018185950-summary - OK
https://cricket.yahoo.net/scores/india-vs-afghanistan-oneoff-test-14th-june-2018-inaf06142018185950-summary - OK
https://in.news.yahoo.com/fed-bengaluru-traffic-techie-rides-085447032.html - OK
https://in.news.yahoo.com/photos-eid-ul-fitr-celebrations-slideshow-wp-095013253.html - OK
https://in.style.yahoo.com/quick-look-actor-plays-race-slideshow-wp-102506088.html - OK
https://in.style.yahoo.com/five-crucial-things-know-blood-103318158.html - OK
https://in.news.yahoo.com/boy-america-contracts-bubonic-plague-113108819.html - OK
https://in.style.yahoo.com/janhvi-khushi-anshula-holiday-london-dad-boney-kapoor-064018621.html - OK
https://in.style.yahoo.com/janhvi-khushi-anshula-holiday-london-dad-boney-kapoor-064018621.html - OK
https://in.style.yahoo.com/janhvi-khushi-anshula-holiday-london-dad-boney-kapoor-064018621.html - OK
https://beap.gemini.yahoo.com/mbclk?bv=1.0.0&es=8j5uUzIGIS8bthoOIIlefINlCyUX0sMagCIuZQ05jmBfB74DwldI_rYOX1OS5kBByKf6VXv1ZfletO8DFuwVrss1EH7zcp7sC3mOkIDCDckHezCh6uetN9gABHeBIVJhY_Gh2YQZYlGcNjg0Ls4p9bZZt6jMNKDm_Deq0awAlb3iWN9MmuRf_3FnL8iztj2LLuB2G4qXUU5aZe_8bv54J3eChnAjgZEpXOjwZ0PX.aDMFrGxPY80WmXuIOd_k7ddLrVufsMXvVGZDkbqPaoyUidc2jukZlTGmbtJsq9PgokEscfHPYWw4KjDZT4js_9x74ME6IB.Pg3f6zuO1S6cb9kuc7WZ6wtRj73lilaXMuXv_mp5N7HB1USXa0Qy.S.PSZOX7kxczmPfD7znequq2Cova59KLDCDgj_kJM8zAGMKDrm7iWBTQuVlpY_lfv5YibTeKfJRtmJYnkJQ.XakDf6k6gOLWmWkJjuA9pVDUZKkMrCXwY8yRInyKIoMPMdPDa4kRIh1ghW2K7VLJfjGu6qXW1kPGFVRTF0wKkN4JKY4J.TLPlSEI9uuudXnam8OY5RZJA--%26lp= - OK
https://beap.gemini.yahoo.com/mbclk?bv=1.0.0&es=8j5uUzIGIS8bthoOIIlefINlCyUX0sMagCIuZQ05jmBfB74DwldI_rYOX1OS5kBByKf6VXv1ZfletO8DFuwVrss1EH7zcp7sC3mOkIDCDckHezCh6uetN9gABHeBIVJhY_Gh2YQZYlGcNjg0Ls4p9bZZt6jMNKDm_Deq0awAlb3iWN9MmuRf_3FnL8iztj2LLuB2G4qXUU5aZe_8bv54J3eChnAjgZEpXOjwZ0PX.aDMFrGxPY80WmXuIOd_k7ddLrVufsMXvVGZDkbqPaoyUidc2jukZlTGmbtJsq9PgokEscfHPYWw4KjDZT4js_9x74ME6IB.Pg3f6zuO1S6cb9kuc7WZ6wtRj73lilaXMuXv_mp5N7HB1USXa0Qy.S.PSZOX7kxczmPfD7znequq2Cova59KLDCDgj_kJM8zAGMKDrm7iWBTQuVlpY_lfv5YibTeKfJRtmJYnkJQ.XakDf6k6gOLWmWkJjuA9pVDUZKkMrCXwY8yRInyKIoMPMdPDa4kRIh1ghW2K7VLJfjGu6qXW1kPGFVRTF0wKkN4JKY4J.TLPlSEI9uuudXnam8OY5RZJA--%26lp= - OK
https://info.yahoo.com/privacy/us/yahoo/relevantads.html - OK
https://beap.gemini.yahoo.com/mbclk?bv=1.0.0&es=8j5uUzIGIS8bthoOIIlefINlCyUX0sMagCIuZQ05jmBfB74DwldI_rYOX1OS5kBByKf6VXv1ZfletO8DFuwVrss1EH7zcp7sC3mOkIDCDckHezCh6uetN9gABHeBIVJhY_Gh2YQZYlGcNjg0Ls4p9bZZt6jMNKDm_Deq0awAlb3iWN9MmuRf_3FnL8iztj2LLuB2G4qXUU5aZe_8bv54J3eChnAjgZEpXOjwZ0PX.aDMFrGxPY80WmXuIOd_k7ddLrVufsMXvVGZDkbqPaoyUidc2jukZlTGmbtJsq9PgokEscfHPYWw4KjDZT4js_9x74ME6IB.Pg3f6zuO1S6cb9kuc7WZ6wtRj73lilaXMuXv_mp5N7HB1USXa0Qy.S.PSZOX7kxczmPfD7znequq2Cova59KLDCDgj_kJM8zAGMKDrm7iWBTQuVlpY_lfv5YibTeKfJRtmJYnkJQ.XakDf6k6gOLWmWkJjuA9pVDUZKkMrCXwY8yRInyKIoMPMdPDa4kRIh1ghW2K7VLJfjGu6qXW1kPGFVRTF0wKkN4JKY4J.TLPlSEI9uuudXnam8OY5RZJA--%26lp= - OK
https://beap.gemini.yahoo.com/mbclk?bv=1.0.0&es=8j5uUzIGIS8bthoOIIlefINlCyUX0sMagCIuZQ05jmBfB74DwldI_rYOX1OS5kBByKf6VXv1ZfletO8DFuwVrss1EH7zcp7sC3mOkIDCDckHezCh6uetN9gABHeBIVJhY_Gh2YQZYlGcNjg0Ls4p9bZZt6jMNKDm_Deq0awAlb3iWN9MmuRf_3FnL8iztj2LLuB2G4qXUU5aZe_8bv54J3eChnAjgZEpXOjwZ0PX.aDMFrGxPY80WmXuIOd_k7ddLrVufsMXvVGZDkbqPaoyUidc2jukZlTGmbtJsq9PgokEscfHPYWw4KjDZT4js_9x74ME6IB.Pg3f6zuO1S6cb9kuc7WZ6wtRj73lilaXMuXv_mp5N7HB1USXa0Qy.S.PSZOX7kxczmPfD7znequq2Cova59KLDCDgj_kJM8zAGMKDrm7iWBTQuVlpY_lfv5YibTeKfJRtmJYnkJQ.XakDf6k6gOLWmWkJjuA9pVDUZKkMrCXwY8yRInyKIoMPMdPDa4kRIh1ghW2K7VLJfjGu6qXW1kPGFVRTF0wKkN4JKY4J.TLPlSEI9uuudXnam8OY5RZJA--%26lp= - OK
unknown protocol: javascript
https://in.finance.yahoo.com/news/salman-khan-katrina-kaif-sonakshi-052512176.html - OK
https://in.finance.yahoo.com/news/salman-khan-katrina-kaif-sonakshi-052512176.html - OK
https://in.finance.yahoo.com/news/salman-khan-katrina-kaif-sonakshi-052512176.html - OK
https://in.news.yahoo.com/rihanna-narrowly-avoids-wardrobe-malfunction-135255635.html - OK
https://in.news.yahoo.com/rihanna-narrowly-avoids-wardrobe-malfunction-135255635.html - OK
https://in.news.yahoo.com/rihanna-narrowly-avoids-wardrobe-malfunction-135255635.html - OK
https://in.style.yahoo.com/dipika-kakar-set-first-eid-marriage-green-sharara-052512000.html - OK
https://in.style.yahoo.com/dipika-kakar-set-first-eid-marriage-green-sharara-052512000.html - OK
https://in.style.yahoo.com/dipika-kakar-set-first-eid-marriage-green-sharara-052512000.html - OK
https://info.yahoo.com/privacy/us/yahoo/relevantads.html - OK
unknown protocol: javascript
https://in.style.yahoo.com/neha-kakkar-apologises-her-man-himansh-kohli-rude-073156251.html - OK
https://in.style.yahoo.com/neha-kakkar-apologises-her-man-himansh-kohli-rude-073156251.html - OK
https://in.style.yahoo.com/neha-kakkar-apologises-her-man-himansh-kohli-rude-073156251.html - OK
https://in.news.yahoo.com/alia-bhatt-apos-sister-shaheen-031551577.html - OK
https://in.news.yahoo.com/alia-bhatt-apos-sister-shaheen-031551577.html - OK
https://in.news.yahoo.com/alia-bhatt-apos-sister-shaheen-031551577.html - OK
https://in.news.yahoo.com/apos-why-love-island-contestants-183329153.html - OK
https://in.news.yahoo.com/apos-why-love-island-contestants-183329153.html - OK
https://in.news.yahoo.com/apos-why-love-island-contestants-183329153.html - OK
https://in.search.yahoo.com/search?p=India%20vs%20Afghanistan%202018&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=Bajrang%20Dal%20VHP%20CIA&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=Shujaat%20Bukhari&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=Dhivya%20Suryadevara&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=Luxury%20watches&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=FIFA%20World%20Cup%202018&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=UN%20Kashmir%20report&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=AAP%20dharna&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=Sanju%20poster&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=Race%203&fr=fp-tts&fr2=ps - OK
https://weather.yahoo.com/ - OK
https://in.news.yahoo.com/weather/in/maharashtra/pune-2295412/ - OK
https://in.news.yahoo.com/weather/in/maharashtra/pune-2295412/ - OK
https://in.news.yahoo.com/weather/in/maharashtra/pune-2295412/ - OK
https://in.news.yahoo.com/weather/in/maharashtra/pune-2295412/ - OK
null
null
null
https://cricket.yahoo.com/ - OK
https://cricket.yahoo.com/ - OK
https://cricket.yahoo.com/ - OK
no protocol: 
https://in.news.yahoo.com/ - OK
https://in.style.yahoo.com/bengalureans-force-bbmp-re-look-bizarre-new-pet-licensing-bye-laws-notwithoutmydog-movement-095558668.html - OK
https://in.news.yahoo.com/photos-eid-ul-fitr-celebrations-slideshow-wp-095013253.html - OK
https://in.news.yahoo.com/photos-football-frenzy-grips-russia-slideshow-wp-085232287.html - OK
https://policies.yahoo.com/in/en/yahoo/privacy/index.htm - OK
http://in.advertising.yahoo.com/ - OK
careers.yahoo.com
https://in.help.yahoo.com/kb/helpcentral - OK
https://yahoo.uservoice.com/forums/206294-india-homepage - OK
PASSED: getLinks

===============================================
    Default test
    Tests run: 1, Failures: 0, Skips: 0
===============================================


===============================================
Default suite
Total tests run: 1, Failures: 0, Skips: 0
===============================================
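
The code above checks the links of a single page only. To cover internal pages as well, as the question asks, one option is a breadth-first crawl: visit a page, collect its links, queue the ones on the same host, and repeat while remembering which pages were already seen. The following is a rough sketch under stated assumptions, not a definitive implementation. The names SiteCrawlSketch, crawl, hostOf and stripFragment are hypothetical, and link extraction is passed in as a function so the sketch runs without a browser. With Selenium, that function could call driver.get(pageUrl) and read the href of each <a> element.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch: breadth-first crawl restricted to the start URL's host. */
final class SiteCrawlSketch {

    /** Returns the pages visited, in visiting order, stopping after maxPages pages. */
    static Set<String> crawl(String startUrl, int maxPages,
                             Function<String, List<String>> linkExtractor) {
        String host = hostOf(startUrl);
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(stripFragment(startUrl));

        while (!queue.isEmpty() && visited.size() < maxPages) {
            String page = queue.poll();
            if (!visited.add(page)) {
                continue; // this page was already crawled
            }
            for (String href : linkExtractor.apply(page)) {
                String linkHost = hostOf(href);
                if (linkHost == null || !linkHost.equalsIgnoreCase(host)) {
                    continue; // null, non-HTTP, or external link
                }
                String target = stripFragment(href);
                if (!visited.contains(target)) {
                    queue.add(target);
                }
            }
        }
        return visited;
    }

    /** Host of an absolute http/https URL, or null for anything else. */
    static String hostOf(String url) {
        if (url == null) {
            return null;
        }
        try {
            URI uri = new URI(url.trim());
            String scheme = uri.getScheme();
            boolean isHttp = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return isHttp ? uri.getHost() : null;
        } catch (URISyntaxException e) {
            return null;
        }
    }

    /** Drops "#fragment" so that page#Main and page count as the same page. */
    static String stripFragment(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    public static void main(String[] args) {
        // In-memory stand-in for a website: page URL -> hrefs found on that page.
        Map<String, List<String>> fakeSite = Map.of(
                "https://example.test/", List.of("https://example.test/a", "https://other.test/x", "javascript:void(0)"),
                "https://example.test/a", List.of("https://example.test/", "https://example.test/b#top"),
                "https://example.test/b", List.of());
        Set<String> pages = crawl("https://example.test/", 100,
                url -> fakeSite.getOrDefault(url, List.of()));
        pages.forEach(System.out::println);
    }
}
```

Each page returned by crawl could then be opened and its links passed to CheckingLink(). The maxPages limit is there because a large shop site can have thousands of product pages; a sitemap, where the site provides one, may be a cheaper source of page URLs than crawling.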