Java: How to find broken links in a website by providing a URL, e.g. 'www.hammacher.com'
I use the code below to find broken links on a website. But if I want to check the whole website, including internal links, how can I do that? Could someone give me a suggestion? Thanks, everyone.

Steps to check broken links on a web page:
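```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Iterator;
import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;

// Inside a test method; `driver` (WebDriver) and `homePage` (String) are defined elsewhere.
String url;
HttpURLConnection huc;
int respCode;

List<WebElement> links = driver.findElements(By.tagName("a"));
Iterator<WebElement> it = links.iterator();
while (it.hasNext()) {
    url = it.next().getAttribute("href");
    System.out.println(url);
    if (url == null || url.isEmpty()) {
        System.out.println("URL is either not configured for anchor tag or it is empty");
        continue;
    }
    if (!url.startsWith(homePage)) {
        System.out.println("URL belongs to another domain, skipping it.");
        continue;
    }
    try {
        // HEAD fetches only the status line and headers, not the page body
        huc = (HttpURLConnection) (new URL(url).openConnection());
        huc.setRequestMethod("HEAD");
        huc.connect();
        respCode = huc.getResponseCode();
        if (respCode >= 400) {
            System.out.println(url + " is a broken link");
        } else {
            System.out.println(url + " is a valid link");
        }
    } catch (MalformedURLException e) {
        // the href could not be parsed as a URL
        e.printStackTrace();
    } catch (IOException e) {
        // connection failure, timeout, unknown host, ...
        e.printStackTrace();
    }
}
```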
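The loop above only checks the anchors on a single page. One way to extend it to a whole site, including internal links, is a breadth-first crawl: keep a queue of pages still to visit and a set of pages already visited, collect every link on each page, and only enqueue links whose host matches the start URL's host (more robust than `url.startsWith(homePage)`, which rejects the `http://` variant of an `https://` home page). The sketch below is a hypothetical `SiteCrawler` class, not an existing library API. To keep it testable, loading a page is abstracted behind a `Function<String, List<String>>`; this is an assumption of the sketch, and with Selenium that function could call `driver.get(page)` and return the `getAttribute("href")` values of `driver.findElements(By.tagName("a"))`. The `maxPages` cap is also an assumed safety limit.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper: breadth-first crawl restricted to the start URL's host.
final class SiteCrawler {

    /**
     * Visits pages reachable from startUrl on the same host (at most maxPages of them)
     * and returns every distinct link found on those pages, internal and external,
     * in discovery order. linksOnPage is an assumed "load page, return hrefs" function.
     */
    static Set<String> collectLinks(String startUrl,
                                    Function<String, List<String>> linksOnPage,
                                    int maxPages) {
        String host = hostOf(startUrl);
        Set<String> visited = new LinkedHashSet<>();
        Set<String> allLinks = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String page = queue.poll();
            if (!visited.add(page)) {
                continue; // this page was already crawled
            }
            for (String link : linksOnPage.apply(page)) {
                if (link == null || link.isEmpty()) {
                    continue;
                }
                allLinks.add(link);
                // Follow internal links only; external links are recorded but not crawled.
                if (host != null && host.equalsIgnoreCase(hostOf(link)) && !visited.contains(link)) {
                    queue.add(link);
                }
            }
        }
        return allLinks;
    }

    // Host part of a URL, or null if it cannot be parsed or has no host (e.g. "javascript:...").
    static String hostOf(String url) {
        try {
            return new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null;
        }
    }
}
```

Every link returned by `collectLinks` can then be passed to the HEAD-request check from the question, so each page is crawled once and each link is checked once.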
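Your approach is perfectly fine. To check the status of a link after retrieving the `href` attribute from the `<a>` tag, you can write a function that takes the `href` as a parameter and prints the corresponding status, as follows: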
- Function to check the status of a link:
```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Prints "<url> - <response message>", plus the status code for error responses (>= 400).
private void CheckingLink(String linkURL) {
    try {
        URL url = new URL(linkURL);
        String protocol = url.getProtocol();
        if (!"http".equals(protocol) && !"https".equals(protocol)) {
            // e.g. mailto: links; their connection cannot be cast to HttpURLConnection
            System.out.println(linkURL + " - skipped (not an HTTP link)");
            return;
        }
        HttpURLConnection httpUrlConnect = (HttpURLConnection) url.openConnection();
        httpUrlConnect.setConnectTimeout(5000);
        httpUrlConnect.connect();
        int code = httpUrlConnect.getResponseCode();
        String message = httpUrlConnect.getResponseMessage();
        if (code >= 400) {
            // 4xx/5xx responses (402, 404, 500, ...) mean the link is broken
            System.out.println(linkURL + " - " + message + " - " + code);
        } else {
            System.out.println(linkURL + " - " + message);
        }
        httpUrlConnect.disconnect();
    } catch (IOException e) {
        // null or empty hrefs, "javascript:" links and network errors end up here
        System.out.println(e.getMessage());
    }
}
```
- Calling the function `CheckingLink()`:
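```java
import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;

// Collect every <a> element on the current page and check its href.
List<WebElement> elements = driver.findElements(By.tagName("a"));
System.out.println("Number of WebElements on this page : " + elements.size());
for (WebElement ele : elements) {
    String url = ele.getAttribute("href"); // null for anchors without an href
    CheckingLink(url);
}
```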
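`CheckingLink()` prints its result, which is fine for reading a report but awkward to assert on in a TestNG test. One possible variant, sketched here as a hypothetical `LinkChecker` class (not part of Selenium or of the original answer), returns the status code instead. It sends a `HEAD` request with connect and read timeouts, retries with `GET` when the server answers `405 Method Not Allowed`, and returns `-1` when the URL cannot be opened as an HTTP(S) link (for example `javascript:` or `mailto:` hrefs). The `-1` convention and the timeout parameter are assumptions of this sketch.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: HTTP status of a link, or -1 if it cannot be requested over HTTP(S).
final class LinkChecker {

    static int statusOf(String link, int timeoutMillis) {
        try {
            int code = request(link, "HEAD", timeoutMillis);
            // Some servers reject HEAD with 405 Method Not Allowed; retry those with GET.
            if (code == HttpURLConnection.HTTP_BAD_METHOD) {
                code = request(link, "GET", timeoutMillis);
            }
            return code;
        } catch (IOException | ClassCastException e) {
            // MalformedURLException (null, "", javascript:), network errors,
            // or a non-HTTP connection such as mailto: that cannot be cast.
            return -1;
        }
    }

    // Treat request failures and 4xx/5xx responses as broken.
    static boolean isBroken(int status) {
        return status < 0 || status >= 400;
    }

    private static int request(String link, String method, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(link).openConnection();
        try {
            conn.setRequestMethod(method);
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

Inside the loop, `LinkChecker.isBroken(LinkChecker.statusOf(url, 5000))` could then feed an assertion. Note that `HttpURLConnection` follows redirects by default, but not across `http`/`https`, so a `301` is usually reported as the status of the page it points to.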
Sample console output from the `CheckingLink()` loop:

Number of WebElements on this page : 105
https://in.yahoo.com/ - OK
https://mail.yahoo.com/?.intl=in&.lang=en-IN&.partner=none&.src=fp - OK
https://in.news.yahoo.com/ - OK
https://cricket.yahoo.com/ - OK
https://in.finance.yahoo.com/ - OK
https://in.style.yahoo.com/tagged/celebrity - OK
https://in.style.yahoo.com/tagged/movies - OK
https://in.style.yahoo.com/ - OK
https://in.mobile.yahoo.com/ - OK
https://in.yahoo.com/everything/ - OK
https://in.answers.yahoo.com/ - OK
https://in.groups.yahoo.com/ - OK
https://in.messenger.yahoo.com/ - OK
https://in.news.yahoo.com/weather - OK
https://in.yahoo.com/everything/world - OK
https://in.yahoo.com/ - OK
https://login.yahoo.com/config/login?.src=fpctx&.intl=in&.lang=en-IN&.done=https%3A%2F%2Fin.yahoo.com - OK
https://mail.yahoo.com/?.intl=in&.lang=en-IN&.partner=none&.src=fp - OK
https://login.yahoo.com/config/login?.src=fpctx&.intl=in&.lang=en-IN&.done=https%3A%2F%2Fin.yahoo.com - OK
https://in.yahoo.com/?p=us#mega-bottombar-mail - OK
https://in.yahoo.com/?p=us#Main - OK
https://in.yahoo.com/?p=us#Aside - OK
https://mail.yahoo.com/?.intl=in&.lang=en-IN&.partner=none&.src=fp - OK
https://cricket.yahoo.com/ - OK
https://in.news.yahoo.com/ - OK
https://in.finance.yahoo.com/ - OK
https://in.style.yahoo.com/ - OK
https://in.style.yahoo.com/tagged/movies - OK
https://in.style.yahoo.com/tagged/celebrity - OK
http://in.travelinspirations.yahoo.com/ - OK
https://in.yahoo.com/everything/ - OK
https://in.news.yahoo.com/video/32-episode-1-095405056.html - OK
https://cricket.yahoo.net/scores/india-vs-afghanistan-oneoff-test-14th-june-2018-inaf06142018185950-summary - OK
https://cricket.yahoo.net/scores/india-vs-afghanistan-oneoff-test-14th-june-2018-inaf06142018185950-summary - OK
https://in.news.yahoo.com/fed-bengaluru-traffic-techie-rides-085447032.html - OK
https://in.news.yahoo.com/photos-eid-ul-fitr-celebrations-slideshow-wp-095013253.html - OK
https://in.style.yahoo.com/quick-look-actor-plays-race-slideshow-wp-102506088.html - OK
https://in.style.yahoo.com/five-crucial-things-know-blood-103318158.html - OK
https://in.news.yahoo.com/boy-america-contracts-bubonic-plague-113108819.html - OK
https://in.style.yahoo.com/janhvi-khushi-anshula-holiday-london-dad-boney-kapoor-064018621.html - OK
https://in.style.yahoo.com/janhvi-khushi-anshula-holiday-london-dad-boney-kapoor-064018621.html - OK
https://in.style.yahoo.com/janhvi-khushi-anshula-holiday-london-dad-boney-kapoor-064018621.html - OK
https://beap.gemini.yahoo.com/mbclk?bv=1.0.0&es=8j5uUzIGIS8bthoOIIlefINlCyUX0sMagCIuZQ05jmBfB74DwldI_rYOX1OS5kBByKf6VXv1ZfletO8DFuwVrss1EH7zcp7sC3mOkIDCDckHezCh6uetN9gABHeBIVJhY_Gh2YQZYlGcNjg0Ls4p9bZZt6jMNKDm_Deq0awAlb3iWN9MmuRf_3FnL8iztj2LLuB2G4qXUU5aZe_8bv54J3eChnAjgZEpXOjwZ0PX.aDMFrGxPY80WmXuIOd_k7ddLrVufsMXvVGZDkbqPaoyUidc2jukZlTGmbtJsq9PgokEscfHPYWw4KjDZT4js_9x74ME6IB.Pg3f6zuO1S6cb9kuc7WZ6wtRj73lilaXMuXv_mp5N7HB1USXa0Qy.S.PSZOX7kxczmPfD7znequq2Cova59KLDCDgj_kJM8zAGMKDrm7iWBTQuVlpY_lfv5YibTeKfJRtmJYnkJQ.XakDf6k6gOLWmWkJjuA9pVDUZKkMrCXwY8yRInyKIoMPMdPDa4kRIh1ghW2K7VLJfjGu6qXW1kPGFVRTF0wKkN4JKY4J.TLPlSEI9uuudXnam8OY5RZJA--%26lp= - OK
https://beap.gemini.yahoo.com/mbclk?bv=1.0.0&es=8j5uUzIGIS8bthoOIIlefINlCyUX0sMagCIuZQ05jmBfB74DwldI_rYOX1OS5kBByKf6VXv1ZfletO8DFuwVrss1EH7zcp7sC3mOkIDCDckHezCh6uetN9gABHeBIVJhY_Gh2YQZYlGcNjg0Ls4p9bZZt6jMNKDm_Deq0awAlb3iWN9MmuRf_3FnL8iztj2LLuB2G4qXUU5aZe_8bv54J3eChnAjgZEpXOjwZ0PX.aDMFrGxPY80WmXuIOd_k7ddLrVufsMXvVGZDkbqPaoyUidc2jukZlTGmbtJsq9PgokEscfHPYWw4KjDZT4js_9x74ME6IB.Pg3f6zuO1S6cb9kuc7WZ6wtRj73lilaXMuXv_mp5N7HB1USXa0Qy.S.PSZOX7kxczmPfD7znequq2Cova59KLDCDgj_kJM8zAGMKDrm7iWBTQuVlpY_lfv5YibTeKfJRtmJYnkJQ.XakDf6k6gOLWmWkJjuA9pVDUZKkMrCXwY8yRInyKIoMPMdPDa4kRIh1ghW2K7VLJfjGu6qXW1kPGFVRTF0wKkN4JKY4J.TLPlSEI9uuudXnam8OY5RZJA--%26lp= - OK
https://info.yahoo.com/privacy/us/yahoo/relevantads.html - OK
https://beap.gemini.yahoo.com/mbclk?bv=1.0.0&es=8j5uUzIGIS8bthoOIIlefINlCyUX0sMagCIuZQ05jmBfB74DwldI_rYOX1OS5kBByKf6VXv1ZfletO8DFuwVrss1EH7zcp7sC3mOkIDCDckHezCh6uetN9gABHeBIVJhY_Gh2YQZYlGcNjg0Ls4p9bZZt6jMNKDm_Deq0awAlb3iWN9MmuRf_3FnL8iztj2LLuB2G4qXUU5aZe_8bv54J3eChnAjgZEpXOjwZ0PX.aDMFrGxPY80WmXuIOd_k7ddLrVufsMXvVGZDkbqPaoyUidc2jukZlTGmbtJsq9PgokEscfHPYWw4KjDZT4js_9x74ME6IB.Pg3f6zuO1S6cb9kuc7WZ6wtRj73lilaXMuXv_mp5N7HB1USXa0Qy.S.PSZOX7kxczmPfD7znequq2Cova59KLDCDgj_kJM8zAGMKDrm7iWBTQuVlpY_lfv5YibTeKfJRtmJYnkJQ.XakDf6k6gOLWmWkJjuA9pVDUZKkMrCXwY8yRInyKIoMPMdPDa4kRIh1ghW2K7VLJfjGu6qXW1kPGFVRTF0wKkN4JKY4J.TLPlSEI9uuudXnam8OY5RZJA--%26lp= - OK
https://beap.gemini.yahoo.com/mbclk?bv=1.0.0&es=8j5uUzIGIS8bthoOIIlefINlCyUX0sMagCIuZQ05jmBfB74DwldI_rYOX1OS5kBByKf6VXv1ZfletO8DFuwVrss1EH7zcp7sC3mOkIDCDckHezCh6uetN9gABHeBIVJhY_Gh2YQZYlGcNjg0Ls4p9bZZt6jMNKDm_Deq0awAlb3iWN9MmuRf_3FnL8iztj2LLuB2G4qXUU5aZe_8bv54J3eChnAjgZEpXOjwZ0PX.aDMFrGxPY80WmXuIOd_k7ddLrVufsMXvVGZDkbqPaoyUidc2jukZlTGmbtJsq9PgokEscfHPYWw4KjDZT4js_9x74ME6IB.Pg3f6zuO1S6cb9kuc7WZ6wtRj73lilaXMuXv_mp5N7HB1USXa0Qy.S.PSZOX7kxczmPfD7znequq2Cova59KLDCDgj_kJM8zAGMKDrm7iWBTQuVlpY_lfv5YibTeKfJRtmJYnkJQ.XakDf6k6gOLWmWkJjuA9pVDUZKkMrCXwY8yRInyKIoMPMdPDa4kRIh1ghW2K7VLJfjGu6qXW1kPGFVRTF0wKkN4JKY4J.TLPlSEI9uuudXnam8OY5RZJA--%26lp= - OK
unknown protocol: javascript
https://in.finance.yahoo.com/news/salman-khan-katrina-kaif-sonakshi-052512176.html - OK
https://in.finance.yahoo.com/news/salman-khan-katrina-kaif-sonakshi-052512176.html - OK
https://in.finance.yahoo.com/news/salman-khan-katrina-kaif-sonakshi-052512176.html - OK
https://in.news.yahoo.com/rihanna-narrowly-avoids-wardrobe-malfunction-135255635.html - OK
https://in.news.yahoo.com/rihanna-narrowly-avoids-wardrobe-malfunction-135255635.html - OK
https://in.news.yahoo.com/rihanna-narrowly-avoids-wardrobe-malfunction-135255635.html - OK
https://in.style.yahoo.com/dipika-kakar-set-first-eid-marriage-green-sharara-052512000.html - OK
https://in.style.yahoo.com/dipika-kakar-set-first-eid-marriage-green-sharara-052512000.html - OK
https://in.style.yahoo.com/dipika-kakar-set-first-eid-marriage-green-sharara-052512000.html - OK
https://info.yahoo.com/privacy/us/yahoo/relevantads.html - OK
unknown protocol: javascript
https://in.style.yahoo.com/neha-kakkar-apologises-her-man-himansh-kohli-rude-073156251.html - OK
https://in.style.yahoo.com/neha-kakkar-apologises-her-man-himansh-kohli-rude-073156251.html - OK
https://in.style.yahoo.com/neha-kakkar-apologises-her-man-himansh-kohli-rude-073156251.html - OK
https://in.news.yahoo.com/alia-bhatt-apos-sister-shaheen-031551577.html - OK
https://in.news.yahoo.com/alia-bhatt-apos-sister-shaheen-031551577.html - OK
https://in.news.yahoo.com/alia-bhatt-apos-sister-shaheen-031551577.html - OK
https://in.news.yahoo.com/apos-why-love-island-contestants-183329153.html - OK
https://in.news.yahoo.com/apos-why-love-island-contestants-183329153.html - OK
https://in.news.yahoo.com/apos-why-love-island-contestants-183329153.html - OK
https://in.search.yahoo.com/search?p=India%20vs%20Afghanistan%202018&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=Bajrang%20Dal%20VHP%20CIA&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=Shujaat%20Bukhari&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=Dhivya%20Suryadevara&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=Luxury%20watches&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=FIFA%20World%20Cup%202018&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=UN%20Kashmir%20report&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=AAP%20dharna&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=Sanju%20poster&fr=fp-tts&fr2=ps - OK
https://in.search.yahoo.com/search?p=Race%203&fr=fp-tts&fr2=ps - OK
https://weather.yahoo.com/ - OK
https://in.news.yahoo.com/weather/in/maharashtra/pune-2295412/ - OK
https://in.news.yahoo.com/weather/in/maharashtra/pune-2295412/ - OK
https://in.news.yahoo.com/weather/in/maharashtra/pune-2295412/ - OK
https://in.news.yahoo.com/weather/in/maharashtra/pune-2295412/ - OK
null
null
null
https://cricket.yahoo.com/ - OK
https://cricket.yahoo.com/ - OK
https://cricket.yahoo.com/ - OK
no protocol:
https://in.news.yahoo.com/ - OK
https://in.style.yahoo.com/bengalureans-force-bbmp-re-look-bizarre-new-pet-licensing-bye-laws-notwithoutmydog-movement-095558668.html - OK
https://in.news.yahoo.com/photos-eid-ul-fitr-celebrations-slideshow-wp-095013253.html - OK
https://in.news.yahoo.com/photos-football-frenzy-grips-russia-slideshow-wp-085232287.html - OK
https://policies.yahoo.com/in/en/yahoo/privacy/index.htm - OK
http://in.advertising.yahoo.com/ - OK
careers.yahoo.com
https://in.help.yahoo.com/kb/helpcentral - OK
https://yahoo.uservoice.com/forums/206294-india-homepage - OK
PASSED: getLinks
===============================================
Default test
Tests run: 1, Failures: 0, Skips: 0
===============================================
===============================================
Default suite
Total tests run: 1, Failures: 0, Skips: 0
===============================================
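The output shows the same URL checked several times (for example `https://cricket.yahoo.com/`), anchors that differ only in their `#fragment` (`?p=us#Main`, `?p=us#Aside`), and lines such as `unknown protocol: javascript`, `null`, `no protocol:` and `careers.yahoo.com` that come from hrefs which are not absolute HTTP URLs. A small pre-filter can drop these before any request is sent. The hypothetical `HrefFilter` below is one way to do it, assuming the input list holds the raw `getAttribute("href")` values: it keeps only `http://`/`https://` values, strips the fragment (which is never sent to the server), and de-duplicates while preserving first-seen order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: turns raw href values into a de-duplicated list of HTTP(S) URLs to check.
final class HrefFilter {

    static List<String> checkable(List<String> hrefs) {
        Set<String> unique = new LinkedHashSet<>(); // LinkedHashSet keeps first-seen order
        for (String href : hrefs) {
            String normalized = normalize(href);
            if (normalized != null) {
                unique.add(normalized);
            }
        }
        return new ArrayList<>(unique);
    }

    // The href without its #fragment, or null if it is empty or not an http/https URL.
    static String normalize(String href) {
        if (href == null || href.isBlank()) {
            return null; // <a> without href, or href=""
        }
        String trimmed = href.trim();
        String lower = trimmed.toLowerCase(Locale.ROOT);
        if (!lower.startsWith("http://") && !lower.startsWith("https://")) {
            return null; // javascript:, mailto:, tel:, scheme-less values, ...
        }
        // page?p=us#Main and page?p=us#Aside request the same resource from the server.
        int hash = trimmed.indexOf('#');
        return hash >= 0 ? trimmed.substring(0, hash) : trimmed;
    }
}
```

Passing only the filtered list to `CheckingLink()` would cut the run above down to one request per distinct page.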