How to extract specific text from an HTML response in Java
I need to get specific text (the API names) from an HTML response. Below is the HTML response from the server:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>Ambassador Developer Portal</title>
<link
rel="stylesheet"
href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:300,400,600,700,900"
type="text/css"
media="all"
>
<link rel="stylesheet" href="/docs/styles/master.css" type="text/css" media="all">
</head>
<body>
<div class="o-page">
<header class="o-page__header c-header">
<a href="/docs/" >
<img class="c-header__brand" src="/docs/assets/svg/AmbassadorType.svg" width="180px" height="18px"/>
</a>
<nav class="c-header__nav">
<ul>
<li><a href="https://www.getambassador.io">Ambassador</a></li>
<li><a href="https://www.getambassador.io/products/">Products</a></li>
<li><a href="https://blog.getambassador.io/">Blog</a></li>
</ul>
</nav>
<div class="c-header__misc">
<ul>
<form class="c-search-box">
<label>
<input type="search" placeholder="Search">
</label>
</form>
</ul>
</div>
</header>
<nav class="o-page__nav c-nav">
<div>
<strong>APIs</strong>
<ul>
<li>
<a class="" href="/docs/doc/ambassador/netbanking">
ambassador.netbanking
</a>
</li>
<li>
<a class="" href="/docs/doc/ambassador/regular-httpbin">
ambassador.regular-httpbin
</a>
</li>
<li>
<a class="" href="/docs/doc/default/petstore">
default.petstore
</a>
</li>
</ul>
</div>
<br />
<div>
<strong>Reference</strong>
<ul>
<li><a class="" href="/docs/page/Content">Content</a></li>
<li><a class="" href="/docs/page/Introduction">Introduction</a></li>
</ul>
</div>
<br />
<div>
<strong>Services without documentation</strong>
<ul>
<li>
<samp>ambassador.quote-backend</samp>
</li>
<li>
<samp>ambassador.service-a</samp>
</li>
<li>
<samp>ambassador.service-b</samp>
</li>
<li>
<samp>default.sample-app</samp>
</li>
<li>
<samp>default.sample-app-backend-route</samp>
</li>
<li>
<samp>keycloak.keycloak</samp>
</li>
</ul>
</div>
</nav>
<main class="o-page__main">
<div>
<article>
<section>
<div>
<p><span>
</span>
</p>
<h1>Welcome to the Ambassador Dev Portal</h1>
<h2>Customizing the Portal</h2>
<p>This content is fully customizable for your specific needs.
For details on customizing the portal, see <a href="https://www.getambassador.io/reference/dev-portal">https://www.getambassador.io/reference/dev-portal</a>.
</p>
<h2>Available Services</h2>
<p>The following services are exposed through this Ambassador instance:</p>
<table cellpadding="2em" width="100%">
<thead>
<tr>
<td><b>Service Name</b></td>
<td><b>Swagger URL</b></td>
</tr>
</thead>
<tbody>
<tr style="background: rgba(86,61,124,.05);">
<td>
<samp>ambassador.netbanking</samp>
</td>
<td>
<a href="/docs/doc/ambassador/netbanking"><code>API Documentation</code></a>
</td>
</tr>
<tr>
<td>
<samp>ambassador.quote-backend</samp>
</td>
<td>
<code><span style="color:red">No API Documentation</span></code>
</td>
</tr>
<tr style="background: rgba(86,61,124,.05);">
<td>
<samp>ambassador.regular-httpbin</samp>
</td>
<td>
<a href="/docs/doc/ambassador/regular-httpbin"><code>API Documentation</code></a>
</td>
</tr>
<tr>
<td>
<samp>ambassador.service-a</samp>
</td>
<td>
<code><span style="color:red">No API Documentation</span></code>
</td>
</tr>
<tr style="background: rgba(86,61,124,.05);">
<td>
<samp>ambassador.service-b</samp>
</td>
<td>
<code><span style="color:red">No API Documentation</span></code>
</td>
</tr>
<tr>
<td>
<samp>default.petstore</samp>
</td>
<td>
<a href="/docs/doc/default/petstore"><code>API Documentation</code></a>
</td>
</tr>
<tr style="background: rgba(86,61,124,.05);">
<td>
<samp>default.sample-app</samp>
</td>
<td>
<code><span style="color:red">No API Documentation</span></code>
</td>
</tr>
<tr>
<td>
<samp>default.sample-app-backend-route</samp>
</td>
<td>
<code><span style="color:red">No API Documentation</span></code>
</td>
</tr>
<tr style="background: rgba(86,61,124,.05);">
<td>
<samp>keycloak.keycloak</samp>
</td>
<td>
<code><span style="color:red">No API Documentation</span></code>
</td>
</tr>
</tbody>
</table>
</div>
</section>
</article>
</div>
</main>
<footer class="o-page__footer c-footer">
<nav>
<ul>
<li><a href="https://d6e.co/slack">Slack</a></li>
<li><a href="https://github.com/datawire/ambassador">GitHub</a></li>
<li><a href="https://www.getambassador.io/contact">Sales</a></li>
</ul>
</nav>
</footer>
</div>
</body>
</html>
From the response above, I only need to get these texts, from this block/section:
ambassador.netbanking
ambassador.regular-httpbin
default.petstore
So far I have tried to get the required output with this code:
public JSONArray getApiList() {
    JSONArray apiSpecList = new JSONArray();
    String res = this.getApiResponse("https://gifted-wiles-4865.edgestack.me/docs/");
    Document document = Jsoup.parse(res);
    Elements divs = document.select("samp");
    Elements divs1 = document.getElementsByClass("o-page__nav c-nav");
    //Elements divs1 = document.getElementsBy("/docs/doc/");
    // Only reads the first link on the page, which is not one of the API names
    Element link = document.select("a").first();
    String test = link.text();
    System.out.println("Text: " + link.text());
    //res=res.substring(res.indexOf("{"),res.lastIndexOf("}") );
    //System.out.println(res);
    // @data = Hash.from_xml(res).to_json;
    return apiSpecList;
}

public String getApiResponse(String url) {
    RestTemplate restTemplate = new RestTemplate();
    logger.info("Ambassador , Connecting [{}] ", url);
    HttpHeaders headers = new HttpHeaders();
    //headers.set("Authorization", "Basic " + access_token);
    headers.setContentType(MediaType.APPLICATION_JSON);
    HttpEntity<String> request = new HttpEntity<String>(null, headers);
    String resp = "";
    try {
        ResponseEntity<String> result = restTemplate.exchange(url, HttpMethod.GET, request, String.class);
        resp = result.getBody();
    } catch (Exception err) {
        logger.error("Ambassador , Error [{}] ", err.getMessage());
    }
    return resp;
}
So how can I get these specific texts from the HTML response?

I suggest using Selenium, which is better suited to website-related tasks. You could try this:
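import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

// Requires a local Chrome installation and a matching driver
WebDriver driver = new ChromeDriver();
driver.get("link of the website");
WebElement content = driver.findElement(By.xpath("your xpath link"));
System.out.println(content.getText());
driver.quit();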

A good path to those texts looks like this: the a elements in the first div that sits directly inside the nav. That seems specific enough.
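
The path described above can be sketched with jsoup, which the question already uses. This is a minimal sketch, not a definitive implementation: the class name ApiNameExtractor, the method extractApiNames, and the shortened SAMPLE_HTML string are made up for illustration, and the selector "nav.c-nav > div" assumes the page keeps the structure shown in the question.

```java
import java.util.Collections;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ApiNameExtractor {

    // Hypothetical, shortened copy of the navigation block from the question.
    static final String SAMPLE_HTML =
        "<nav class=\"o-page__nav c-nav\">"
        + "<div><strong>APIs</strong><ul>"
        + "<li><a class=\"\" href=\"/docs/doc/ambassador/netbanking\">\n ambassador.netbanking\n </a></li>"
        + "<li><a class=\"\" href=\"/docs/doc/ambassador/regular-httpbin\">\n ambassador.regular-httpbin\n </a></li>"
        + "<li><a class=\"\" href=\"/docs/doc/default/petstore\">\n default.petstore\n </a></li>"
        + "</ul></div>"
        + "<br />"
        + "<div><strong>Reference</strong><ul>"
        + "<li><a class=\"\" href=\"/docs/page/Content\">Content</a></li>"
        + "<li><a class=\"\" href=\"/docs/page/Introduction\">Introduction</a></li>"
        + "</ul></div>"
        + "</nav>";

    // Returns the link texts of the first div directly inside nav.c-nav.
    static List<String> extractApiNames(String html) {
        Document document = Jsoup.parse(html);
        // "nav.c-nav > div" matches only divs that are direct children of the nav;
        // selectFirst returns the first match in document order (the "APIs" block).
        Element apiBlock = document.selectFirst("nav.c-nav > div");
        if (apiBlock == null) {
            return Collections.emptyList();
        }
        // text() on each link trims and normalizes whitespace.
        return apiBlock.select("a").eachText();
    }

    public static void main(String[] args) {
        for (String name : extractApiNames(SAMPLE_HTML)) {
            System.out.println(name);
        }
    }
}
```

In getApiList from the question, the same call could be made on the string returned by getApiResponse, and each name added to apiSpecList. Because the selector stops at the first div, the links under Reference are not picked up.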