How to extract specific text from an HTML response in Java

I need to get specific text (the API names) from an HTML response.

Below is the HTML response from the server:

<!DOCTYPE html>
<html lang="en">
   <head>
      <meta charset="UTF-8">
      <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
      <title>Ambassador Developer Portal</title>
      <link
         rel="stylesheet"
         href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:300,400,600,700,900"
         type="text/css"
         media="all"
         >
      <link rel="stylesheet" href="/docs/styles/master.css" type="text/css" media="all">
   </head>
   <body>
      <div class="o-page">
         <header class="o-page__header c-header">
            <a href="/docs/" >
            <img class="c-header__brand" src="/docs/assets/svg/AmbassadorType.svg" width="180px" height="18px"/>
            </a>
            <nav class="c-header__nav">
               <ul>
                  <li><a href="https://www.getambassador.io">Ambassador</a></li>
                  <li><a href="https://www.getambassador.io/products/">Products</a></li>
                  <li><a href="https://blog.getambassador.io/">Blog</a></li>
               </ul>
            </nav>
            <div class="c-header__misc">
               <ul>
                  <form class="c-search-box">
                     <label>
                     <input type="search" placeholder="Search">
                     </label>
                  </form>
               </ul>
            </div>
         </header>
         <nav class="o-page__nav c-nav">
            <div>
               <strong>APIs</strong>
               <ul>
                  <li>
                     <a class="" href="/docs/doc/ambassador/netbanking">
                     ambassador.netbanking
                     </a>
                  </li>
                  <li>
                     <a class="" href="/docs/doc/ambassador/regular-httpbin">
                     ambassador.regular-httpbin
                     </a>
                  </li>
                  <li>
                     <a class="" href="/docs/doc/default/petstore">
                     default.petstore
                     </a>
                  </li>
               </ul>
            </div>
            <br />
            <div>
               <strong>Reference</strong>
               <ul>
                  <li><a class="" href="/docs/page/Content">Content</a></li>
                  <li><a class="" href="/docs/page/Introduction">Introduction</a></li>
               </ul>
            </div>
            <br />
            <div>
               <strong>Services without documentation</strong>
               <ul>
                  <li>
                     <samp>ambassador.quote-backend</samp>
                  </li>
                  <li>
                     <samp>ambassador.service-a</samp>
                  </li>
                  <li>
                     <samp>ambassador.service-b</samp>
                  </li>
                  <li>
                     <samp>default.sample-app</samp>
                  </li>
                  <li>
                     <samp>default.sample-app-backend-route</samp>
                  </li>
                  <li>
                     <samp>keycloak.keycloak</samp>
                  </li>
               </ul>
            </div>
         </nav>
         <main class="o-page__main">
            <div>
               <article>
                  <section>
                     <div>
                        <p><span>
                           </span>
                        </p>
                        <h1>Welcome to the Ambassador Dev Portal</h1>
                        <h2>Customizing the Portal</h2>
                        <p>This content is fully customizable for your specific needs.
                           For details on customizing the portal, see <a href="https://www.getambassador.io/reference/dev-portal">https://www.getambassador.io/reference/dev-portal</a>.
                        </p>
                        <h2>Available Services</h2>
                        <p>The following services are exposed through this Ambassador instance:</p>
                        <table cellpadding="2em" width="100%">
                           <thead>
                              <tr>
                                 <td><b>Service Name</b></td>
                                 <td><b>Swagger URL</b></td>
                              </tr>
                           </thead>
                           <tbody>
                              <tr style="background: rgba(86,61,124,.05);">
                                 <td>
                                    <samp>ambassador.netbanking</samp>
                                 </td>
                                 <td>
                                    <a href="/docs/doc/ambassador/netbanking"><code>API Documentation</code></a>
                                 </td>
                              </tr>
                              <tr>
                                 <td>
                                    <samp>ambassador.quote-backend</samp>
                                 </td>
                                 <td>
                                    <code><span style="color:red">No API Documentation</span></code>
                                 </td>
                              </tr>
                              <tr style="background: rgba(86,61,124,.05);">
                                 <td>
                                    <samp>ambassador.regular-httpbin</samp>
                                 </td>
                                 <td>
                                    <a href="/docs/doc/ambassador/regular-httpbin"><code>API Documentation</code></a>
                                 </td>
                              </tr>
                              <tr>
                                 <td>
                                    <samp>ambassador.service-a</samp>
                                 </td>
                                 <td>
                                    <code><span style="color:red">No API Documentation</span></code>
                                 </td>
                              </tr>
                              <tr style="background: rgba(86,61,124,.05);">
                                 <td>
                                    <samp>ambassador.service-b</samp>
                                 </td>
                                 <td>
                                    <code><span style="color:red">No API Documentation</span></code>
                                 </td>
                              </tr>
                              <tr>
                                 <td>
                                    <samp>default.petstore</samp>
                                 </td>
                                 <td>
                                    <a href="/docs/doc/default/petstore"><code>API Documentation</code></a>
                                 </td>
                              </tr>
                              <tr style="background: rgba(86,61,124,.05);">
                                 <td>
                                    <samp>default.sample-app</samp>
                                 </td>
                                 <td>
                                    <code><span style="color:red">No API Documentation</span></code>
                                 </td>
                              </tr>
                              <tr>
                                 <td>
                                    <samp>default.sample-app-backend-route</samp>
                                 </td>
                                 <td>
                                    <code><span style="color:red">No API Documentation</span></code>
                                 </td>
                              </tr>
                              <tr style="background: rgba(86,61,124,.05);">
                                 <td>
                                    <samp>keycloak.keycloak</samp>
                                 </td>
                                 <td>
                                    <code><span style="color:red">No API Documentation</span></code>
                                 </td>
                              </tr>
                           </tbody>
                        </table>
                     </div>
                  </section>
               </article>
            </div>
         </main>
         <footer class="o-page__footer c-footer">
            <nav>
               <ul>
                  <li><a href="https://d6e.co/slack">Slack</a></li>
                  <li><a href="https://github.com/datawire/ambassador">GitHub</a></li>
                  <li><a href="https://www.getambassador.io/contact">Sales</a></li>
               </ul>
            </nav>
         </footer>
      </div>
   </body>
</html>
From the response above, I only need these texts from this block/section:

ambassador.netbanking
ambassador.regular-httpbin
default.petstore


These are the entries listed under the APIs heading in the side navigation.
So far, I have tried the following code to get the desired output:

public JSONArray getApiList(){
    JSONArray apiSpecList = new JSONArray();
    String res = this.getApiResponse("https://gifted-wiles-4865.edgestack.me/docs/");
    Document document = Jsoup.parse(res);
    Elements divs = document.select("samp");
    Elements divs1 = document.getElementsByClass("o-page__nav c-nav");
    //Elements divs1 = document.getElementsBy("/docs/doc/");
    Element link = document.select("a").first();
    String test = link.text();

    System.out.println("Text: " + link.text());
    //res=res.substring(res.indexOf("{"),res.lastIndexOf("}") );
    //System.out.println(res);
    // @data = Hash.from_xml(res).to_json;
    return apiSpecList;
}
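In the attempt above, `document.select("a").first()` is the first link in the whole page, which is the logo link in the header (`<a href="/docs/">` wrapping only an `<img>`), so `link.text()` is empty. `getElementsByClass` is meant for one class name at a time rather than a CSS selector, and neither `divs1` nor `divs` is used afterwards; `select("samp")` would also match every service name on the page, including those without documentation. A minimal sketch of the parsing step using a CSS selector instead, assuming jsoup 1.x on the classpath; `ApiListParser` and `extractApiNames` are hypothetical names, and the method returns a `List<String>` rather than the `JSONArray` (if that is org.json's `JSONArray`, `new JSONArray(names)` can wrap the list).

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ApiListParser {

    // Collects the API names from the side navigation of the portal page.
    static List<String> extractApiNames(String html) {
        List<String> names = new ArrayList<>();
        if (html == null || html.isEmpty()) {
            return names; // getApiResponse returns "" when the request fails
        }
        Document document = Jsoup.parse(html);
        // Chain both classes with '.', and keep only links into /docs/doc/ so the
        // "Reference" links (/docs/page/...) and the header links are skipped.
        for (Element link : document.select("nav.o-page__nav.c-nav a[href^=/docs/doc/]")) {
            names.add(link.text()); // text() trims the surrounding whitespace
        }
        return names;
    }

    public static void main(String[] args) {
        String html = "<nav class=\"o-page__nav c-nav\"><div><strong>APIs</strong><ul>"
                + "<li><a class=\"\" href=\"/docs/doc/ambassador/netbanking\">ambassador.netbanking</a></li>"
                + "<li><a class=\"\" href=\"/docs/doc/default/petstore\">default.petstore</a></li>"
                + "</ul></div><div><strong>Reference</strong><ul>"
                + "<li><a class=\"\" href=\"/docs/page/Content\">Content</a></li>"
                + "</ul></div></nav>";
        System.out.println(extractApiNames(html)); // prints [ambassador.netbanking, default.petstore]
    }
}
```

Restricting the selector to the `<nav>` matters: the table in `<main>` links to the same `/docs/doc/...` URLs, but its link text is "API Documentation".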

public String getApiResponse(String url) {
    RestTemplate restTemplate = new RestTemplate();
    ResponseEntity<String> response;
    logger.info("Ambassador , Connecting [{}] ",url);
    HttpHeaders headers = new HttpHeaders();
    //headers.set("Authorization", "Basic " + access_token);
    headers.setContentType(MediaType.APPLICATION_JSON);

    HttpEntity<String> request = new HttpEntity<String>(null, headers);
    String resp = "";
    try {
        request = new HttpEntity<String>(null, headers);
        ResponseEntity<String> result = restTemplate.exchange(url, HttpMethod.GET, request, String.class);
        resp = result.getBody();
    }  catch (Exception err) {
        logger.error("Ambassador , Error [{}] ",err.getMessage());
    }
    return resp;
}
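Two details of `getApiResponse` can cause trouble here. It sets `Content-Type: application/json` on a GET that has no body (that header describes a request body, not the format you want back), and on any exception it logs and returns "", so `Jsoup.parse("")` yields an empty document, `document.select("a").first()` in `getApiList` returns null, and `link.text()` throws a NullPointerException. A sketch using the JDK 11+ `java.net.http.HttpClient` instead of Spring's RestTemplate (a swapped-in technique, not the asker's setup) that asks for HTML via `Accept` and throws on a non-200 status; `PortalFetcher`, `buildRequest` and `fetchHtml` are hypothetical names, and the timeouts are arbitrary.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class PortalFetcher {

    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    // A GET has no body, so no Content-Type is set; Accept states what we expect back.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "text/html")
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    // Returns the page body, or throws so the caller never parses "" by accident.
    static String fetchHtml(String url) throws IOException, InterruptedException {
        HttpResponse<String> response =
                CLIENT.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode() + " from " + url);
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java PortalFetcher <url>");
            return;
        }
        System.out.println(fetchHtml(args[0]));
    }
}
```

Throwing instead of returning "" is a design choice: it makes a failed request visible at the call site rather than surfacing later as a null element.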

So how can I get these specific texts from the HTML response?
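An approach that does not depend on the navigation's class names is to anchor on the `<strong>APIs</strong>` heading itself and collect the links in its parent `<div>`. A minimal jsoup sketch, assuming jsoup 1.x on the classpath; `ApiHeadingExtractor` and `apiNames` are hypothetical names, and the HTML in `main` is a trimmed copy of the response above.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ApiHeadingExtractor {

    // Returns the link texts listed under the "APIs" heading of the side navigation.
    static List<String> apiNames(String html) {
        Document doc = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        // :containsOwn matches the element's own text, case-insensitively.
        Element heading = doc.select("nav strong:containsOwn(APIs)").first();
        if (heading == null) {
            return names; // heading missing, e.g. an empty or error response
        }
        // The heading's parent <div> also holds the <ul> of API links.
        for (Element link : heading.parent().select("a")) {
            names.add(link.text());
        }
        return names;
    }

    public static void main(String[] args) {
        String html = "<nav class=\"o-page__nav c-nav\">"
                + "<div><strong>APIs</strong><ul>"
                + "<li><a class=\"\" href=\"/docs/doc/ambassador/netbanking\">ambassador.netbanking</a></li>"
                + "<li><a class=\"\" href=\"/docs/doc/ambassador/regular-httpbin\">ambassador.regular-httpbin</a></li>"
                + "<li><a class=\"\" href=\"/docs/doc/default/petstore\">default.petstore</a></li>"
                + "</ul></div><br />"
                + "<div><strong>Reference</strong><ul>"
                + "<li><a class=\"\" href=\"/docs/page/Content\">Content</a></li>"
                + "</ul></div></nav>";
        // prints [ambassador.netbanking, ambassador.regular-httpbin, default.petstore]
        System.out.println(apiNames(html));
    }
}
```

Because `:containsOwn` is a case-insensitive substring match, a heading such as "Deprecated APIs" would also match; `.first()` keeps only the first one in document order.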

I suggest using Selenium, which is better suited to website-related tasks. You can try this:

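import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

WebDriver driver = new ChromeDriver();
try {
    driver.get("link of the website");
    // Locate the element by XPath and print its visible text
    WebElement content = driver.findElement(By.xpath("your xpath link"));
    System.out.println(content.getText());
} finally {
    driver.quit(); // always close the browser
}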

That looks like a good path to target: the <a> elements in the first <div> directly inside the <nav>. That seems explicit enough.
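The path described in that comment can be written as the XPath `//nav[contains(@class,'c-nav')]/div[1]//a`, which is the kind of string the `By.xpath("your xpath link")` placeholder expects; with Selenium, `driver.findElements(By.xpath(...))` returns all matches instead of only the first. Because the `<nav>` fragment of this particular response happens to be well-formed XML, the expression can be checked offline with the JDK's own `javax.xml.xpath` API. This is a sketch for testing the expression, not a general way to parse HTML; `NavXPathCheck` and `apiNames` are hypothetical names, and `contains(@class,'c-nav')` would also match a class such as `c-nav-extra`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NavXPathCheck {

    // "The <a> elements in the first <div> directly inside the <nav>"
    static final String API_LINKS = "//nav[contains(@class,'c-nav')]/div[1]//a";

    // Evaluates API_LINKS on a well-formed XML fragment and returns the trimmed link texts.
    static List<String> apiNames(String xmlFragment) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmlFragment)));
            NodeList links = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(API_LINKS, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < links.getLength(); i++) {
                names.add(links.item(i).getTextContent().trim());
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("Fragment is not well-formed XML or the XPath failed", e);
        }
    }

    public static void main(String[] args) {
        String nav = "<nav class=\"o-page__nav c-nav\">"
                + "<div><strong>APIs</strong><ul>"
                + "<li><a class=\"\" href=\"/docs/doc/ambassador/netbanking\">ambassador.netbanking</a></li>"
                + "<li><a class=\"\" href=\"/docs/doc/default/petstore\">default.petstore</a></li>"
                + "</ul></div><br />"
                + "<div><strong>Reference</strong><ul>"
                + "<li><a class=\"\" href=\"/docs/page/Content\">Content</a></li>"
                + "</ul></div></nav>";
        System.out.println(apiNames(nav)); // prints [ambassador.netbanking, default.petstore]
    }
}
```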