How to get the page source of a web page using the Jersey client in Java


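In a Java web application, I am trying to use the Jersey client to fetch the page source of a web page by passing in the URL of the page I need. I have been searching the web for a good example that would help, but have not found one.
Can anyone help me?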

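If you just want to download the HTML (not render it), you can use any ordinary HTTP client (even Java's URLConnection class).

Below is a ready-made example that I use in one of my tools. It uses Apache HttpCore 4.1.4 and Apache HttpClient 4.1.4.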

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

public class HttpUtil {

    public static String getFile(String sUrl) throws ClientProtocolException, IOException{
        HttpClient httpclient = new DefaultHttpClient();
        StringBuilder b = new StringBuilder();

         // Prepare a request object
         HttpGet httpget = new HttpGet(sUrl);

         // Execute the request
         HttpResponse response = httpclient.execute(httpget);

         // Examine the response status
         System.out.println(response.getStatusLine());

         // Get hold of the response entity
         HttpEntity entity = response.getEntity();

         // If the response does not enclose an entity, there is no need
         // to worry about connection release
         if (entity != null) {
             InputStream instream = entity.getContent();
             try {

                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(instream));
                 // do something useful with the response
                 String s = reader.readLine();
                 while(s!= null){
                     b.append(s);
                     b.append("\n");
                     s = reader.readLine();
                 }
             } catch (IOException ex) {

                 // In case of an IOException the connection will be released
                 // back to the connection manager automatically
                 throw ex;

             } catch (RuntimeException ex) {

                 // In case of an unexpected exception you may want to abort
                 // the HTTP request in order to shut down the underlying
                 // connection and release it back to the connection manager.
                 httpget.abort();
                 throw ex;

             } finally {

                 // Closing the input stream will trigger connection release
                 instream.close();

             }

         }

         // When HttpClient instance is no longer needed,
         // shut down the connection manager to ensure
         // immediate deallocation of all system resources
         // (done outside the if block so it also runs when there is no entity)
         httpclient.getConnectionManager().shutdown();
         return b.toString();
    }

}

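Jersey is meant for web services, but in general you can still get the HTML source with it. All four of the following clients will print the page source for you: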

  • URLConnection client

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;

    public class URLConnectionClient {
        public static void main(String[] args) throws IOException {
            // Put the page URL here (it is left empty in the original post)
            URL restURL = new URL("");
            URLConnection connection = (URLConnection) restURL.openConnection();
            connection.setDoOutput(true);
            connection.connect();
            InputStreamReader ins = new InputStreamReader(connection.getInputStream());
            BufferedReader in = new BufferedReader(ins);
            String inputLine;
            // Print the response line by line
            while ((inputLine = in.readLine()) != null) {
                System.out.println(inputLine);
            }
            in.close();
        }
    }

  • HttpConnection client

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class HttpConnectionClient {
        public static void main(String[] args) throws IOException {
            // Put the page URL here (it is left empty in the original post)
            URL restURL = new URL("");
            HttpURLConnection connection = (HttpURLConnection) restURL.openConnection();
            connection.setRequestMethod("GET");
            connection.setReadTimeout(10000);
            connection.connect();
            InputStreamReader ins = new InputStreamReader(connection.getInputStream());
            BufferedReader in = new BufferedReader(ins);
            String inputLine;
            // Print the response line by line
            while ((inputLine = in.readLine()) != null) {
                System.out.println(inputLine);
            }
            in.close();
        }
    }

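  • URL stream client

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class URLOpenClient {
        public static void main(String[] args) throws IOException {
            // Put the page URL here (it is left empty in the original post)
            URL restURL = new URL("");
            InputStreamReader ins = new InputStreamReader(restURL.openStream());
            BufferedReader in = new BufferedReader(ins);
            String inputLine;
            // Print the response line by line
            while ((inputLine = in.readLine()) != null) {
                System.out.println(inputLine);
            }
            in.close();
        }
    }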

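  • Jersey client

    import javax.ws.rs.client.Client;
    import javax.ws.rs.client.ClientBuilder;
    import javax.ws.rs.client.Invocation.Builder;
    import javax.ws.rs.client.WebTarget;
    import javax.ws.rs.core.Response;

    // The class name is garbled in the original post; JerseyClient is used here
    // so that it does not clash with the imported javax.ws.rs.client.Client.
    public class JerseyClient {
        public static void main(String[] args) {
            Client cl = ClientBuilder.newClient();
            // Put the page URL here (it is left empty in the original post)
            WebTarget target = cl.target("");
            // WebTarget is immutable, so keep the instance returned by path().
            // Drop this line if the URL above already points at the page.
            target = target.path("resource");
            Builder requestBuilder = target.request();
            Response response = requestBuilder.get();
            System.out.println(response.getStatus());
            System.out.println(response.readEntity(String.class));
        }
    }

    For this one you will need a dependency: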

    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.vvirlan</groupId>
    <artifactId>cert</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>Client</name>
    
    <dependencies>
    <dependency>
    <groupId>org.glassfish.jersey.core</groupId>
    <artifactId>jersey-client</artifactId>
    <version>2.21</version>
    </dependency>
    </dependencies>
    </project>
    
    

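Thanks, it worked. But when I have a code like "" it comes back as "�". Is there a solution?

Thanks for the detailed information, all of these work, but the last one, "response.readEntity(String.class)", shows an error for me. Could you please update it to return a String containing the whole page source? I also have another problem: some symbols such as "nbsp", "-", single quotes and so on are displayed as "?". Do you have a solution? Thanks in advance.

These strange symbols are caused by UTF/other encoding issues. Search for how to encode a String.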
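
The encoding problem mentioned above usually comes from reading the response with new InputStreamReader(stream), which decodes with the platform default charset. Here is a minimal sketch of one way around it, using only the JDK (HttpURLConnection, not Jersey): read the raw bytes, then decode them with the charset named in the Content-Type header. The class name PageSource, the helper names charsetOf, decode and fetch, and the UTF-8 fallback are assumptions made for illustration; the sketch does not look at a charset declared in an HTML meta tag.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original answers.
public class PageSource {

    // Picks the charset named in a Content-Type value such as
    // "text/html; charset=ISO-8859-1". Falls back to UTF-8 (an assumption)
    // when the header is missing or names no usable charset.
    public static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Unknown or malformed charset name: use the fallback.
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Decodes raw response bytes with an explicit charset instead of the
    // platform default.
    public static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    // Downloads the page and returns its whole source as one String.
    public static String fetch(String pageUrl) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(pageUrl).openConnection();
        connection.setRequestMethod("GET");
        connection.setReadTimeout(10000);
        try (InputStream in = connection.getInputStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return decode(buffer.toByteArray(),
                    charsetOf(connection.getContentType()));
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Usage: java PageSource <page URL>
            System.out.println(fetch(args[0]));
            return;
        }
        // Offline demonstration: the same bytes decoded with two charsets.
        String sample = "caf\u00e9 \u2013 \u2018quoted\u2019";
        byte[] utf8 = sample.getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(sample));
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).equals(sample));
    }
}
```

If the decoded text is correct but the console still shows "?", the console encoding is the likely cause rather than the download. Writing the String to a file with an explicit charset is one way to check.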