Read a URL into a String in a few lines of Java code

I am trying to find the Java equivalent of this Groovy code:

String content = "http://www.google.com".toURL().getText();

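I want to read the content of a URL into a String. For such a simple task I don't want to pollute my code with buffered streams and loops. I looked at Apache's HttpClient, but I didn't see a one- or two-line implementation there either.

This answer refers to an older version of Java. You may want to look at ccleve's answer.


Here is the traditional way to do this: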

import java.net.*;
import java.io.*;

public class URLConnectionReader {
    public static String getText(String url) throws Exception {
        URL website = new URL(url);
        URLConnection connection = website.openConnection();
        BufferedReader in = new BufferedReader(
                                new InputStreamReader(
                                    connection.getInputStream()));

        StringBuilder response = new StringBuilder();
        String inputLine;

        while ((inputLine = in.readLine()) != null) 
            response.append(inputLine);

        in.close();

        return response.toString();
    }

    public static void main(String[] args) throws Exception {
        String content = URLConnectionReader.getText(args[0]);
        System.out.println(content);
    }
}
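
Note that readLine() strips line terminators, so the method above returns the content with all line breaks removed. A minimal sketch of a variant that keeps them is shown below. It assumes the content is UTF-8, and the class name UrlTextReader is made up for illustration.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class UrlTextReader {
    /** Reads everything behind the URL, keeping line terminators as they arrive. */
    static String getText(URL url) throws IOException {
        // Assumption: the content is UTF-8 encoded.
        try (Reader in = new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)) {
            StringBuilder response = new StringBuilder();
            char[] buffer = new char[4096];
            int read;
            // Copy raw characters instead of calling readLine(),
            // so "\n" and "\r\n" sequences survive unchanged.
            while ((read = in.read(buffer)) != -1) {
                response.append(buffer, 0, read);
            }
            return response.toString();
        }
    }
}
```

A call would look like UrlTextReader.getText(new URL("http://www.google.com")). The try-with-resources block also closes the stream if reading fails part-way through.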

As @extraneon suggested, IOUtils (from Apache Commons IO) lets you do this in a very eloquent way that is still in the Java spirit:

 InputStream in = new URL( "http://jakarta.apache.org" ).openStream();

 try {
   System.out.println( IOUtils.toString( in ) );
 } finally {
   IOUtils.closeQuietly(in);
 }

If you already have the input stream (see Joe's answer), also consider IOUtils.toString(inputStream).



Now that some time has passed since the original answer was accepted, there is a better approach:

String out = new Scanner(new URL("http://www.google.com").openStream(), "UTF-8").useDelimiter("\\A").next();

If you want a slightly fuller implementation that is not a single line, do this:

public static String readStringFromURL(String requestURL) throws IOException
{
    try (Scanner scanner = new Scanner(new URL(requestURL).openStream(),
            StandardCharsets.UTF_8.toString()))
    {
        scanner.useDelimiter("\\A");
        return scanner.hasNext() ? scanner.next() : "";
    }
}
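
To see why the hasNext() guard matters, the delimiter trick can be tried on in-memory streams. This is only a sketch: the names ScannerSlurp and readAll are hypothetical, and a ByteArrayInputStream stands in for the stream a URL would return.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

final class ScannerSlurp {
    /** Reads the whole stream as one token; returns "" for an empty stream. */
    static String readAll(InputStream in) {
        try (Scanner scanner = new Scanner(in, StandardCharsets.UTF_8.name())) {
            // "\\A" matches only at the very start of the input, so the
            // scanner never finds a delimiter inside the content and the
            // first token is the entire stream.
            scanner.useDelimiter("\\A");
            // An empty stream has no token at all; calling next() without
            // this check would throw NoSuchElementException.
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) {
        byte[] page = "line one\nline two".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(page)));
        System.out.println(readAll(new ByteArrayInputStream(new byte[0])).isEmpty());
    }
}
```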

Or just use Apache Commons IOUtils.toString(URL), or the variant that also accepts an encoding parameter.

An additional example using Guava:

URL xmlData = ...
String data = Resources.toString(xmlData, Charsets.UTF_8);

As time has passed, here is a way to do it in Java 8:

URLConnection conn = url.openConnection();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
    pageText = reader.lines().collect(Collectors.joining("\n"));
}

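The following works with Java 7/8 and with secure URLs, and shows how to add a cookie to the request. Note that this is mostly a direct copy of another answer on this page, with the cookie example added and a clarification that it also works with secure URLs ;-)

If you need to connect to a server with an invalid or self-signed certificate, this will throw a security error unless you import the certificate. If you need that capability, you can use the approach this answer originally linked to.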
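
Adding a cookie to a request can be sketched roughly as follows. This is an illustration under assumptions rather than the answer's own code: the class name CookieRequest, the method readWithCookie and the cookie value are hypothetical, and the content is assumed to be UTF-8. It sticks to Java 7 APIs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

final class CookieRequest {
    /** Fetches the URL while sending the given Cookie header, e.g. "sessionId=abc123". */
    static String readWithCookie(URL url, String cookie) throws IOException {
        // For https:// URLs openConnection() returns an HttpsURLConnection,
        // so the same code path covers secure URLs.
        URLConnection conn = url.openConnection();
        // Request headers have to be set before the connection is opened.
        conn.setRequestProperty("Cookie", cookie);
        try (InputStream in = conn.getInputStream();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            // Assumption: the response body is UTF-8 encoded.
            return out.toString("UTF-8");
        }
    }
}
```

A call such as CookieRequest.readWithCookie(new URL("https://www.google.com"), "sessionId=abc123") would send that cookie with the request.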

Example output:

<!doctype html><html itemscope="" .... etc

With Java 9, there is an even better way:

URL u = new URL("http://www.example.com/");
try (InputStream in = u.openStream()) {
    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
}

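Like the original Groovy example, this assumes that the content is UTF-8 encoded. (If you need something smarter than that, you need to create a URLConnection and use it to figure out the encoding.)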
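
Figuring out the encoding from a URLConnection could look something like the sketch below. It reads the charset parameter of the Content-Type header and falls back to UTF-8. The names ContentTypeCharset, charsetFrom and read are hypothetical, and the fallback choice is an assumption. Encodings declared only inside the document itself (for example in an HTML meta tag) are not detected.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class ContentTypeCharset {
    /** Extracts the charset from a header such as "text/html; charset=ISO-8859-1". */
    static Charset charsetFrom(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // malformed or unsupported name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.UTF_8; // assumed default when nothing usable is declared
    }

    /** Reads the URL and decodes it with the charset the server declared. */
    static String read(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        try (InputStream in = conn.getInputStream()) {
            byte[] body = in.readAllBytes(); // Java 9+
            return new String(body, charsetFrom(conn.getContentType()));
        }
    }

    public static void main(String[] args) {
        System.out.println(charsetFrom("text/html; charset=ISO-8859-1")); // prints ISO-8859-1
        System.out.println(charsetFrom("text/html"));                     // prints UTF-8
    }
}
```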

Here is Jeanne's lovely answer, but wrapped in a tidy function for muppets like me:

private static String getUrl(String aUrl) throws MalformedURLException, IOException
{
    String urlData = "";
    URL urlObj = new URL(aUrl);
    URLConnection conn = urlObj.openConnection();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) 
    {
        urlData = reader.lines().collect(Collectors.joining("\n"));
    }
    return urlData;
}
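
URL to String in pure Java

Example call

 String str = getStringFromUrl(new URL("YourUrl"));

Implementation

You can use the method described in the answer on how to read a URL into an InputStream, and combine it with the answer on how to read an InputStream into a String.

The result will look something like this: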

public String getStringFromUrl(URL url) throws IOException {
        return inputStreamToString(urlToInputStream(url,null));
}

public String inputStreamToString(InputStream inputStream) throws IOException {
    try(ByteArrayOutputStream result = new ByteArrayOutputStream()) {
        byte[] buffer = new byte[1024];
        int length;
        while ((length = inputStream.read(buffer)) != -1) {
            result.write(buffer, 0, length);
        }

        return result.toString(UTF_8);
    }
}

private InputStream urlToInputStream(URL url, Map<String, String> args) {
    HttpURLConnection con = null;
    InputStream inputStream = null;
    try {
        con = (HttpURLConnection) url.openConnection();
        con.setConnectTimeout(15000);
        con.setReadTimeout(15000);
        if (args != null) {
            for (Entry<String, String> e : args.entrySet()) {
                con.setRequestProperty(e.getKey(), e.getValue());
            }
        }
        con.connect();
        int responseCode = con.getResponseCode();
        /* By default the connection will follow redirects. The following
         * block is only entered if the implementation of HttpURLConnection
         * does not perform the redirect. The exact behavior depends to 
         * the actual implementation (e.g. sun.net).
         * !!! Attention: This block allows the connection to 
         * switch protocols (e.g. HTTP to HTTPS), which is <b>not</b> 
         * default behavior. See: https://stackoverflow.com/questions/1884230 
         * for more info!!!
         */
        if (responseCode < 400 && responseCode > 299) {
            String redirectUrl = con.getHeaderField("Location");
            try {
                URL newUrl = new URL(redirectUrl);
                return urlToInputStream(newUrl, args);
            } catch (MalformedURLException e) {
                URL newUrl = new URL(url.getProtocol() + "://" + url.getHost() + redirectUrl);
                return urlToInputStream(newUrl, args);
            }
        }
        /*!!!!!*/

        inputStream = con.getInputStream();
        return inputStream;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
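
Pros

  • It is pure Java

  • It can easily be enhanced by adding different headers (instead of passing a null object, as in the example above), authentication, etc.

  • Handling of protocol switches is supported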
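
One caveat about the redirect fallback in the implementation above: it rebuilds the target from protocol and host only, which drops a non-default port and does not cope with a Location value that is relative to the current path. A hedged alternative is to let java.net.URI resolve the header against the current URL. The class and method names below are made up for illustration.

```java
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;

final class RedirectResolver {
    /** Resolves a Location header value against the URL that produced it. */
    static URL resolve(URL current, String location)
            throws URISyntaxException, MalformedURLException {
        // URI.resolve keeps an absolute location as it is and resolves a
        // relative one against the current URL, preserving host and port.
        return current.toURI().resolve(location).toURL();
    }

    public static void main(String[] args) throws Exception {
        URL current = new URL("http://example.com:8080/a/b");
        System.out.println(resolve(current, "/c"));                  // http://example.com:8080/c
        System.out.println(resolve(current, "c"));                   // http://example.com:8080/a/c
        System.out.println(resolve(current, "https://other.org/x")); // https://other.org/x
    }
}
```

Inside urlToInputStream this could replace the try/catch around new URL(redirectUrl), at the cost of also handling URISyntaxException.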

Java 11+:

      URI uri = URI.create("http://www.google.com");
      HttpRequest request = HttpRequest.newBuilder(uri).build();
      String content = HttpClient.newHttpClient().send(request, BodyHandlers.ofString()).body();
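
The one-liner above uses the client defaults, which means redirects are not followed and no timeouts are set. A slightly fuller sketch, with an assumed 15-second timeout and hypothetical names (HttpClientReader, buildRequest, getText), might look like this:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class HttpClientReader {
    // One client can be shared across requests. Redirect.NORMAL follows
    // redirects except from HTTPS to HTTP; the default policy is NEVER.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(15))
            .build();

    /** Builds a GET request with an assumed 15-second response timeout. */
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(15))
                .GET()
                .build();
    }

    /** Returns the response body, or throws for HTTP error statuses. */
    static String getText(String url) throws IOException, InterruptedException {
        HttpResponse<String> response =
                CLIENT.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() >= 400) {
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
        // BodyHandlers.ofString() decodes using the charset from the
        // Content-Type header and falls back to UTF-8.
        return response.body();
    }
}
```

Usage stays close to a single line: String content = HttpClientReader.getText("http://www.google.com");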
      

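You could rename the main method to, say, getText, pass the URL string as a parameter, and get a one-liner: String content = URLConnectionReader.getText("http://www.yahoo.com/");

The string will not contain any line terminators (BufferedReader.readLine() strips them), so it will not be exactly the content of the URL.

@Benoit Guedas So how do you keep the line breaks?

Why not create a utility class that encapsulates all the "polluting" buffered streams and loops? You could also use that class to handle things like the socket closing before the stream completes, and to deal with I/O blocking over a slow connection. After all, this is OO: encapsulate the functionality and hide it from your main class.

This cannot be done in one or two lines.

Just don't forget that you need to call Scanner#close() afterwards.

The regular expression \\A matches the beginning of input. This tells Scanner to tokenize the entire stream, from the beginning to the (illogical) next beginning.

Neat, but it fails if the web page returns no content (""). You need String result = scanner.hasNext() ? scanner.next() : ""; to handle that.

@ccleve It would be useful to add the imports here, since there are multiple Scanner and URL classes in Java.

@ccleve Can you update the link "This explains the \\A:"?

+1 Thanks, this worked well.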