Read URL to string in a few lines of Java code
I'm trying to find Java's equivalent to Groovy's:
String content = "http://www.google.com".toURL().getText();
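I want to read the content of a URL into a string. I don't want to pollute my code with buffered streams and loops for such a simple task. I looked into Apache's HttpClient, but I don't see a one- or two-line implementation there either.
This answer refers to an older version of Java. You may want to look at ccleve's answer.
Here is the traditional way to do this: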
import java.net.*;
import java.io.*;
public class URLConnectionReader {
    public static String getText(String url) throws Exception {
        URL website = new URL(url);
        URLConnection connection = website.openConnection();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(
                        connection.getInputStream()));
        StringBuilder response = new StringBuilder();
        String inputLine;
        while ((inputLine = in.readLine()) != null)
            response.append(inputLine);
        in.close();
        return response.toString();
    }
    public static void main(String[] args) throws Exception {
        String content = URLConnectionReader.getText(args[0]);
        System.out.println(content);
    }
}
As @extraneon suggested, IOUtils from Apache Commons IO lets you do this in a very eloquent way that is still in the Java spirit:
InputStream in = new URL( "http://jakarta.apache.org" ).openStream();
try {
    System.out.println( IOUtils.toString( in ) );
} finally {
    IOUtils.closeQuietly(in);
}
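If you already have the input stream (see Joe's answer), also consider IOUtils.toString(inputStream).
Now that some time has passed since the original answer was accepted, there is a better approach: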
String out = new Scanner(new URL("http://www.google.com").openStream(), "UTF-8").useDelimiter("\\A").next();
If you want a slightly fuller implementation rather than a single line, do this:
public static String readStringFromURL(String requestURL) throws IOException
{
    try (Scanner scanner = new Scanner(new URL(requestURL).openStream(),
            StandardCharsets.UTF_8.toString()))
    {
        scanner.useDelimiter("\\A");
        return scanner.hasNext() ? scanner.next() : "";
    }
}
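To see how the \\A delimiter behaves without touching the network, here is a minimal sketch that runs the same Scanner idiom against an in-memory stream. The class name ScannerSlurp and the helper streamOf are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical demo class: same Scanner idiom, but fed from memory instead of a URL.
final class ScannerSlurp {

    // "\\A" matches only at the very start of the input, so the Scanner never
    // finds a second delimiter and returns everything as one token.
    static String readAll(InputStream in) {
        try (Scanner scanner = new Scanner(in, StandardCharsets.UTF_8.name())) {
            scanner.useDelimiter("\\A");
            // hasNext() is false for an empty stream; next() alone would throw.
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    // Helper for the illustration: wraps a string as a UTF-8 byte stream.
    static InputStream streamOf(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }
}
```

Calling ScannerSlurp.readAll(ScannerSlurp.streamOf("")) returns an empty string instead of throwing NoSuchElementException, which is the case the hasNext() check guards against.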
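Alternatively, use Apache Commons IOUtils.toString(URL), or the variant that also accepts an encoding parameter.
An additional example using Guava: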
URL xmlData = ...
String data = Resources.toString(xmlData, Charsets.UTF_8);
Now that more time has passed, here is a way to do it in Java 8:
URLConnection conn = url.openConnection();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
    pageText = reader.lines().collect(Collectors.joining("\n"));
}
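The following works with Java 7/8 and with secure URLs, and also shows how to add a cookie to the request. Note that this is mostly a direct copy of another answer, with the cookie example added and a clarification that it works with secure URLs as well ;-)
If you need to connect to a server with an invalid or self-signed certificate, this will throw security errors unless you import the certificate. If you need that functionality, it has to be handled with a dedicated approach.
Example output: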
<!doctype html><html itemscope="" .... etc
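A minimal sketch of how a cookie could be attached to such a request, assuming plain URLConnection and Java 7-compatible code. The class name CookieFetch, its method names, and the cookie names used in the usage note are hypothetical; this is an illustration, not the code from the answer above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.Map;

// Hypothetical helper: fetches a URL and sends the given cookies with the request.
final class CookieFetch {

    // Joins name/value pairs into one Cookie header value, e.g. "a=1; b=2".
    static String cookieHeader(Map<String, String> cookies) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : cookies.entrySet()) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // The same code path serves http and https; the JDK picks the handler from the scheme.
    // Assumes the response body is UTF-8.
    static String fetch(String address, Map<String, String> cookies) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        if (!cookies.isEmpty()) {
            conn.setRequestProperty("Cookie", cookieHeader(cookies));
        }
        try (InputStream in = conn.getInputStream();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toString("UTF-8");
        }
    }
}
```

A call such as CookieFetch.fetch("https://www.google.com", cookies), with cookies holding for example a "session" entry, would send the header Cookie: session=... along with the request.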
With Java 9, there is an even better way:
URL u = new URL("http://www.example.com/");
try (InputStream in = u.openStream()) {
    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
}
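Like the original Groovy example, this assumes that the content is UTF-8 encoded. (If you need something smarter than that, you need to create a URLConnection and use it to figure out the encoding.)
Here is Jeanne's lovely answer, wrapped in a tidy function for muppets like me: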
private static String getUrl(String aUrl) throws MalformedURLException, IOException
{
    String urlData = "";
    URL urlObj = new URL(aUrl);
    URLConnection conn = urlObj.openConnection();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8)))
    {
        urlData = reader.lines().collect(Collectors.joining("\n"));
    }
    return urlData;
}
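For the case mentioned earlier, where the content may not be UTF-8, here is a minimal sketch of using a URLConnection to work out the encoding from the Content-Type header. The class name CharsetAwareFetch and the UTF-8 fallback are assumptions made for this illustration; real pages may also declare their encoding inside the HTML, which this sketch ignores.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes the body with the charset the server declares.
final class CharsetAwareFetch {

    // Extracts the charset parameter from a value such as
    // "text/html; charset=ISO-8859-1". Falls back to UTF-8 (an assumption)
    // when the parameter is missing or names an unknown charset.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Malformed or unsupported charset name: use the fallback below.
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Requires Java 9+ for InputStream.readAllBytes().
    static String fetch(String address) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        try (InputStream in = conn.getInputStream()) {
            Charset charset = charsetOf(conn.getContentType());
            return new String(in.readAllBytes(), charset);
        }
    }
}
```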
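URL to String in pure Java
Example call
String str = getStringFromUrl(new URL("YourUrl"));
Implementation
You can take the method from an answer on reading a URL into an InputStream and combine it with an answer on reading an InputStream into a String.
The result looks something like this: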
public String getStringFromUrl(URL url) throws IOException {
    return inputStreamToString(urlToInputStream(url, null));
}
public String inputStreamToString(InputStream inputStream) throws IOException {
    try (ByteArrayOutputStream result = new ByteArrayOutputStream()) {
        byte[] buffer = new byte[1024];
        int length;
        while ((length = inputStream.read(buffer)) != -1) {
            result.write(buffer, 0, length);
        }
        // UTF_8 is a static import of java.nio.charset.StandardCharsets.UTF_8;
        // toString(Charset) needs Java 10+, older versions can use toString("UTF-8").
        return result.toString(UTF_8);
    }
}
private InputStream urlToInputStream(URL url, Map<String, String> args) {
    HttpURLConnection con = null;
    InputStream inputStream = null;
    try {
        con = (HttpURLConnection) url.openConnection();
        con.setConnectTimeout(15000);
        con.setReadTimeout(15000);
        if (args != null) {
            for (Entry<String, String> e : args.entrySet()) {
                con.setRequestProperty(e.getKey(), e.getValue());
            }
        }
        con.connect();
        int responseCode = con.getResponseCode();
        /* By default the connection will follow redirects. The following
         * block is only entered if the implementation of HttpURLConnection
         * does not perform the redirect. The exact behavior depends on
         * the actual implementation (e.g. sun.net).
         * !!! Attention: This block allows the connection to
         * switch protocols (e.g. HTTP to HTTPS), which is <b>not</b>
         * default behavior. See: https://stackoverflow.com/questions/1884230
         * for more info!!!
         */
        if (responseCode < 400 && responseCode > 299) {
            String redirectUrl = con.getHeaderField("Location");
            try {
                URL newUrl = new URL(redirectUrl);
                return urlToInputStream(newUrl, args);
            } catch (MalformedURLException e) {
                // Location was relative: resolve it against the original host.
                URL newUrl = new URL(url.getProtocol() + "://" + url.getHost() + redirectUrl);
                return urlToInputStream(newUrl, args);
            }
        }
        /*!!!!!*/
        inputStream = con.getInputStream();
        return inputStream;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
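Pros
- It is pure Java
- It can easily be enhanced by adding different headers (instead of passing a null object, as in the example above), authentication, etc.
- Handling of protocol switches is supported
Java 11+: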
URI uri = URI.create("http://www.google.com");
HttpRequest request = HttpRequest.newBuilder(uri).build();
String content = HttpClient.newHttpClient().send(request, BodyHandlers.ofString()).body();
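The three lines above omit imports, timeouts, and error handling. A slightly fuller sketch, under the assumption that a 15-second timeout (mirroring the 15000 ms used in the HttpURLConnection answer) and treating 4xx/5xx responses as errors are acceptable; the class name HttpClientFetch and its method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpResponse.BodyHandlers;
import java.time.Duration;

// Hypothetical wrapper around the Java 11 HttpClient one-liner.
final class HttpClientFetch {

    // Builds a GET request with a per-request timeout (15 s is an assumed value).
    static HttpRequest buildRequest(String address) {
        return HttpRequest.newBuilder(URI.create(address))
                .timeout(Duration.ofSeconds(15))
                .GET()
                .build();
    }

    // Sends the request and returns the body as a String.
    static String fetch(String address) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                // NORMAL follows redirects except from HTTPS down to HTTP.
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(15))
                .build();
        HttpResponse<String> response = client.send(buildRequest(address), BodyHandlers.ofString());
        if (response.statusCode() >= 400) {
            throw new IOException("HTTP " + response.statusCode() + " for " + address);
        }
        return response.body();
    }
}
```

Unlike the default client returned by HttpClient.newHttpClient(), which never follows redirects, this sketch opts in to following them.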
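You can rename the main method to, say, getText, pass the URL string as a parameter, and have a one-liner: String content = URLConnectionReader.getText("http://www.yahoo.com/");
The string will not contain any line-termination characters (because BufferedReader.readLine() removes them), so it will not be exactly the content of the URL.
@Benoit Guedas so how do you keep the line breaks?
Why not just create a utility class that encapsulates all those "polluting" buffered streams and loops? You could also use that class to handle things like the socket closing before the stream completes, and to handle I/O blocking over a slow connection. After all, this is OO: encapsulate the functionality and hide it from your main class.
It cannot be done in one or two lines.
Just don't forget that you need to call Scanner#close() later.
The regular expression \\A matches the beginning of input. This tells Scanner to tokenize the entire stream, from the beginning to the (illogical) next beginning.
Neat, but it fails if the web page returns no content (""). You need String result = scanner.hasNext() ? scanner.next() : ""; to handle that.
@ccleve it would be useful to add the imports here, since there are multiple Scanner and URL classes in Java.
@ccleve can you update the link "This explains the \\A:"?
+1 Thanks, this worked.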
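On the question of keeping line breaks: one option is to copy characters in chunks instead of calling readLine(). A minimal sketch, assuming UTF-8 content; the class name KeepLineBreaks and the 4096-character buffer size are arbitrary choices for this illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical variant of getText that leaves line terminators untouched.
final class KeepLineBreaks {

    // Copies raw characters, so "\n" and "\r\n" arrive exactly as sent.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            sb.append(buffer, 0, n);
        }
        return sb.toString();
    }

    // Opens the URL and decodes it as UTF-8 (an assumption about the content).
    static String getText(String address) throws IOException {
        try (InputStream in = new URL(address).openStream();
             Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            return readAll(reader);
        }
    }
}
```

The readAll method works on any Reader, so it can be tried with a java.io.StringReader before pointing getText at a real URL.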