Java: manually sending a GET request to a website returns a 302 redirect
I'm currently developing a web scraper in Java. I set up a TCP connection and send the GET request manually using a PrintWriter.

I can connect to most websites, such as yahoo.com or crack.com, and receive a response, but I cannot connect to my target site, vinylengine.com. It always returns a 302.

I compared the request I send with the one my browser sends, and they are almost identical.

My request headers:
GET / HTTP/1.1
Host: www.vinylengine.com
My response:
HTTP/1.1 302 Found
Date: Thu, 06 Jun 2013 19:27:00 GMT
Server: Apache
Location: http://www.nakedresource.com/
Cache-Control: max-age=1209600
Expires: Thu, 20 Jun 2013 19:27:00 GMT
Content-Length: 213
Content-Type: text/html; charset=iso-8859-1
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="http://www.nakedresource.com/">here</a>.</p>
</body></html>
Browser request:
GET http://www.vinylengine.com/ HTTP/1.1
Host: www.vinylengine.com
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: __utma=72407316.18415374.1370488314.1370497873.1370543389.3; __utmz=72407316.1370488314.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); SESSaf8d12283bdbdc5f5bbfb2aef054db6d=1f0676e5cab0ba2c5a80e76ea0bd6f75; __utmc=72407316; has_js=1; __utmb=72407316
Connection: keep-alive
If-Modified-Since: Thu, 06 Jun 2013 18:02:53 GMT
If-None-Match: "2186d59ac297e0f1a43433fa61e8a94b"
Code:
public void sendRequest(String extensionString, String urlString)
{
    try
    {
        // BufferedReader inFromServer;
        // PrintWriter outToServer;
        // These two are initialized elsewhere.
        outToServer.println("GET " + extensionString + " HTTP/1.1");
        outToServer.println("Host: " + urlString);
        outToServer.println("");
        outToServer.flush();

        String temp;
        while ((temp = inFromServer.readLine()) != null)
        {
            System.out.println(temp);
        }
        return;
    }
    catch (Exception e)
    {
        System.out.printf("sendRequest failed: %s", e);
        return;
    }
}
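I have tried changing the host name to nakedresource.com, but when I do that I get the page source of nakedresource.com rather than vinylengine.com.

This can happen when a proxy is configured in your browser but the JVM does not know about it. Try starting the JVM with the following arguments and see whether that solves the problem: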
-Dhttp.proxyHost=10.12.11.1 -Dhttp.proxyPort=8800
if you are using the code above.
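For reference, the same proxy settings can also be applied from code. The sketch below is only an illustration: the class and method names are hypothetical, and the host and port are the placeholder values from the flags above. As far as I know, the `http.proxyHost` and `http.proxyPort` properties are consulted by `URL.openConnection()` / `HttpURLConnection` and not by a plain `java.net.Socket`, so hand-written socket code would probably have to connect to the proxy itself.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

public class ProxySettings {
    // Runtime equivalent of -Dhttp.proxyHost=... -Dhttp.proxyPort=...
    static void useSystemProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", Integer.toString(port));
    }

    // Explicit proxy object that can be passed to URL.openConnection(Proxy).
    // createUnresolved avoids a DNS lookup at construction time.
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(host, port));
    }

    public static void main(String[] args) {
        // Placeholder address taken from the JVM flags above.
        useSystemProxy("10.12.11.1", 8800);
        Proxy proxy = httpProxy("10.12.11.1", 8800);
        System.out.println(System.getProperty("http.proxyHost") + ":"
                + System.getProperty("http.proxyPort"));
        System.out.println(proxy.type());
    }
}
```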
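
The site in question is looking at your User-Agent string (or, in your case, the lack of one).

When you say you are doing "almost the same" thing as the browser... well, right. Computers are a bit picky about that sort of thing.

If you do not supply a User-Agent: header, you get the redirect: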
> telnet www.vinylengine.com 80
Trying 67.225.154.112...
Connected to vinylengine.com.
Escape character is '^]'.
GET / HTTP/1.1
Host: www.vinylengine.com
Accept: */*

HTTP/1.1 302 Found
However, if you do supply one, you get the page:

> telnet www.vinylengine.com 80
Trying 67.225.154.112...
Connected to vinylengine.com.
Escape character is '^]'.
GET / HTTP/1.1
Host: www.vinylengine.com
User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
Accept: */*

HTTP/1.1 200 OK
... (the page)
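This is usually because sites serve different versions of their content to different browsers based on the User-Agent header. Apparently their answer to "no user agent" is... punt: you get redirected to the parent site's root.

Comments:

HTTP 302 means a redirect, so you need to request the redirected URL again.

So you are not doing the same thing as the browser, and then you wonder why the result is different?

That is not the issue. The problem is that the site looks for a User-Agent string.

FYI, the redirect happens because there is no User-Agent string. It then takes you to a different (parent) site rather than the actual page in question. You can always look up how to add a User-Agent string.

+1, exactly what I found (even the same zlib version :)).

Thanks for solving this!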
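
The fix described in the accepted answer, adding a User-Agent header to the hand-written request, could be sketched roughly as follows. This is a minimal illustration and not the asker's final code: the `RawGet` class, the `buildRequest` helper, and the User-Agent value are made up for the example. It also ends each line with an explicit `\r\n` (`println` uses the platform line separator instead) and sends `Connection: close`, so that the read loop ends when the server closes the connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class RawGet {
    // Hypothetical helper: builds a minimal HTTP/1.1 GET request that
    // includes a User-Agent header and is terminated by an empty line.
    static String buildRequest(String host, String path, String userAgent) {
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "User-Agent: " + userAgent + "\r\n"
                + "Accept: */*\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    public static void main(String[] args) throws IOException {
        String host = "www.vinylengine.com";
        // The User-Agent value is an arbitrary example string.
        String request = buildRequest(host, "/", "Mozilla/5.0 (compatible; MyScraper/1.0)");

        try (Socket socket = new Socket(host, 80)) {
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            BufferedReader in = new BufferedReader(new InputStreamReader(
                    socket.getInputStream(), StandardCharsets.ISO_8859_1));
            String line;
            // Prints the status line, the headers, and the body as text.
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

Whether the server answers 200 or still redirects depends on how the site treats the User-Agent value today, so the status line is worth checking before parsing the body.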