Java: scraping a website that requires login, using Jsoup


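I want to print some data from streetinsider.com (the div with class="news_article"). I created an account, and you have to be logged in to access this data.

Can someone explain why this code doesn't work? I've tried a lot of things, but nothing has worked.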

public static final String SPLIT_INTERNET_URL = "http://www.streetinsider.com/Special+Dividends?offset=55";
public static final String SPLIT_LOGIN = "https://www.streetinsider.com/login.php";

/**
 * @param args the command line arguments
 * @throws java.io.FileNotFoundException
 * @throws java.io.UnsupportedEncodingException
 * @throws java.text.ParseException
 * @throws java.lang.ClassNotFoundException
 */
public static void main(String[] args) throws FileNotFoundException, UnsupportedEncodingException, IOException, ParseException, ClassNotFoundException {
    // TODO code application logic here
    Response res = Jsoup.connect(SPLIT_LOGIN)
            .data("loginemail", "XXXXX", "password", "XXXX")
            .method(Method.POST)
            .execute();
    Document doc = res.parse();

    Map<String, String> cookies = res.cookies();

    Document pageWhenAlreadyLoggedIn = Jsoup.connect(SPLIT_INTERNET_URL).cookies(cookies).get();
    Elements elems = pageWhenAlreadyLoggedIn.select("div[class=news_article]");
    for (Element elem : elems) {
        System.out.println(elem);
    }
}

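Your code does not actually log you in to the website. The request below also sends the login form's other fields and a browser user agent; try it instead.

To log in to the website: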

Connection.Response res = Jsoup.connect(SPLIT_LOGIN)
            .data("action", "account", 
                "redirect", "account_home.php?",
                "radiobutton", "old", 
                "loginemail", "XXXXX",
                "password", "XXXXX", 
                "LoginChoice", "Sign In to Secure Area")
            .method(Connection.Method.POST)
            .followRedirects(true)
            .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36")
            .execute();
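You are now logged in, but the site appears to detect when you are already logged in from another browser or connection, and asks you to end that session first. The following code ends the prior session: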

Connection.Response res2 = Jsoup.connect("http://www.streetinsider.com/login_duplicate.php")
            .data("ok", "End Prior Session")
            .method(Connection.Method.POST)
            .cookies(res.cookies())
            .followRedirects(true)
            .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36")
            .execute();
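
Later requests have to present the session cookies, and the second response may set or refresh some of them. One way to carry the session forward is to combine the cookie maps of `res` and `res2`, letting the later values win. The following is a minimal sketch under that assumption; the class name `SessionCookies`, the helper `mergeCookies`, and the cookie names in `main` are hypothetical and are not taken from the site or from Jsoup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Jsoup: combines cookie maps such as the
// ones returned by res.cookies() and res2.cookies().
final class SessionCookies {
    private SessionCookies() {}

    // Returns a new map holding every cookie from 'earlier', overridden by
    // any cookie of the same name in 'later'. Neither input is modified.
    static Map<String, String> mergeCookies(Map<String, String> earlier,
                                            Map<String, String> later) {
        Map<String, String> merged = new LinkedHashMap<>(earlier);
        merged.putAll(later);
        return merged;
    }

    public static void main(String[] args) {
        // Made-up cookie names and values, for illustration only.
        Map<String, String> loginCookies = new LinkedHashMap<>();
        loginCookies.put("session", "first-value");
        loginCookies.put("user", "example");

        Map<String, String> followUpCookies = new LinkedHashMap<>();
        followUpCookies.put("session", "refreshed-value");

        System.out.println(mergeCookies(loginCookies, followUpCookies));
    }
}
```

With Jsoup, the merged map could then be passed to `Jsoup.connect(SPLIT_INTERNET_URL).cookies(merged).get()`, in the same way the question passes `res.cookies()`.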
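Great, now res2 will contain your account's home page, and you can go on to any page you want. For more information on how to log in to a website with Jsoup, see the following tutorial: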


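Pretty sure that assumes HTTP basic authentication, which is not what the site wants. You will have to get a session token and spoof the session.

Oh my god, thank you so much @Joel-Min, it's working and I understand why! This is exactly what I was looking for, you saved my day!

No worries bro, glad it worked :) Always eager to help people like you. Have a nice day, sir :)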