Java: How to get the final redirect of a URL using HtmlUnit


I have the URL

https://www.facebook.com/ads/library/?id=286238429359299

which, in a browser, is redirected to

https://www.facebook.com/ads/library/?active_status=all&ad_type=political_and_issue_ads&country=US&impression_search_field=has_impressions_lifetime&id=286238429359299&view_all_page_id=575939395898200

I am using the following code:

    @Test
    public void createWebClient() throws IOException {
        getLogger("com.gargoylesoftware").setLevel(OFF);
        WebClient webClient = new WebClient(CHROME);
        WebClientOptions options = webClient.getOptions();
        options.setJavaScriptEnabled(true);
        options.setRedirectEnabled(true);
        webClient.waitForBackgroundJavaScriptStartingBefore(10000);
        // IMPORTANT: Without the country/language selection cookie the redirection does not work!
        URL s = webClient.getPage("https://www.facebook.com/ads/library/?id=286238429359299").getUrl();
    }

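The code above does not take the redirect into account. Am I missing something? I need to get the final URL that the original URL resolves to.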

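Actually, the URL returns a page that contains JavaScript. The JavaScript inspects the web browser's environment; for example, it detects whether the current browser is headless and whether the web driver is legitimate. So I think the solution is to analyze the JavaScript, and then you will get the final URL.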

I think that, because it is headless, it will never resolve to the final URL.

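Load the same page in a browser, view the source, and search for "page_uri"; you will see the URI you are looking for.

If you inspect the HtmlUnit output by printing the page

    System.out.println(page.asXml());

you will see that "page_uri" contains the URL you originally entered.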
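The search for "page_uri" can also be done in code. The following is a minimal sketch using only the JDK: given the page source as a string (for example the output of `page.asXml()`), it pulls out the first `page_uri` value. The JSON-style shape `"page_uri":"..."` with escaped slashes is an assumption about the markup, and `PageUriExtractor` is a hypothetical helper name, not part of HtmlUnit.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PageUriExtractor {
    // Assumption: the value appears as a JSON string, "page_uri":"...",
    // possibly with backslash-escaped characters inside the quotes.
    private static final Pattern PAGE_URI =
            Pattern.compile("\"page_uri\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Returns the first page_uri value found in the page source, if any. */
    static Optional<String> extract(String pageSource) {
        Matcher m = PAGE_URI.matcher(pageSource);
        if (!m.find()) {
            return Optional.empty();
        }
        // Undo the two escapes that commonly appear in URLs:
        // the escaped slash and the escaped ampersand.
        String raw = m.group(1).replace("\\/", "/").replace("\\u0026", "&");
        return Optional.of(raw);
    }

    public static void main(String[] args) {
        String sample =
                "{\"page_uri\":\"https:\\/\\/www.facebook.com\\/ads\\/library\\/?id=286238429359299\"}";
        System.out.println(extract(sample).orElse("page_uri not found"));
    }
}
```

Comparing the value extracted from the HtmlUnit output with the one from the browser's page source shows whether the two clients were served the same URI.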

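I suggest using Selenium WebDriver (non-headless).

Instantiate your WebDriver:

    HtmlUnitDriver driver = new HtmlUnitDriver();

or

    HtmlUnitDriver driver = new HtmlUnitDriver(true);

This ensures that all redirects triggered by JavaScript are handled automatically by HtmlUnitDriver.

You could also consider subclassing HtmlUnitDriver and enabling redirects on the underlying WebClient by calling

    getWebClient().getOptions().setRedirectEnabled(true)

Please check which one works for you.

You can read this SO thread for more information:

As the answer states, the "final" URL depends on who requests the URL and how. There may not be a single final URL.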
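To illustrate how the result can depend on the requester, here is a sketch that uses the JDK's `java.net.http.HttpClient` (Java 11+) rather than HtmlUnit. It follows HTTP 3xx redirects only, so it would not see a redirect performed by JavaScript as in the question. `HttpRedirectResolver` and the `userAgent` parameter are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class HttpRedirectResolver {
    /**
     * Sends a GET request, follows HTTP 3xx redirects, and returns the URI
     * of the response that was finally received. JavaScript is not executed.
     */
    static URI resolve(URI start, String userAgent) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(start)
                .header("User-Agent", userAgent)
                .GET()
                .build();
        // The body is not needed here, only the URI of the last response.
        HttpResponse<Void> response =
                client.send(request, HttpResponse.BodyHandlers.discarding());
        return response.uri();
    }
}
```

Calling `resolve` with different `userAgent` values against the same start URL is one way to check whether a server redirects different clients differently.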