Java: Scraping a web page with Jsoup

java html parsing web-scraping

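I need to extract the postcode from the HTML below using Jsoup. I only need the postcode, which is part of the href attribute of the <a> tag. In this example, the postcode part is W2: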
<a href="/properties-for-sale/w2/chpk3848653" class="property_photo_holder" style="backgroundimage:url(https://assets.foxtons.co.uk/w/480/1523289105/chpk3848653-23.jpg)"></a>



This is the HTML:
</div>

<div id="property_1062067" class="property_summary">

<h6><a href="/properties-for-sale/w2/chpk3848653">Lancaster Gate, <span class="property_address_location_name">Bayswater,</span> W2</a></h6>



Can anyone help?
Thanks.

You can use Jsoup and simply retrieve the value of the href attribute, like this:
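import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

// Fetch the page; URL is the address of the listing page.
Document document = Jsoup.connect(URL).userAgent("Mozilla/5.0").get();
// Select only the property links rather than every <a> on the page.
Elements elements = document.select("a[href^=/properties-for-sale/]");
// attr() returns the value from the first matched element that has the attribute.
String href = elements.attr("href");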

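The href attribute is now a plain string, so you need to apply a regular expression (regex) to pull out the field you want, in this case the postcode contained in "/properties-for-sale/w2/chpk3848653". To do that: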
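import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Capture the path segment right after "/properties-for-sale/".
String regex = "/properties-for-sale/([a-zA-Z0-9]+)/";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(href);

// find() returns a boolean, so test it before reading the captured group ("w2" here).
String postalCode = matcher.find() ? matcher.group(1) : null;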
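The regex step can also be wrapped in a small helper that uses only the standard library, so it can be tried without fetching the page. This is a minimal sketch, not a definitive implementation: the class name `PostcodeExtractor` and the method `extractPostcode` are hypothetical, and it assumes every property link follows the `/properties-for-sale/<postcode>/<property-id>` layout seen in the question.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the name and the URL layout are assumptions for illustration.
class PostcodeExtractor {
    // Assumed layout: /properties-for-sale/<postcode>/<property-id>
    private static final Pattern POSTCODE_IN_HREF =
            Pattern.compile("^/properties-for-sale/([A-Za-z0-9]+)/");

    /** Returns the upper-cased postcode segment, or empty if the href has a different shape. */
    static Optional<String> extractPostcode(String href) {
        if (href == null) {
            return Optional.empty();
        }
        Matcher matcher = POSTCODE_IN_HREF.matcher(href);
        // find() only reports whether a match exists; the text comes from group(1).
        return matcher.find()
                ? Optional.of(matcher.group(1).toUpperCase(Locale.ROOT))
                : Optional.empty();
    }

    public static void main(String[] args) {
        // href taken from the question's HTML
        System.out.println(extractPostcode("/properties-for-sale/w2/chpk3848653").orElse("(none)"));
        // a link that does not follow the assumed layout
        System.out.println(extractPostcode("/about-us").orElse("(none)"));
    }
}
```

Returning an `Optional` rather than `null` makes the "no postcode in this link" case explicit, which helps when looping over every link on a page.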

That's all. If you need anything else, just ask! I hope this helps.
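What do you mean by "I only need the postcode W2"? Also, can you post something you have tried?

I just wanted to show exactly which data I want to scrape. See below: Bayswater, W2. This is the code I tried to extract it with: Elements postcodes = doc.select("span.property_address_location_name"); for (Element postcode : postcodes) { System.out.println(postcode.text()); } There is a problem with this code. Thank you anyway.

@Hakan No problem! If you need anything else, ask me. This is just sample code to serve as a guide. +1 if you find it useful!

This is my code for scraping all the other attributes, and so on. // get the location of the property: Elements locations = items.get(i).getElementsByTag("h6"); // get the postcode of the property: Elements postcodes = items.get(i).getElementsByTag("h6.a[href]"); // get the longitude: Elements longitude = items.get(i).select("div"); This is the link to the web page being scraped.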