Java regex for Google Maps URLs?


I want to parse all Google Maps links in a string. They look like this:

First example:
https://www.google.com/maps/place/white+house/@38.8976763,-77.0387185,17z/data=!3m1!4b1!4m5!3m4!1s0x89b7b7bcdecbb1df:0x715969d86d0b76bf!8m2!3d38.8976763!4d-77.0365298

https://www.google.com/maps/place/white+house/@38.8976763,-77.0387185,17z

https://www.google.com/maps/place//@38.8976763,-77.0387185,17z

https://maps.google.com/maps/place//@38.8976763,-77.0387185,17z

https://www.google.com/maps/place/@38.8976763,-77.0387185,17z

https://google.com/maps/place/@38.8976763,-77.0387185,17z

http://google.com/maps/place/@38.8976763,-77.0387185,17z

https://www.google.com.tw/maps/place/@38.8976763,-77.0387185,17z

These are all valid Google Maps URLs (pointing to the White House).

Here is what I tried:

String gmapLinkRegex = "(http|https)://(www\\.)?google\\.com(\\.\\w*)?/maps/(place/.*)?@(.*z)[^ ]*";
Pattern patternGmapLink = Pattern.compile(gmapLinkRegex , Pattern.CASE_INSENSITIVE);
Matcher m = patternGmapLink.matcher(s);
while (m.find()) {
  logger.info("group0 = {}" , m.group(0));
  String place = m.group(4); 
  place = StringUtils.stripEnd(place , "/"); // remove trailing '/'
  place = StringUtils.removeStart(place , "place/"); // remove the literal 'place/' prefix (stripStart would treat "place/" as a set of characters)
  logger.info("place = '{}'" , place);
  String latLngZ = m.group(5);
  logger.info("latLngZ = '{}'" , latLngZ);
}
It works in simple cases, but there are still problems, for example:

It needs post-processing to get the optional place info.

And it cannot extract both URLs from a line that contains two of them, for example:

s = "https://www.google.com/maps/place//@38.8976763,-77.0387185,17z " +
      " and http://google.com/maps/place/@38.8976763,-77.0387185,17z";
That should be two URLs, but the regex matches the whole line.

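Key points:

  • The whole URL should be matched in group(0), including the trailing data part in the first example.
  • In the first example, if the zoom level 17z is removed, it is still a valid gmap URL, but my regex cannot match it.
  • Extracting the optional place info should be easier.
  • Lat/Lng extraction is required; the zoom level is optional.
  • It must be able to parse multiple URLs in one line.
  • It must be able to handle maps.google.com(.xx)/maps. I tried (www|maps\.)?, but it still seems to have problems.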
Any suggestions for improving this regex? Thanks a lot!
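A minimal sketch of one way to cover the key points above, assuming only the /maps/place/ form with an @lat,lng part has to be handled. The class name GmapLinkParser, the group names (place, lat, lng, zoom) and the exact character sets are hypothetical choices, not a documented Google Maps URL grammar. Named groups need Java 7 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GmapLinkParser {
    // Hypothetical pattern: a sketch, not an official grammar of Google Maps URLs.
    static final Pattern GMAP_LINK = Pattern.compile(
            "https?://(?:(?:www|maps)\\.)?google\\.com(?:\\.[a-z]{2})?/maps/place/"
            + "(?<place>[^/@\\s]*)/*"                                  // optional place name, without "place/"
            + "@(?<lat>-?\\d+(?:\\.\\d+)?),(?<lng>-?\\d+(?:\\.\\d+)?)" // required lat,lng
            + "(?:,(?<zoom>\\d+(?:\\.\\d+)?)z)?"                       // optional zoom such as ",17z"
            + "(?:/data=[!:.\\w-]+)?",                                 // optional trailing data part
            Pattern.CASE_INSENSITIVE);

    /** Returns {url, place, lat, lng, zoom} per link; place may be "", zoom may be null. */
    static List<String[]> findAll(String text) {
        List<String[]> links = new ArrayList<>();
        Matcher m = GMAP_LINK.matcher(text);
        // find() resumes after the previous match, so several URLs on one line are fine
        while (m.find()) {
            links.add(new String[] {
                    m.group(), m.group("place"), m.group("lat"), m.group("lng"), m.group("zoom")});
        }
        return links;
    }

    public static void main(String[] args) {
        String s = "https://www.google.com/maps/place//@38.8976763,-77.0387185,17z "
                + " and https://maps.google.com/maps/place/white+house/@38.8976763,-77.0387185";
        for (String[] link : findAll(s)) {
            System.out.println(Arrays.toString(link));
        }
    }
}
```

With this layout the place group is an empty string for place//@ URLs, and the zoom group is null when the ,17z part is missing, so no stripStart/stripEnd post-processing is needed.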

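Dot-star

A .* always lets the match run on through anything, up to the end of the last URL. You need a "tighter" regex that matches a single URL, not everything between several URLs. Likewise, "[^ ]*" can swallow the next URL if the two are separated by anything other than " ", such as a newline, a tab or another kind of whitespace.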
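A small sketch of that effect, running the question's regex on the two-URL line from the question. The "tight" variant (place part without '@' or whitespace, coordinates limited to digits, '.', ',' and '-') is one possible choice for illustration, not the exact regex from this answer; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // The regex from the question: ".*" and "[^ ]*" may run across URL boundaries.
    static final String LOOSE =
            "(http|https)://(www\\.)?google\\.com(\\.\\w*)?/maps/(place/.*)?@(.*z)[^ ]*";
    // Hypothetical tighter variant: no '@' or whitespace in the place part,
    // and only digits, '.', ',' and '-' before the zoom marker 'z'.
    static final String TIGHT =
            "(http|https)://(www\\.)?google\\.com(\\.\\w*)?/maps/(place/[^@\\s]*)?@([0-9.,-]+z)";

    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "https://www.google.com/maps/place//@38.8976763,-77.0387185,17z "
                + " and http://google.com/maps/place/@38.8976763,-77.0387185,17z";
        System.out.println(findAll(LOOSE, s)); // one match: the whole line
        System.out.println(findAll(TIGHT, s)); // two matches, one per URL
    }
}
```

In LOOSE, the greedy (place/.*) backtracks only as far as the last '@' on the line, which is why the first match ends at the second URL.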

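I suggest (sorry, not tested in Java) using "anything except @", "digits, minus, comma or dot", and "an optional special string followed by a tailored character set, repeated many times".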
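One way to read that suggestion is to assemble the regex from the three described pieces. The character sets below are assumptions (the data set only covers characters seen in the examples in this thread), and the class and constant names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ComposedGmapRegex {
    // "anything except @" for the optional place part
    static final String PLACE = "(place/[^@]*)?";
    // "digits, minus, comma or dot", ending with the zoom marker 'z'
    static final String COORDS = "@([0-9.,-]*z)";
    // optional "/data=" followed by a tailored character set, repeated
    static final String DATA = "(/data=[!:.\\-0-9a-fmsx]+)?";
    // Groups: 1 scheme, 2 "www.", 3 country suffix, 4 place, 5 coords, 6 data
    static final Pattern GMAP = Pattern.compile(
            "(http|https)://(www\\.)?google\\.com(\\.\\w*)?/maps/" + PLACE + COORDS + DATA);

    public static void main(String[] args) {
        String url = "https://www.google.com/maps/place/white+house/@38.8976763,-77.0387185,17z"
                + "/data=!3m1!4b1!4m5!3m4!1s0x89b7b7bcdecbb1df:0x715969d86d0b76bf!8m2!3d38.8976763!4d-77.0365298";
        Matcher m = GMAP.matcher(url);
        if (m.find()) {
            System.out.println("place  = " + m.group(4));
            System.out.println("coords = " + m.group(5));
            System.out.println("data   = " + m.group(6));
        }
    }
}
```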

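I tested the above on a Perl-compatible regex engine (Notepad++). If I guessed anything wrong, please adjust it yourself. The explicit digit lists could probably be replaced by "\d"; I tried to minimize assumptions about the regex flavor.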
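On the "\d" remark: in Java's java.util.regex, \d means exactly [0-9] by default and only matches other Unicode decimal digits when Pattern.UNICODE_CHARACTER_CLASS is set, so swapping [0-9] for \d should not change what matches in these URLs. A tiny check (class and method names are hypothetical):

```java
import java.util.regex.Pattern;

public class DigitClassDemo {
    // Default Java semantics: \d is exactly [0-9].
    static boolean asciiDigit(String s) {
        return Pattern.matches("\\d", s);
    }

    // With UNICODE_CHARACTER_CLASS, \d matches any Unicode decimal digit.
    static boolean unicodeDigit(String s) {
        return Pattern.compile("\\d", Pattern.UNICODE_CHARACTER_CLASS).matcher(s).matches();
    }

    public static void main(String[] args) {
        String arabicIndicThree = "\u0663"; // ARABIC-INDIC DIGIT THREE
        System.out.println(asciiDigit("7") + " " + asciiDigit(arabicIndicThree));     // true false
        System.out.println(unicodeDigit("7") + " " + unicodeDigit(arabicIndicThree)); // true true
    }
}
```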

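To match "URL" or "URL and URL", store the regex in a variable and build "(URL and)*URL", replacing "URL" with the regex variable (this is possible in Java). If the question is how to retrieve multiple matches, i.e. the Java side, I can't help. Let me know and I will delete this answer, so as not to provoke well-deserved downvotes ;-)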
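Both parts are straightforward in Java. The sketch below keeps a hypothetical single-URL regex in a String variable, composes "(URL and )*URL" to validate a whole line, and uses a Matcher.find() loop to retrieve each URL. A repeated capturing group only remembers its last iteration, so find() is the usual way to collect every match. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlListDemo {
    // Hypothetical single-URL regex, stored in a variable so it can be reused below.
    static final String URL =
            "https?://(?:www\\.)?google\\.com/maps/place/[^@\\s]*@[0-9.,-]+z";
    // "(URL and )*URL": the whole line must be one or more URLs joined by "and".
    static final Pattern URL_LIST = Pattern.compile("(?:" + URL + "\\s+and\\s+)*" + URL);
    static final Pattern SINGLE_URL = Pattern.compile(URL);

    static boolean isUrlList(String line) {
        return URL_LIST.matcher(line).matches();
    }

    // A repeated group would only keep its last iteration, so collect URLs with find().
    static List<String> extractUrls(String line) {
        List<String> urls = new ArrayList<>();
        Matcher m = SINGLE_URL.matcher(line);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        String s = "https://www.google.com/maps/place//@38.8976763,-77.0387185,17z "
                + " and http://google.com/maps/place/@38.8976763,-77.0387185,17z";
        System.out.println(isUrlList(s));   // true
        System.out.println(extractUrls(s)); // both URLs
    }
}
```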


(Edited to capture the data part, which I had not seen before (first example, first line), and to handle multiple URLs in one line.)

I wrote this regex to validate Google Maps links:

"(http:|https:)?\\/\\/(www\\.)?(maps.)?google\\.[a-z.]+\\/maps/?([\\?]|place/*[^@]*)?/*@?(ll=)?(q=)?(([\\?=]?[a-zA-Z]*[+]?)*/?@{0,1})?([0-9]{1,3}\\.[0-9]+(,|&[a-zA-Z]+=)-?[0-9]{1,3}\\.[0-9]+(,?[0-9]+(z|m))?)?(\\/?data=[\\!:\\.\\-0123456789abcdefmsx]+)?"
I tested it with the following list of Google Maps links:
String location1 = "http://www.google.com/maps/place/21.01196755,105.86306012";
String location2 = "https://www.google.com.tw/maps/place/@38.8976763,-77.0387185,17z";
String location3 = "http://www.google.com/maps/place/21.01196755,105.86306012";
String location4 = "https://www.google.com/maps/place/white+house/@38.8976763,-77.0387185,17z/data=!3m1!4b1!4m5!3m4!1s0x89b7b7bcdecbb1df:0x715969d86d0b76bf!8m2!3d38.8976763!4d-77.0365298";
String location5 = "https://www.google.com/maps/place/white+house/@38.8976763,-77.0387185,17z";
String location6 = "https://www.google.com/maps/place//@38.8976763,-77.0387185,17z";
String location7 = "https://maps.google.com/maps/place//@38.8976763,-77.0387185,17z";
String location8 = "https://www.google.com/maps/place/@38.8976763,-77.0387185,17z";
String location9 = "https://google.com/maps/place/@38.8976763,-77.0387185,17z";
String location10 = "http://google.com/maps/place/@38.8976763,-77.0387185,17z";
String location11 = "https://www.google.com/maps/place/@/data=!4m2!3m1!1s0x3135abf74b040853:0x6ff9dfeb960ec979";
String location12 = "https://maps.google.com/maps?q=New+York,+NY,+USA&hl=no&sll=19.808054,-63.720703&sspn=54.337928,93.076172&oq=n&hnear=New+York&t=m&z=10";
String location13 = "https://www.google.com/maps";
String location14 = "https://www.google.fr/maps";
String location15 = "https://google.fr/maps";
String location16 = "http://google.fr/maps";
String location17 = "https://www.google.de/maps";
String location18 = "https://www.google.com/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4";
String location19 = "https://www.google.de/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4";
String location20 = "https://www.google.com/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4&layer=t&lci=com.panoramio.all,com.google.webcams,weather";
String location21 = "https://www.google.com/maps?ll=37.370157,0.615234&spn=45.047033,93.076172&t=m&z=4&layer=t";
String location22 = "https://www.google.com/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4";
String location23 = "https://www.google.de/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4";
String location24 = "https://www.google.com/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4&layer=t&lci=com.panoramio.all,com.google.webcams,weather";
String location25 = "https://www.google.com/maps?ll=37.370157,0.615234&spn=45.047033,93.076172&t=m&z=4&layer=t";
String location26 = "http://www.google.com/maps/place/21.01196755,105.86306012";
String location27 = "http://google.com/maps/bylatlng?lat=21.01196022&lng=105.86298748";
String location28 = "https://www.google.com/maps/place/C%C3%B4ng+vi%C3%AAn+Th%E1%BB%91ng+Nh%E1%BA%A5t,+354A+%C4%90%C6%B0%E1%BB%9Dng+L%C3%AA+Du%E1%BA%A9n,+L%C3%AA+%C4%90%E1%BA%A1i+H%C3%A0nh,+%C4%90%E1%BB%91ng+%C4%90a,+H%C3%A0+N%E1%BB%99i+100000,+Vi%E1%BB%87t+Nam/@21.0121535,105.8443773,13z/data=!4m2!3m1!1s0x3135ab8ee6df247f:0xe6183d662696d2e9";
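A minimal harness for checking that regex against the list, assuming whole-string validation with Matcher.matches(). Only a few entries are wired in here, and the class and method names are hypothetical. Note that (maps.) contains an unescaped dot, which matches any character; (maps\.) would be stricter.

```java
import java.util.regex.Pattern;

public class AnswerRegexCheck {
    // The validation regex from the answer above, copied verbatim as a Java string literal.
    static final Pattern GMAP = Pattern.compile(
            "(http:|https:)?\\/\\/(www\\.)?(maps.)?google\\.[a-z.]+\\/maps/?([\\?]|place/*[^@]*)?/*@?(ll=)?(q=)?(([\\?=]?[a-zA-Z]*[+]?)*/?@{0,1})?([0-9]{1,3}\\.[0-9]+(,|&[a-zA-Z]+=)-?[0-9]{1,3}\\.[0-9]+(,?[0-9]+(z|m))?)?(\\/?data=[\\!:\\.\\-0123456789abcdefmsx]+)?");

    // Whole-string validation: the entire input must match, not just a substring.
    static boolean isValid(String url) {
        return GMAP.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://www.google.com/maps/place/21.01196755,105.86306012",
            "https://www.google.com.tw/maps/place/@38.8976763,-77.0387185,17z",
            "https://www.google.com/maps/place/white+house/@38.8976763,-77.0387185,17z",
            "https://maps.google.com/maps/place//@38.8976763,-77.0387185,17z",
            "https://www.google.com/maps",
            "https://www.example.com/maps" // not a Google Maps link
        };
        for (String url : samples) {
            System.out.println(isValid(url) + "  " + url);
        }
    }
}
```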

I think removing [^ ]* from the end would already be an improvement. All your examples end in [\d]z; what is that part for?
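The z is the zoom level; it is sometimes mandatory and sometimes optional. In the first example (the one with the trailing data part) it is optional, but in the other examples the z seems to be mandatory.

Hi, it matches multiple URLs in one line, but it cannot match the data part of example 1 (I guess the last part is [^\\W]*). And with [^\\W]* it cannot match multiple URLs in one line.

So it should capture multiple URLs? I was wrong. My "bio-visual" regex missed the whole first example when I copied it into the test input. I'll be back. The last part (which is completely different now anyway) was totally bogus and had nothing to do with "anything except whitespace", i.e. "[^\s]". What it actually says is "anything except non-identifier characters", i.e. only identifier-like characters. Forget it; it only did something useful by accident. Any remaining errors are mine; please test it properly (in Java).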
"(http|https)://(www\\.)?google\\.com(\\.\\w*)?/maps/(place/[^@]*)?@([0-9.,-]*z)(/data=[!:.\\-0-9a-fmsx]+)?"
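Thanks, it almost works, except that the place still needs post-processing.
@smallfu (Java is not my strong suit) I could add something so that the regex itself captures the place surgically.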
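A sketch of that "surgical" capture, assuming the regex posted above as the starting point: replacing (place/[^@]*)? with (?:place/([^/@]*)/*)? makes group 4 hold just the place name, so neither the place/ prefix nor trailing slashes need post-processing. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceCaptureDemo {
    // The answer's regex with (place/[^@]*)? changed to (?:place/([^/@]*)/*)?
    // Groups: 1 scheme, 2 "www.", 3 country suffix, 4 place name, 5 lat,lng[,zoom]z, 6 data part.
    static final Pattern GMAP = Pattern.compile(
            "(http|https)://(www\\.)?google\\.com(\\.\\w*)?/maps/(?:place/([^/@]*)/*)?@([0-9.,-]*z)(/data=[!:.\\-0-9a-fmsx]+)?");

    public static void main(String[] args) {
        String s = "https://www.google.com/maps/place/white+house/@38.8976763,-77.0387185,17z"
                + " and https://www.google.com/maps/place//@38.8976763,-77.0387185,17z";
        Matcher m = GMAP.matcher(s);
        while (m.find()) {
            // group(4): bare place name; "" for place//@, null if there is no place/ segment at all
            System.out.println("place = '" + m.group(4) + "', latLngZ = '" + m.group(5) + "'");
        }
    }
}
```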