Javascript: How do I extract information from the following text?




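I am trying to extract the title, description and address from text on different websites. I am currently doing some web crawling to extract this information. However, I am having a hard time finding a regular expression that produces the expected text output shown below.

How can I improve the regex and build the suggested rule set into it so that it extracts this information?

My regex: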

(^.+\n)(^.+\n)?(^\d+.*\d{6})
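To see what this pattern actually captures, it can be run against an excerpt of the input text. This is a minimal check, assuming LF line endings; it needs the m flag so that each inner ^ matches at the start of a line, and the g flag because String.prototype.matchAll requires it:

```javascript
// Excerpt of the input text from the question, joined with LF line endings.
const text = [
  "View store information",
  "TAMPINES MART",
  "11559.33Km Away,",
  "5 TAMPINES ST 32, #01-07/16 TAMPINESS MART, 529284",
  "67817232",
  "Open Now",
  "Full Menu",
  "View store information",
  "THE SIGNATURE",
  "The SIGNATURE is a wonderful destination for shopping text.",
  "51, CHANGI BUSINESS PARK CENTRAL 2, #01-15, THE SIGNATURE, 486066",
].join("\n");

// The pattern from the question: m lets every ^ match at a line start,
// g is required by matchAll.
const original = /(^.+\n)(^.+\n)?(^\d+.*\d{6})/gm;

for (const m of text.matchAll(original)) {
  console.log({
    title: m[1].trim(),
    description: m[2] === undefined ? undefined : m[2].trim(),
    address: m[3],
  });
}
```

On this excerpt it finds two stores; for TAMPINES MART the second group holds the distance line "11559.33Km Away,", since the pattern accepts any line there.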


Rule set to build in:

First line (title)
    - can contain any letters and numbers
    - should not contain a dot (.)
Second line (description or additional information)
    - can contain any letters and numbers
    - should contain a dot (.)
    - can be empty
    - if it is empty, extract only the first line, which is the title
Third line (address)
    - the address to extract
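As an alternative to a single regular expression, the rule set can also be applied line by line. The sketch below is a hypothetical helper (parseStores is not from the question), assuming the address is always the line ending in a space followed by a 6-digit postal code and that blank lines carry no information. Note that under these rules a line such as "11559.33Km Away," counts as a description because it contains a dot.

```javascript
// Hypothetical helper: applies the rule set line by line instead of one big regex.
// Assumption: an address line ends with a space and a 6-digit postal code.
function parseStores(text) {
  // Drop blank lines so that an empty description line is simply skipped.
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "");
  const stores = [];
  lines.forEach((line, i) => {
    // Third line: the address, recognised by its trailing postal code.
    if (!/ \d{6}$/.test(line)) return;
    const prev = lines[i - 1] ?? "";
    if (prev.includes(".")) {
      // Second line contains a dot: it is the description, the line above it is the title.
      stores.push({ title: lines[i - 2] ?? "", description: prev, address: line });
    } else {
      // No description: the line directly above the address is the title.
      stores.push({ title: prev, description: undefined, address: line });
    }
  });
  return stores;
}

// Excerpt of the input text; the Jewel entry is given an empty second line on purpose.
const sample = [
  "View store information",
  "TAMPINES MART",
  "11559.33Km Away,",
  "5 TAMPINES ST 32, #01-07/16 TAMPINESS MART, 529284",
  "67817232",
  "Jewel Changi Airport",
  "",
  "78 Airport Boulevard, #B2-275-277 Jewel Changi Airport, Singapore, 819666",
].join("\n");

for (const store of parseStores(sample)) {
  console.log(store);
}
```

Here the 8-digit phone number "67817232" is not mistaken for an address because the rule requires a space directly before the six digits.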

Input text:

View store information
TAMPINES MART
11559.33Km Away,
5 TAMPINES ST 32, #01-07/16 TAMPINESS MART, 529284
67817232
Open Now
Full Menu
View store information
THE SIGNATURE
The SIGNATURE is a wonderful destination for shopping text.
51, CHANGI BUSINESS PARK CENTRAL 2, #01-15, THE SIGNATURE, 486066
65883667
Open Now
Full Menu
Jewel Changi Airport
Jewel Changi Airport is a breath-taking place for families text.
78 Airport Boulevard, #B2-275-277 Jewel Changi Airport, Singapore, 819666

Expected text output (ideally):

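One option is to use \w to match words and to repeat the first capture group, so that the group keeps the value of its last iteration as the title: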

^(\w+(?: \w+)*\r?\n)*(?:(?![^.\r\n]*\.|.*\d{6}).*\r?\n)*(?:([^\r\n.]*\..*(?:\r?\n(?!.* \d{6}).*)*)\r?\n)?(.* \d{6}(?:\r?\n(?![A-Z]).*)*)$

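const regex = /^(\w+(?: \w+)*\r?\n)*(?:(?![^.\r\n]*\.|.*\d{6}).*\r?\n)*(?:([^\r\n.]*\..*(?:\r?\n(?!.* \d{6}).*)*)\r?\n)?(.* \d{6}(?:\r?\n(?![A-Z]).*)*)$/mg;
const str = `View store information
TAMPINES MART
11559.33Km Away,
5 TAMPINES ST 32, #01-07/16 TAMPINESS MART, 529284
67817232
Open Now
Full Menu
View store information
THE SIGNATURE
The SIGNATURE is a wonderful destination for shopping text.
51, CHANGI BUSINESS PARK CENTRAL 2, #01-15, THE SIGNATURE, 486066
65883667
Open Now
Full Menu
Jewel Changi Airport
Jewel Changi Airport is a breath-taking place for families text.
78 Airport Boulevard, #B2-275-277 Jewel Changi Airport, Singapore, 819666`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    console.log("Title: " + m[1]);
    if (undefined !== m[2]) {
        console.log("Description: " + m[2]);
    }
    console.log("Address: " + m[3]);
    console.log("\n");
}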
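The title capture in this answer relies on a repeated capturing group keeping only its last iteration. A minimal check of that behaviour, using a shortened version of the first group from the pattern above:

```javascript
// A capturing group inside a repetition keeps only the text of its last iteration.
const repeatedGroup = /^(\w+(?: \w+)*\n)*/;
const match = repeatedGroup.exec("View store information\nTAMPINES MART\n11559.33Km Away,");

// The whole match spans both word-only lines...
console.log(JSON.stringify(match[0])); // "View store information\nTAMPINES MART\n"
// ...but group 1 holds only the last one, which is the title.
console.log(JSON.stringify(match[1])); // "TAMPINES MART\n"
```

The repetition stops at "11559.33Km Away," because the dot is neither a word character, a space nor a newline.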

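It matches everything as a whole. Can the regex be changed to extract the title and the description separately instead of everything at once?

@scorezel789 Do you mean something like this?

I have edited the input text. I did some testing, but it doesn't seem to match groups 2 and 3.

@scorezel789 If a line contains a dot, you can match the whole line. For the first group, I have updated the pattern to allow matches that start with a word character \w. For the third group, you could perhaps exclude matched lines that start with an uppercase character. Without using Open Now, Full Menu or View store information in the pattern, it is hard to tell the matches apart without knowing how many sentences follow.

@scorezel789 Ah, that's it.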
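The last comment's point, that labels such as Open Now, Full Menu and View store information are hard to tell apart from titles unless the pattern knows about them, can be sketched by excluding those labels with a negative lookahead. This assumes the label list below is complete for the crawled pages (UI_LABELS is a hypothetical name) and that the text uses LF line endings:

```javascript
// Hypothetical list of page labels that must never be taken as a title.
// They contain no regex metacharacters, so they are joined without escaping.
const UI_LABELS = ["View store information", "Open Now", "Full Menu"];
const notLabel = `(?!(?:${UI_LABELS.join("|")})$)`;

// title: a whole line without a dot that is not a label
// description (optional): a line containing a dot, or an empty line
// address: a line ending in a space and a 6-digit postal code
const storePattern = new RegExp(
  `^${notLabel}([^.\\n]+)\\n(?:([^\\n]*\\.[^\\n]*)?\\n)?(.* \\d{6})$`,
  "gm"
);

const text = `View store information
TAMPINES MART
11559.33Km Away,
5 TAMPINES ST 32, #01-07/16 TAMPINESS MART, 529284
67817232
Open Now
Full Menu
View store information
THE SIGNATURE
The SIGNATURE is a wonderful destination for shopping text.
51, CHANGI BUSINESS PARK CENTRAL 2, #01-15, THE SIGNATURE, 486066
65883667
Open Now
Full Menu
Jewel Changi Airport
Jewel Changi Airport is a breath-taking place for families text.
78 Airport Boulevard, #B2-275-277 Jewel Changi Airport, Singapore, 819666`;

for (const [, title, description, address] of text.matchAll(storePattern)) {
  console.log({ title, description, address });
}
```

The trade-off is the one the comment describes: any new label that appears on a page has to be added to the list, otherwise it may be picked up as a title.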