Complex group regex pattern in Java

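I have developed a regex pattern to parse bibliography references in scientific articles. We use the AMA citation style, and a journal citation can look like this: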

"Nielsen MK, Neergaard MA, Jensen AB, Bro F, Guldin MB. Psychological distress, health, and socio-economic factors in caregivers of terminally ill patients: a nationwide population-based cohort study. Support Care Cancer. 2016; 24(7): 3057-3067."
or without an issue number:

"Nielsen MK, Neergaard MA, Jensen AB, Bro F, Guldin MB. Psychological distress, health, and socio-economic factors in caregivers of terminally ill patients: a nationwide population-based cohort study. Support Care Cancer. 2016; 24: 3057-3067."
or with only a first page (an electronic article number):

"Nielsen MK, Neergaard MA, Jensen AB, Bro F, Guldin MB. Psychological distress, health, and socio-economic factors in caregivers of terminally ill patients: a nationwide population-based cohort study. Support Care Cancer. 2016; 24(7): 3057."
or, if published ahead of print, with only the volume number:

"Nielsen MK, Neergaard MA, Jensen AB, Bro F, Guldin MB. Psychological distress, health, and socio-economic factors in caregivers of terminally ill patients: a nationwide population-based cohort study. Support Care Cancer. 2016; 24."
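My pattern matches all of these cases and captures all the data into groups (the backslashes are doubled because this is a Java string):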

(.*?)\\.(.*?)\\.(.*?)(?<year>\\d+)\\s*?;?\\s*?(?:(?<volume>\\d+))?(?:\\((?<issue>\\d+)\\))?\\s*?(?::\\s*?(?<fpage>\\d+|[A-Za-z]+\\d+))?(?:[\\-\\–](?<lpage>\\d+))?\\.
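For readers who want to try the pattern, here is a minimal sketch of how it might be compiled and how its named groups could be read back in Java. The class name `AmaCitationGroups`, the `parse` helper, and the map keys `authors` and `title` are illustrative assumptions, not part of the original post. The en dash is written as a Unicode escape only to keep the source file ASCII; the character class is otherwise the same.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmaCitationGroups {
    // The pattern from the question as a Java string literal (backslashes doubled).
    // The en dash (U+2013) is given as a Unicode escape inside the character class.
    static final Pattern AMA = Pattern.compile(
        "(.*?)\\.(.*?)\\.(.*?)(?<year>\\d+)\\s*?;?\\s*?(?:(?<volume>\\d+))?"
        + "(?:\\((?<issue>\\d+)\\))?\\s*?(?::\\s*?(?<fpage>\\d+|[A-Za-z]+\\d+))?"
        + "(?:[\\-\u2013](?<lpage>\\d+))?\\.");

    // Hypothetical helper: returns the captured fields, or an empty map if nothing matches.
    // Optional groups that did not participate in the match are stored as null.
    static Map<String, String> parse(String citation) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = AMA.matcher(citation);
        if (!m.find()) {
            return fields;
        }
        fields.put("authors", m.group(1).trim());
        fields.put("title", m.group(2).trim());
        for (String name : new String[] {"year", "volume", "issue", "fpage", "lpage"}) {
            fields.put(name, m.group(name));
        }
        return fields;
    }

    public static void main(String[] args) {
        String citation = "Nielsen MK, Neergaard MA, Jensen AB, Bro F, Guldin MB. "
            + "Psychological distress, health, and socio-economic factors in caregivers of "
            + "terminally ill patients: a nationwide population-based cohort study. "
            + "Support Care Cancer. 2016; 24(7): 3057-3067.";
        parse(citation).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Using `find()` rather than `matches()` mirrors how online regex testers usually behave; whether the original code used one or the other is not stated in the post.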
Here is an example where you can see that the pattern does not match correctly.

The correct regex is:

(.*?)\.(.*?)\.(.*?)(?<year>\d+)\s*?;?\s*?(?:(?<volume>\d+))?(?:\((?<issue>\d+)\))?\s*?(?::\s*?(?<fpage>\d+|[A-Za-z]+\d+))?(?:[ ]*[\-|\–][ ]*(?<lpage>\d+))?\.

This solves your problem. Please check.
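The only difference between the two patterns is the `[ ]*` on either side of the dash, so the failing input presumably had spaces around the page-range dash. The sketch below compares both patterns on such an input. The shortened citation, the class name `DashSpacingDemo`, and the `extract` helper are assumptions made for illustration. Note that inside square brackets `|` is a literal pipe rather than alternation, so `[\-\–]` would be enough; the answer's class is kept unchanged here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DashSpacingDemo {
    // Shared prefix of both patterns, up to and including the first-page group.
    static final String HEAD =
        "(.*?)\\.(.*?)\\.(.*?)(?<year>\\d+)\\s*?;?\\s*?(?:(?<volume>\\d+))?"
        + "(?:\\((?<issue>\\d+)\\))?\\s*?(?::\\s*?(?<fpage>\\d+|[A-Za-z]+\\d+))?";

    // Question's version: the dash must follow the first page immediately.
    static final Pattern ORIGINAL =
        Pattern.compile(HEAD + "(?:[\\-\u2013](?<lpage>\\d+))?\\.");

    // Answer's version: optional spaces are allowed on both sides of the dash.
    static final Pattern FIXED =
        Pattern.compile(HEAD + "(?:[ ]*[\\-|\u2013][ ]*(?<lpage>\\d+))?\\.");

    // Hypothetical helper: value of one named group, or null if there is no match.
    static String extract(Pattern pattern, String input, String groupName) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group(groupName) : null;
    }

    public static void main(String[] args) {
        // Assumed input: a shortened citation with spaces around the dash.
        String spaced = "Nielsen MK, Guldin MB. Psychological distress in caregivers. "
            + "Support Care Cancer. 2016; 24(7): 3057 - 3067.";
        System.out.println("original year: " + extract(ORIGINAL, spaced, "year"));
        System.out.println("fixed year:    " + extract(FIXED, spaced, "year"));
        System.out.println("fixed lpage:   " + extract(FIXED, spaced, "lpage"));
    }
}
```

With the original pattern, the lazy third group keeps growing until the only digits left before the final period are the last page number, which then lands in `year`. With the spaces allowed, the year, volume, issue, and page groups line up as intended.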

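Replacing the [\\-\\–] at the end with something like ?:\\-\\\\\-\\\\-\\\\-\\\\-\\\-\\\-\ doesn't work. Everything ends up in the third group, like ..

Just playing around. That can also be used, but it is not ideal for me.

Answers need to be self-contained. If your link goes stale, your answer is effectively useless. Include your solution in the answer.

@VGR: Done, buddy!!

Yes, it works. I forgot that I could add spaces inside square brackets to match optional spaces: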
(.*?)\.(.*?)\.(.*?)(?<year>\d+)\s*?;?\s*?(?:(?<volume>\d+))?(?:\((?<issue>\d+)\))?\s*?(?::\s*?(?<fpage>\d+|[A-Za-z]+\d+))?(?:[ ]*[\-|\–][ ]*(?<lpage>\d+))?\.