Regular expression for MetaMap output in Java


A MetaMap output file contains lines like the following:

mappings([map(-1000,[ev(-1000,'C0018017','Objective','Goals',[objective],[inpr],[[[1,1],[1,1],0]],yes,no)])]).

The format is explained as follows:

mappings(
      [map(negated overall score for this mapping, 
            [ev(negated candidate score,'UMLS concept ID','UMLS concept','preferred name for concept - may or may not be different',
                 [matched word or words lowercased that this candidate matches in the phrase - comma separated list],
                 [semantic type(s) - comma separated list],
                 [match map list - see below],candidate involved with head of phrase - yes or no,
                 is this an overmatch - yes or no
               )
            ]
          )
      ]
    ).

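I want to run a regex query in Java that gives me the "UMLS concept ID" string, the semantic type(s) and the match map list.

Is a regex the right tool for this in Java, and is it the most efficient way to do it?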
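A regex is not the only option. The lines are Prolog-style terms, so another approach is to parse them into nested lists and then pick fields by position. The following is a minimal recursive-descent parser sketch, offered as an alternative to the regex answers rather than something any of them propose. It assumes there is no whitespace between tokens, scores are integers, and quoted values use Prolog's doubled '' for an embedded quote. TermParser, Compound and firstEv are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a minimal recursive-descent parser for Prolog-style terms.
// Assumes no whitespace between tokens and integer scores (no decimals).
class TermParser {
    // A compound term such as ev(...): a functor name plus its arguments.
    static final class Compound {
        final String name;
        final List<Object> args;
        Compound(String name, List<Object> args) { this.name = name; this.args = args; }
    }

    private final String s;
    private int pos = 0;

    TermParser(String s) { this.s = s; }

    // Returns a String for atoms, numbers and quoted atoms, a List<Object> for
    // [...] lists and a Compound for name(...) terms.
    Object parseTerm() {
        char c = s.charAt(pos);
        if (c == '[') {
            pos++;
            if (s.charAt(pos) == ']') { pos++; return new ArrayList<Object>(); } // empty list
            return parseArgs(']');
        }
        if (c == '\'') {
            return parseQuoted();
        }
        int start = pos;
        while (pos < s.length() && (Character.isLetterOrDigit(s.charAt(pos))
                || s.charAt(pos) == '_' || s.charAt(pos) == '-')) {
            pos++;
        }
        if (pos == start) {
            throw new IllegalArgumentException("Unexpected '" + c + "' at " + pos);
        }
        String token = s.substring(start, pos);
        if (pos < s.length() && s.charAt(pos) == '(') { // functor followed by arguments
            pos++;
            return new Compound(token, parseArgs(')'));
        }
        return token;
    }

    // Reads comma-separated terms and consumes the closing ']' or ')'.
    private List<Object> parseArgs(char close) {
        List<Object> items = new ArrayList<>();
        while (true) {
            items.add(parseTerm());
            char c = s.charAt(pos++);
            if (c == close) return items;
            if (c != ',') {
                throw new IllegalArgumentException("Expected ',' or '" + close + "' at " + (pos - 1));
            }
        }
    }

    // Quoted atom; '' inside is read as one quote (standard Prolog quoting,
    // assumed here to match what MetaMap writes).
    private String parseQuoted() {
        StringBuilder sb = new StringBuilder();
        pos++; // skip the opening quote
        while (true) {
            char c = s.charAt(pos++);
            if (c != '\'') {
                sb.append(c);
            } else if (pos < s.length() && s.charAt(pos) == '\'') {
                sb.append('\'');
                pos++;
            } else {
                return sb.toString();
            }
        }
    }

    // Returns the first ev(...) term of a mappings(...) line.
    static Compound firstEv(String line) {
        Compound mappings = (Compound) new TermParser(line).parseTerm();
        Compound map = (Compound) ((List<?>) mappings.args.get(0)).get(0);
        return (Compound) ((List<?>) map.args.get(1)).get(0);
    }

    public static void main(String[] args) {
        String line = "mappings([map(-1000,[ev(-1000,'C0018017','Objective','Goals',[objective],[inpr],[[[1,1],[1,1],0]],yes,no)])]).";
        Compound ev = firstEv(line);
        System.out.println(ev.args.get(1)); // C0018017
        System.out.println(ev.args.get(5)); // [inpr]
        System.out.println(ev.args.get(6)); // [[[1, 1], [1, 1], 0]]
    }
}
```

The trade-off is more code than a one-line regex, but the parser does not care how many semantic types or match-map entries a line has, and the ev(...) arguments map directly onto the positions in the format description.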

That's a really hairy format. A regex sounds like a reasonable approach, but you'll end up with a truly hairy regex:

mappings\(\[map\(-?[0-9.]+,\[ev\(-?[0-9.]+,'(.*?)','.*?','.*?',\[.*?\],\[(.*?)\],\[(.*)\],(?:yes|no),(?:yes|no)\)\]\)\]\)\.

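It gets worse when you have to write the regex as a Java string: as always, you have to replace every \ with \\. But it will get you what you want: match groups 1, 2 and 3 are the strings you want to pull out. Note that I haven't rigorously tested it against bad input, because I don't have the stomach for it. :)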

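For educational purposes: despite how it looks, it actually wasn't hard to build. I just took your sample line, replaced the actual values with suitable wildcards, and made sure to escape the parentheses, the brackets and the trailing dot.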
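As a sketch of the Java-string step, here is the regex from this answer written as a Java literal (every \ doubled) and applied to the sample line. FirstAnswerRegex and extract are hypothetical names, and matches() assumes each line holds exactly one map(...) with one ev(...); the brackets in the format description suggest those are lists that could hold more entries, which this pattern would not handle.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the answer's regex.
class FirstAnswerRegex {
    // The answer's regex, with every backslash doubled for the Java string literal.
    static final Pattern MAPPING = Pattern.compile(
        "mappings\\(\\[map\\(-?[0-9.]+,\\[ev\\(-?[0-9.]+,'(.*?)','.*?','.*?',"
        + "\\[.*?\\],\\[(.*?)\\],\\[(.*)\\],(?:yes|no),(?:yes|no)\\)\\]\\)\\]\\)\\.");

    // Returns {concept ID, semantic types, match map list}, or null if the line doesn't match.
    static String[] extract(String line) {
        Matcher m = MAPPING.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String line = "mappings([map(-1000,[ev(-1000,'C0018017','Objective','Goals',[objective],[inpr],[[[1,1],[1,1],0]],yes,no)])]).";
        String[] fields = extract(line);
        System.out.println(fields[0]); // C0018017
        System.out.println(fields[1]); // inpr
        System.out.println(fields[2]); // [[1,1],[1,1],0]
    }
}
```

Group 3 is greedy, so it runs up to the last "],yes" or "],no" on the line; that is what keeps the nested brackets of the match map intact.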


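It's possible, yes.

Something like the following, assuming that quoted values are only legal where you've quoted them, bracketed values are only legal where you've put [], the '[' and ']' characters can't appear inside values, and the match map list can't contain ]] except at its end. You get the picture: lots of assumptions...

This gives you the three fields as three match groups (tested with your example).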

That is, as a Java string:

"^[^']+?'([^']*+)'[^\\[]+\\[[^]]+\\],\\[([^\\]]*?)\\],\\[\\[(.*?)\\]\\].*$"

But that isn't very easy to maintain. It would probably be better to be a bit more verbose with this one.
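One possible reading of "more verbose", sketched here as an illustration rather than the answerer's own code: the same pattern spread over several lines with Pattern.COMMENTS, where unescaped whitespace is ignored and # starts a comment that runs to the end of the line. The [^]] class is written as the more explicit [^\]] here. VerboseRegex and extract are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical restatement of the answer's pattern in COMMENTS mode.
class VerboseRegex {
    // In COMMENTS mode the spaces below are ignored and "# ..." runs to the newline.
    static final Pattern MAPPING = Pattern.compile(
          "^ [^']+? '                 # skip everything up to the first quote\n"
        + "  ([^']*+) '               # group 1: the UMLS concept ID\n"
        + "  [^\\[]+ \\[[^\\]]+\\] ,     # skip both names and the matched-words list\n"
        + "  \\[ ([^\\]]*?) \\] ,        # group 2: the semantic types\n"
        + "  \\[\\[ (.*?) \\]\\]          # group 3: the match map without its outer brackets\n"
        + "  .* $",
        Pattern.COMMENTS);

    // Returns the three captured groups, or null if the line doesn't match.
    static String[] extract(String line) {
        Matcher m = MAPPING.matcher(line);
        return m.matches() ? new String[] { m.group(1), m.group(2), m.group(3) } : null;
    }

    public static void main(String[] args) {
        String line = "mappings([map(-1000,[ev(-1000,'C0018017','Objective','Goals',[objective],[inpr],[[[1,1],[1,1],0]],yes,no)])]).";
        String[] fields = extract(line);
        System.out.println(fields[0]); // C0018017
        System.out.println(fields[1]); // inpr
        System.out.println(fields[2]); // [1,1],[1,1],0
    }
}
```

Note that this pattern consumes the outer [[ of the match map, so group 3 comes back without them, unlike the other answers' patterns.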


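Here is my attempt at a regex solution. This "meta-regexing" approach, building the regex through a chain of replace calls, is something I've been experimenting with; I hope it makes for more readable code: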
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Wrapper class added so the snippet compiles and runs on its own.
public class MetaRegexDemo {
    public static void main(String[] args) {
        String line = "mappings([map(-1000,[ev(-1000,'C0018017','Objective','Goals',[objective],[inpr],[[[1,1],[1,1],0]],yes,no)])]).";
        String regex =
            "mappings([map(number,[ev(number,<quoted>,quoted,quoted,[csv],[<csv>],[<matchmap>],yesno,yesno)])])."
            .replaceAll("([\\.\\(\\)\\[\\]])", "\\\\$1") // escape metacharacters
            .replace("<", "(").replace(">", ")") // set up capture groups
            .replace("number", "-?\\d+")
            .replace("quoted", "'[^']*'")
            .replace("yesno", "(?:yes|no)")
            .replace("csv", "[^\\]]*")
            .replace("matchmap", ".*?");
        System.out.println(regex);
        // prints "mappings\(\[map\(-?\d+,\[ev\(-?\d+,('[^']*'),'[^']*','[^']*',\[[^\]]*\],\[([^\]]*)\],\[(.*?)\],(?:yes|no),(?:yes|no)\)\]\)\]\)\."

        Matcher m = Pattern.compile(regex).matcher(line);
        if (m.find()) {
            System.out.println(m.group(1)); // prints "'C0018017'"
            System.out.println(m.group(2)); // prints "inpr"
            System.out.println(m.group(3)); // prints "[[1,1],[1,1],0]"
        }
    }
}


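This replace-based meta-regexing also lets you easily accommodate whitespace between the tokens, just by setting up the appropriate replace (instead of cramming it all into one unreadable mess).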
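To illustrate the whitespace point, here is a sketch that adds one extra step to the answer's replace chain: after escaping, every comma becomes \s*,\s*, so spaces around commas are tolerated. Only commas are covered; whitespace just inside brackets or parentheses would need further replace steps. WhitespaceTolerantMetaRegex and extract are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of the answer's meta-regex with one extra replace step.
class WhitespaceTolerantMetaRegex {
    static final Pattern MAPPING = Pattern.compile(
        "mappings([map(number,[ev(number,<quoted>,quoted,quoted,[csv],[<csv>],[<matchmap>],yesno,yesno)])])."
            .replaceAll("([\\.\\(\\)\\[\\]])", "\\\\$1") // escape metacharacters
            .replace(",", "\\s*,\\s*")                  // added step: optional whitespace around each comma
            .replace("<", "(").replace(">", ")")        // set up capture groups
            .replace("number", "-?\\d+")
            .replace("quoted", "'[^']*'")
            .replace("yesno", "(?:yes|no)")
            .replace("csv", "[^\\]]*")
            .replace("matchmap", ".*?"));

    // Returns the three captured groups, or null if the line doesn't match.
    static String[] extract(String line) {
        Matcher m = MAPPING.matcher(line);
        return m.find() ? new String[] { m.group(1), m.group(2), m.group(3) } : null;
    }

    public static void main(String[] args) {
        String spaced = "mappings([map(-1000, [ev(-1000, 'C0018017', 'Objective', 'Goals', [objective], [inpr], [[[1,1],[1,1],0]], yes, no)])]).";
        for (String field : extract(spaced)) {
            System.out.println(field); // 'C0018017', then inpr, then [[1,1],[1,1],0]
        }
    }
}
```

The comma step runs right after escaping and before the placeholders are expanded, so it only touches the template's own commas, never the ones inside the substituted regex fragments.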

I like your meta-regex approach! So far I've only used named string constants (String number = "-?\\d+") and concatenated them (... + "[ev(" + number + "," + ...), but the result is still
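The named-constants approach from the comment could look roughly like this sketch. Only the number constant comes from the comment; the other constant names and the lit and group helpers are hypothetical, and the sketch swaps in Pattern.quote to escape the literal punctuation, which the comment does not mention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Named regex building blocks, concatenated. NUMBER mirrors the comment's
// example; the other names and helpers are illustrative assumptions.
class NamedConstantsRegex {
    static final String NUMBER = "-?\\d+";
    static final String QUOTED = "'[^']*'";
    static final String CSV = "[^\\]]*";
    static final String YESNO = "(?:yes|no)";
    static final String MATCHMAP = ".*?";

    // Literal punctuation such as "([map(" is escaped with Pattern.quote (\Q...\E).
    static String lit(String s) { return Pattern.quote(s); }

    // Wraps a piece in a capturing group.
    static String group(String re) { return "(" + re + ")"; }

    static final Pattern MAPPING = Pattern.compile(
          lit("mappings([map(") + NUMBER + lit(",[ev(") + NUMBER + lit(",")
        + group(QUOTED) + lit(",") + QUOTED + lit(",") + QUOTED
        + lit(",[") + CSV + lit("],[") + group(CSV) + lit("],[") + group(MATCHMAP)
        + lit("],") + YESNO + lit(",") + YESNO + lit(")])])."));

    // Returns the three captured groups, or null if the line doesn't match.
    static String[] extract(String line) {
        Matcher m = MAPPING.matcher(line);
        return m.find() ? new String[] { m.group(1), m.group(2), m.group(3) } : null;
    }

    public static void main(String[] args) {
        String line = "mappings([map(-1000,[ev(-1000,'C0018017','Objective','Goals',[objective],[inpr],[[[1,1],[1,1],0]],yes,no)])]).";
        String[] fields = extract(line);
        System.out.println(fields[0]); // 'C0018017'
        System.out.println(fields[1]); // inpr
        System.out.println(fields[2]); // [[1,1],[1,1],0]
    }
}
```

Quoting the literal pieces avoids hand-escaping every bracket and parenthesis; the placeholder-replace version reaches the same pattern from a single template string instead.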