
Java regex to match a book name: capture groups


I wrote a regular expression:

value='[A-Za-z]+\\,[0-9]+\\,([A-Za-z0-9]+)\\,([A-Za-z0-9]+)'>[A-Za-z0-9]+\\s-\\s(.*)?\\s\\(
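It works fairly well, but the problem is that the end of it always matches everything.

For example, it is meant for books, and I am testing it against the following: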

value='C,201301,F110,JEWL1050'>JEWL1050 - Industry Skills I (F110)</option>
value='C,201301,F114,JEWL1050'>JEWL1050 - Industry Skills I (F114)</option>
value='C,201301,F114,JEWL1054'>JEWL1054 - Jewellery Rendering & Illustra (F114)</option>
value='C,201301,F110,JEWL2029'>JEWL2029 - Production Techniques B (F110)</option>
value='C,201301,F114,JEWL2029'>JEWL2029 - Production Techniques B (F114)</option>
value='C,201301,LIAD,LANG9066'>LANG9066 - Italian For Beginners (LIAD)</option>
value='C,201301,T302,LAW1151'>LAW1151 - Canandian & Environmental Law (T302)</option>
value='C,201301,T305,LAW1151'>LAW1151 - Canandian & Environmental Law (T305)</option>
value='C,201301,F402,LAW1152'>LAW1152 - International Law & Agreements (F402)</option>
value='C,201301,T302,LAW3201'>LAW3201 - Protection Legislation (T302)</option>
value='C,201301,T303,LAW3201'>LAW3201 - Protection Legislation (T303)</option>
value='C,201301,T304,LAW3201'>LAW3201 - Protection Legislation (T304)</option>
So for the first book, it should capture `F110` as group 1, `JEWL1050` as group 2, and `Industry Skills I` as group 3.

It captures the first two groups correctly, but not the last one: instead it captures `Industry Skills I (F110)`.

Is there any way to modify my regex? I can't seem to get the last group right at all.
Please help. Thanks in advance.

In theory, this should work.

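Here is the suggested regex applied to the sample input (because the tool takes a raw pattern rather than a Java string literal, `\\` has been changed to `\`):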

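The tool also has a "Java" checkbox and even generates the corresponding Java code. It offers no permalink, though, so you have to enter the regex yourself (again with `\\` replaced by `\`) along with the sample data: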

That said, for posterity, here is its output:

Raw Match Pattern:

  value='[A-Za-z]+\,[0-9]+\,([A-Za-z0-9]+)\,([A-Za-z0-9]+)'>[A-Za-z0-9]+\s-\s(.*)?\s\(

Java Code Example:

import java.util.regex.Pattern;
import java.util.regex.Matcher;
class Module1{
  public static void main(String[] asd){
    String sourcestring = "source string to match with pattern";
    Pattern re = Pattern.compile("value='[A-Za-z]+\\,[0-9]+\\,([A-Za-z0-9]+)\\,([A-Za-z0-9]+)'>[A-Za-z0-9]+\\s-\\s(.*)?\\s\\(");
    Matcher m = re.matcher(sourcestring);
    int mIdx = 0;
    while (m.find()){
      for (int groupIdx = 0; groupIdx < m.groupCount()+1; groupIdx++){
        System.out.println( "[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
      }
      mIdx++;
    }
  }
}

$matches Array:
(
  [0] => Array
    (
      [0] => value='C,201301,F110,JEWL1050'>JEWL1050 - Industry Skills I (
      [1] => value='C,201301,F114,JEWL1050'>JEWL1050 - Industry Skills I (
      [2] => value='C,201301,F114,JEWL1054'>JEWL1054 - Jewellery Rendering & Illustra (
      [3] => value='C,201301,F110,JEWL2029'>JEWL2029 - Production Techniques B (
      [4] => value='C,201301,F114,JEWL2029'>JEWL2029 - Production Techniques B (
      [5] => value='C,201301,LIAD,LANG9066'>LANG9066 - Italian For Beginners (
      [6] => value='C,201301,T302,LAW1151'>LAW1151 - Canandian & Environmental Law (
      [7] => value='C,201301,T305,LAW1151'>LAW1151 - Canandian & Environmental Law (
      [8] => value='C,201301,F402,LAW1152'>LAW1152 - International Law & Agreements (
      [9] => value='C,201301,T302,LAW3201'>LAW3201 - Protection Legislation (
      [10] => value='C,201301,T303,LAW3201'>LAW3201 - Protection Legislation (
      [11] => value='C,201301,T304,LAW3201'>LAW3201 - Protection Legislation (
    )

  [1] => Array
    (
      [0] => F110
      [1] => F114
      [2] => F114
      [3] => F110
      [4] => F114
      [5] => LIAD
      [6] => T302
      [7] => T305
      [8] => F402
      [9] => T302
      [10] => T303
      [11] => T304
    )

  [2] => Array
    (
      [0] => JEWL1050
      [1] => JEWL1050
      [2] => JEWL1054
      [3] => JEWL2029
      [4] => JEWL2029
      [5] => LANG9066
      [6] => LAW1151
      [7] => LAW1151
      [8] => LAW1152
      [9] => LAW3201
      [10] => LAW3201
      [11] => LAW3201
    )

  [3] => Array
    (
      [0] => Industry Skills I
      [1] => Industry Skills I
      [2] => Jewellery Rendering & Illustra
      [3] => Production Techniques B
      [4] => Production Techniques B
      [5] => Italian For Beginners
      [6] => Canandian & Environmental Law
      [7] => Canandian & Environmental Law
      [8] => International Law & Agreements
      [9] => Protection Legislation
      [10] => Protection Legislation
      [11] => Protection Legislation
    )
)
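
The output above comes from input with one option per line, where the greedy `(.*)` cannot run past the end of the line. As a rough sketch of one case where the same pattern would overshoot, the snippet below assumes two options sit on a single line with no newline between them; the question does not say how the real input was laid out, so that layout is an assumption. The class name `GreedyVsReluctant`, the helper `firstTitle`, and the reluctant variant `(.*?)` are hypothetical and do not come from the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Shared prefix copied from the question's pattern (capture groups 1 and 2).
    private static final String PREFIX =
        "value='[A-Za-z]+\\,[0-9]+\\,([A-Za-z0-9]+)\\,([A-Za-z0-9]+)'>[A-Za-z0-9]+\\s-\\s";

    // The question's ending: greedy (.*) backtracks to the LAST " (" it can reach.
    static final Pattern GREEDY = Pattern.compile(PREFIX + "(.*)?\\s\\(");

    // Hypothetical variant: reluctant (.*?) stops at the FIRST " (".
    static final Pattern RELUCTANT = Pattern.compile(PREFIX + "(.*?)\\s\\(");

    // Returns group 3 of the first match, or null if nothing matches.
    static String firstTitle(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        // Assumed layout: two options on ONE line, separated only by a space.
        String oneLine =
            "value='C,201301,F110,JEWL1050'>JEWL1050 - Industry Skills I (F110)</option> "
          + "value='C,201301,F114,JEWL1050'>JEWL1050 - Industry Skills I (F114)</option>";
        System.out.println("greedy:    " + firstTitle(GREEDY, oneLine));
        System.out.println("reluctant: " + firstTitle(RELUCTANT, oneLine));
    }
}
```

With one option per line both patterns give the same title, which is consistent with the tool output above. The difference only shows up when `.` can reach a later ` (` in the input.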

Here is a more complex regex:

value='(?:[^,]+,){2}([^,]+),([^,]+)'>[^-]+-\s+([^(]+)(?=\s)
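
To make the pattern above easier to try out, here is a minimal Java sketch that applies it to single lines from the question's sample data. The class name `OptionLineParser` and the method `parse` are hypothetical names chosen for illustration. Only the pattern itself comes from the answer, with backslashes doubled for a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionLineParser {
    // The answer's pattern, escaped for a Java string literal.
    static final Pattern OPTION = Pattern.compile(
        "value='(?:[^,]+,){2}([^,]+),([^,]+)'>[^-]+-\\s+([^(]+)(?=\\s)");

    // Returns {group 1, group 2, group 3} for the first match, or null if none.
    static String[] parse(String line) {
        Matcher m = OPTION.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String line =
            "value='C,201301,F110,JEWL1050'>JEWL1050 - Industry Skills I (F110)</option>";
        String[] groups = parse(line);
        // The lookahead (?=\s) makes group 3 give back its trailing space.
        System.out.println(String.join(" | ", groups));
    }
}
```

Two properties of this pattern are worth keeping in mind: `[^(]+` stops at the first `(`, so a title that itself contains a parenthesis would be cut short, and `[^-]+` assumes the course code before the separator contains no hyphen.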

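As far as I can see, `C,201301` is not needed. So a simple solution is to treat the value inside the opening tag as junk and focus only on

>([A-Z]+[0-9])+\\s-\\s(.*)?\\s([A-Z0-9]+)<

as a sufficient expression for the three groups.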
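
The idea of ignoring the attribute and matching only the text between `>` and `<` can be sketched as follows. As written, the pattern in this answer does not match the sample lines: its leading group `(...[0-9])+` allows only one digit after each run of letters, and nothing in it accounts for the parentheses around the section code. The `ADJUSTED` pattern below is therefore an assumed correction, not the answerer's text, and the class name `TextOnlyParser` and method `parse` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TextOnlyParser {
    // The answer's pattern as posted, kept only for comparison.
    static final Pattern AS_WRITTEN = Pattern.compile(
        ">([A-Z]+[0-9])+\\s-\\s(.*)?\\s([A-Z0-9]+)<");

    // Assumed correction: several digits allowed, literal parentheses matched,
    // and a reluctant (.*?) for the title.
    static final Pattern ADJUSTED = Pattern.compile(
        ">([A-Z]+[0-9]+)\\s-\\s(.*?)\\s\\(([A-Z0-9]+)\\)<");

    // Returns {course code, title, section code}, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = ADJUSTED.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String line = "<option value='C,201301,T302,LAW3201'>"
                    + "LAW3201 - Protection Legislation (T302)</option>";
        System.out.println("as written matches: " + AS_WRITTEN.matcher(line).find());
        System.out.println(String.join(" | ", parse(line)));
    }
}
```

Note that the group order here differs from the one the question asks for: group 1 is the course code, group 2 the title, and group 3 the section code.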

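Are you sure it captures `Industry Skills I (F110)`? It doesn't even match the `-`. Did you print the right group?

What is the [...] in the last capture group for?

Please use an HTML parser, not regex.

@Anirudh - an HTML parser won't help with this...

The answer is: don't parse HTML with regex. See the legendary thread on the subject.

You could strip the tags before applying the regex.

:o It works when I copy and paste it from your code. Weird. Thanks!! =) I've bookmarked that site too.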