PHP regex: separating nn:nn from a string


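I have been at this for hours, and none of the solutions here have really helped me. I have a text file in the format "NN:NN string here". The actual file looks like the sample below. I need a regex that separates the chapter:verse reference from the actual verse text. As you can see, not all of the verses are separated by line breaks. The closest I have gotten is
(\d{1,2}:\d{1,2})[^\d]*
, but it really only isolates the NN:NN part.

How can I finish separating the string?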

1:1 The book of the generation of Jesus Christ, the son of David, the son of Abraham.

1:2 Abraham begat Isaac; and Isaac begat Jacob; and Jacob begat Judas and his brethren; 1:3 And Judas begat Phares and Zara of Thamar; and Phares begat Esrom; and Esrom begat Aram; 1:4 And Aram begat Aminadab; and Aminadab begat Naasson; and Naasson begat Salmon; 1:5 And Salmon begat Booz of Rachab; and Booz begat Obed of Ruth; and Obed begat Jesse; 1:6 And Jesse begat David the king; and David the king begat Solomon of her that had been the wife of Urias; 1:7 And Solomon begat Roboam; and Roboam begat Abia; and Abia begat Asa; 1:8 And Asa begat Josaphat; and Josaphat begat Joram; and Joram begat Ozias; 1:9 And Ozias begat Joatham; and Joatham begat Achaz; and Achaz begat Ezekias; 1:10 And Ezekias begat Manasses; and Manasses begat Amon; and Amon begat Josias; 1:11
And Josias begat Jechonias and his brethren, about the time they were carried away to Babylon: 1:12 And after they were brought to Babylon, Jechonias begat Salathiel; and Salathiel begat Zorobabel; 1:13 And Zorobabel begat Abiud; and Abiud begat Eliakim; and Eliakim begat Azor; 1:14 And Azor begat Sadoc; and Sadoc begat Achim; and Achim begat Eliud; 1:15 And Eliud begat Eleazar; and Eleazar begat Matthan; and Matthan begat Jacob; 1:16 And Jacob begat Joseph the husband of Mary, of whom was born Jesus, who is called Christ.

1:17 So all the generations from Abraham to David are fourteen generations; and from David until the carrying away into Babylon are fourteen generations; and from the carrying away into Babylon unto Christ are fourteen generations.

1:18 Now the birth of Jesus Christ was on this wise: When as his mother Mary was espoused to Joseph, before they came together, she was found with child of the Holy Ghost.

You're close. The following should work:

preg_match_all("/(\d{1,2}:\d{1,2})([^\d]*)/", $str, $output_array);

print_r(array_combine($output_array[1], $output_array[2]));
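
As a quick check of the pattern above, here is a minimal sketch run against a shortened sample (the sample string and the `$verses` variable are assumptions for illustration, not part of the question). Because `[^\d]*` also captures the whitespace around each verse, the sketch trims the values. Note too that this pattern would cut a verse short at any digit inside the text.

```php
<?php
// Shortened sample (assumption): one verse before a blank line, two on the same line.
$str = "1:1 The book of the generation of Jesus Christ.\n\n"
     . "1:2 Abraham begat Isaac; 1:3 And Judas begat Phares;";

// Group 1 is the nn:nn reference, group 2 is everything up to the next digit.
preg_match_all('/(\d{1,2}:\d{1,2})([^\d]*)/', $str, $output_array);

// [^\d]* keeps the surrounding spaces and newlines, so trim each value.
$verses = array_combine($output_array[1], array_map('trim', $output_array[2]));

print_r($verses);
```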

Regex

/(\d+:\d+)\R?\s*(.+?(?=\s*\d+:\d+|$))/m

Details

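  • \d
    matches a digit (equivalent to [0-9])
  • \R
    matches any Unicode newline sequence
  • \s
    matches any whitespace character
  • .
    matches any character (except line terminators)
  • +?
    matches between one and unlimited times, as few times as possible (lazy)
  • (?=...)
    positive lookahead: asserts that what follows matches, without consuming it
  • $
    asserts position at the end of a line (because the m flag is set) or at the end of the string
  • ?
    matches between zero and one time
  • |
    alternation: matches either the expression before it or the one after it
  • +
    matches between one and unlimited times
  • *
    matches between zero and unlimited times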

PHP code

$text = "1:1 The book of the generation of Jesus Christ, the son of David, the son of Abraham............";

preg_match_all("/(\d+:\d+)\R?\s*(.+?(?=\s*\d+:\d+|$))/m", $text, $matches);
print_r(array_combine($matches[1], $matches[2]));

Output

Array
(
    [1:1] => The book of the generation of Jesus Christ, the son of David, the son of Abraham.
    [1:2] => Abraham begat Isaac; and Isaac begat Jacob; and Jacob begat Judas and his brethren;
    [1:3] => And Judas begat Phares and Zara of Thamar; and Phares begat Esrom; and Esrom begat Aram;
    [1:4] => And Aram begat Aminadab; and Aminadab begat Naasson; and Naasson begat Salmon;
    [1:5] => And Salmon begat Booz of Rachab; and Booz begat Obed of Ruth; and Obed begat Jesse;
    [1:6] => And Jesse begat David the king; and David the king begat Solomon of her that had been the wife of Urias;
    [1:7] => And Solomon begat Roboam; and Roboam begat Abia; and Abia begat Asa;
    [1:8] => And Asa begat Josaphat; and Josaphat begat Joram; and Joram begat Ozias;
    [1:9] => And Ozias begat Joatham; and Joatham begat Achaz; and Achaz begat Ezekias;
    [1:10] => And Ezekias begat Manasses; and Manasses begat Amon; and Amon begat Josias;
    [1:11] => And Josias begat Jechonias and his brethren, about the time they were carried away to Babylon:
    [1:12] => And after they were brought to Babylon, Jechonias begat Salathiel; and Salathiel begat Zorobabel;
    [1:13] => And Zorobabel begat Abiud; and Abiud begat Eliakim; and Eliakim begat Azor;
    [1:14] => And Azor begat Sadoc; and Sadoc begat Achim; and Achim begat Eliud;
    [1:15] => And Eliud begat Eleazar; and Eleazar begat Matthan; and Matthan begat Jacob;
    [1:16] => And Jacob begat Joseph the husband of Mary, of whom was born Jesus, who is called Christ.
    [1:17] => So all the generations from Abraham to David are fourteen generations; and from David until the carrying away into Babylon are fourteen generations; and from the carrying away into Babylon unto Christ are fourteen generations.
    [1:18] => Now the birth of Jesus Christ was on this wise: When as his mother Mary was espoused to Joseph, before they came together, she was found with child of the Holy Ghost.
)
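
Since `$text` is abbreviated above, here is a small self-contained sketch of the same pattern on a shortened sample (the sample string and the `$verses` variable are assumptions). It covers both cases from the question: a verse that follows its reference on the same line, and a reference (1:11) that is followed by a line break.

```php
<?php
// Shortened sample (assumption): 1:11 is followed by a line break, as in the question.
$text = "1:10 And Amon begat Josias; 1:11\n"
      . "And Josias begat Jechonias and his brethren: 1:12 And Salathiel begat Zorobabel;";

// m flag: $ also matches at the end of each line, not only at the end of the string.
preg_match_all('/(\d+:\d+)\R?\s*(.+?(?=\s*\d+:\d+|$))/m', $text, $matches);

// Pair each reference (group 1) with its verse text (group 2).
$verses = array_combine($matches[1], $matches[2]);

print_r($verses);
```

The pattern is written in a single-quoted string here so that the backslashes and the `$` reach the regex engine untouched; that is a stylistic choice, not a requirement.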

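A regex lookahead will make your task easier. Keep the quantifier lazy (.*?): with the s flag, a greedy .* would run from the first nn:nn to the last one in a single match. The \z alternative lets the final verse match as well.

/(?:\d+:\d+).*?(?=(?:\d+:\d+)|\z)/s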
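
A minimal sketch of the lookahead idea, assuming a lazy quantifier and an end-of-string alternative so that every verse, including the last one, becomes its own match. The sample string, the `$verses` array, and the follow-up `preg_split` step are assumptions added for illustration.

```php
<?php
// Shortened sample (assumption) mixing a blank line and inline verses.
$text = "1:1 The book of the generation.\n\n1:2 Abraham begat Isaac; 1:3 And Judas begat Phares;";

// Each match runs from one nn:nn up to (not including) the next nn:nn or the end of the string.
// s flag: the dot also matches newlines; .*? is lazy, so a match stops at the first lookahead hit.
preg_match_all('/\d+:\d+.*?(?=\d+:\d+|\z)/s', $text, $matches);

$verses = [];
foreach ($matches[0] as $chunk) {
    // Split "nn:nn text" on the first run of whitespace into reference and verse text.
    [$ref, $verse] = preg_split('/\s+/', trim($chunk), 2);
    $verses[$ref] = $verse;
}

print_r($verses);
```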

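Not only is this reasonably fast, it also trims all leading/trailing whitespace from the text values. All of your verse texts end in a colon, a semicolon, or a dot (":", ";", or "."), and I am leveraging this fact to improve pattern efficiency.

If, in your actual project, some sentences contain newlines (your sample doesn't), add the s flag after the second pattern delimiter so that the dot (.) also matches newlines.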

~(\d{1,2}:\d{1,2})\s+(.*?[:;.])(?=\s*(?:\d{1,2}:\d{1,2})|$)~
2193 steps

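
The answer's own code is not reproduced here, so the following is only a sketch of how the pattern might be applied. The shortened sample, the variable names, and the use of `var_export` (chosen because it prints in the same style as the output shown in this answer) are assumptions.

```php
<?php
// Shortened sample (assumption); the full text from the question would go here.
$text = "1:1 The book of the generation of Jesus Christ.\n\n"
      . "1:2 Abraham begat Isaac; 1:3 And Judas begat Phares; 1:11\n"
      . "And Josias begat Jechonias: 1:12 And Salathiel begat Zorobabel;";

$pattern = '~(\d{1,2}:\d{1,2})\s+(.*?[:;.])(?=\s*(?:\d{1,2}:\d{1,2})|$)~';

// Group 1 holds the references, group 2 the verse texts without surrounding whitespace.
$verses = preg_match_all($pattern, $text, $out) ? array_combine($out[1], $out[2]) : [];

var_export($verses);
```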

Output:

array (
  '1:1' => 'The book of the generation of Jesus Christ, the son of David, the son of Abraham.',
  '1:2' => 'Abraham begat Isaac; and Isaac begat Jacob; and Jacob begat Judas and his brethren;',
  '1:3' => 'And Judas begat Phares and Zara of Thamar; and Phares begat Esrom; and Esrom begat Aram;',
  '1:4' => 'And Aram begat Aminadab; and Aminadab begat Naasson; and Naasson begat Salmon;',
  '1:5' => 'And Salmon begat Booz of Rachab; and Booz begat Obed of Ruth; and Obed begat Jesse;',
  '1:6' => 'And Jesse begat David the king; and David the king begat Solomon of her that had been the wife of Urias;',
  '1:7' => 'And Solomon begat Roboam; and Roboam begat Abia; and Abia begat Asa;',
  '1:8' => 'And Asa begat Josaphat; and Josaphat begat Joram; and Joram begat Ozias;',
  '1:9' => 'And Ozias begat Joatham; and Joatham begat Achaz; and Achaz begat Ezekias;',
  '1:10' => 'And Ezekias begat Manasses; and Manasses begat Amon; and Amon begat Josias;',
  '1:11' => 'And Josias begat Jechonias and his brethren, about the time they were carried away to Babylon:',
  '1:12' => 'And after they were brought to Babylon, Jechonias begat Salathiel; and Salathiel begat Zorobabel;',
  '1:13' => 'And Zorobabel begat Abiud; and Abiud begat Eliakim; and Eliakim begat Azor;',
  '1:14' => 'And Azor begat Sadoc; and Sadoc begat Achim; and Achim begat Eliud;',
  '1:15' => 'And Eliud begat Eleazar; and Eleazar begat Matthan; and Matthan begat Jacob;',
  '1:16' => 'And Jacob begat Joseph the husband of Mary, of whom was born Jesus, who is called Christ.',
  '1:17' => 'So all the generations from Abraham to David are fourteen generations; and from David until the carrying away into Babylon are fourteen generations; and from the carrying away into Babylon unto Christ are fourteen generations.',
  '1:18' => 'Now the birth of Jesus Christ was on this wise: When as his mother Mary was espoused to Joseph, before they came together, she was found with child of the Holy Ghost.',
)

Explanation:

~                        #Pattern delimiter
(\d{1,2}:\d{1,2})        #Capture nn:nn as Group1
\s+                      #Match one or more whitespaces (including newlines)
(                        #Start Capture Group2
  .*?                    #Lazily match zero or more non-newline characters
  [:;.]                  #Match a colon, semi-colon, or dot
)                        #End Capture Group2
(?=                      #Start "lookahead" (aka: match but don't consume)
  \s*                    #Match zero or more whitespace characters
  (?:\d{1,2}:\d{1,2})    #Match nn:nn
  |                      #Or
  $                      #Match the end of the entire string
)                        #End "lookahead"
~                        #Pattern delimiter
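
The breakdown above can also be kept inside the pattern itself. This is a sketch, assuming PCRE's x (extended) flag, which ignores unescaped whitespace in the pattern and treats text from # to the end of the line as a comment. The sample string and variable names are assumptions. The comments must not contain the delimiter character, or PHP would end the pattern there.

```php
<?php
// Same pattern in free-spacing form, stored in a nowdoc so nothing is interpolated.
$pattern = <<<'REGEX'
~
(\d{1,2}:\d{1,2})      # group 1: the nn:nn reference
\s+                    # whitespace after the reference, spaces or newlines
(.*?[:;.])             # group 2: verse text, lazily, up to a colon, semicolon or dot
(?=                    # lookahead: what follows must be
  \s*\d{1,2}:\d{1,2}   #   the next reference,
  |                    #   or
  $                    #   the end of the string
)
~x
REGEX;

// Short sample (assumption) with two verses on one line.
$text = "1:1 The book of the generation. 1:2 Abraham begat Isaac;";

preg_match_all($pattern, $text, $out);
$verses = array_combine($out[1], $out[2]);

print_r($verses);
```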

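What are the exact requirements? Should these NN:NN markers be followed by a space and a capital letter? Are you just extracting them, or inserting something? Show us the final result.

I am extracting them. There should be no whitespace after the NN:NN.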