
PHP regex for words connected by hyphens and underscores while retaining punctuation


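I've been reading, searching and trying out different ways of writing the regular expression, such as \p{L}, [a-z] and \w, but I don't seem to be getting the result I want.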

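Problem

I have an array made up of complete sentences with punctuation, and I'm parsing through it with the following preg_match_all call, which works well at retaining words and punctuation:

preg_match_all('/(\w+|[.;?!,:])/', $match, $matches);
However, I now have words like these: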

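  • Word-another-word
  • more_words_like_these
I would like to retain the integrity of these words as they are (connected), but my current preg_match_all breaks them down into individual words.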
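A minimal reproduction, assuming the two example words are simply joined by a space, shows what the original pattern does with them. Because \w already includes the underscore, only the hyphenated word is actually split; the hyphens themselves are dropped, since they match neither alternative.

```php
<?php
// Minimal reproduction: the two example words joined by a space (assumed input).
$text = 'Word-another-word more_words_like_these';

// The original pattern from the question.
preg_match_all('/(\w+|[.;?!,:])/', $text, $matches);

// "-" matches neither alternative, so it is skipped and the hyphenated word
// is split; "_" is already part of \w, so the underscored word stays whole.
print_r($matches[1]);
```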

What I've tried: … and …, which I found from …, but I can't achieve this desired result:

Array ( [0] A, [1] word, [2] like_this, [3] connected, [4] ; ,[5] with-relevant-punctuation)
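One way to get that array, sketched under the assumption that the input sentence is the one the expected array is built from, is to put the hyphen in the same character class as \w, so that hyphens and underscores can both occur inside a word. tokenize() is a hypothetical helper name used only for illustration.

```php
<?php
// Hypothetical helper: a token is either a run of word characters and hyphens,
// or a single punctuation mark. Spaces and any other characters are skipped.
function tokenize(string $sentence): array
{
    preg_match_all('/[\w-]+|[.;?!,:]/', $sentence, $matches);
    return $matches[0];
}

$tokens = tokenize('A word like_this connected; with-relevant-punctuation');
print_r($tokens);
```

Placing the hyphen last in [\w-] keeps it literal. One side effect of this design: a free-standing dash, as in "this - that", comes back as a "-" token of its own.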

Ideally, I would also be able to account for special characters, since some of these words may have accents.
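For accented words, one option (a sketch, not something taken from the answer below) is to replace \w with Unicode property classes and add the u modifier. This assumes the sentences are UTF-8 encoded; with the u modifier, preg_match_all returns false on invalid UTF-8. tokenizeUnicode() is a hypothetical helper name.

```php
<?php
// Hypothetical helper. The /u modifier makes PCRE treat the subject as UTF-8;
// \p{L} (letters), \p{M} (combining marks) and \p{N} (numbers) cover accented
// words whether the accents are precomposed or decomposed.
function tokenizeUnicode(string $sentence): array
{
    // preg_match_all() returns false on failure, e.g. invalid UTF-8 with /u.
    if (preg_match_all('/[\p{L}\p{M}\p{N}_-]+|[.;?!,:]/u', $sentence, $matches) === false) {
        return [];
    }
    return $matches[0];
}

$tokens = tokenizeUnicode('Café-crème, naïve_idea; déjà-vu!');
print_r($tokens);
```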

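Simply insert the hyphen into the character class. Note, however, that the hyphen needs to appear at the beginning or end of the character set; otherwise it will be treated as a range operator.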

(\w+|[-.;?!,:])
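A quick check of the pattern as written, using the example word: the hyphen is now matched, but as a separate one-character token, so the word still comes back in pieces. Keeping it whole requires the hyphen to sit in the same class as \w, as in [\w-]+. The last lines illustrate the range warning with the punctuation class itself; the test character "#" is an arbitrary choice.

```php
<?php
// The pattern as written: "-" is an alternative of its own.
preg_match_all('/(\w+|[-.;?!,:])/', 'Word-another-word', $asWritten);
print_r($asWritten[1]); // hyphens come back as separate tokens

// With the hyphen in the same class as \w, the word stays in one piece.
preg_match_all('/([\w-]+|[.;?!,:])/', 'Word-another-word', $joined);
print_r($joined[1]);

// Why the hyphen's position matters: "!-," in the middle of a class is the
// range 0x21-0x2C, which also contains characters such as "#".
$midRange = preg_match('/[.;?!-,:]/', '#'); // 1: "#" falls inside the range
$atStart  = preg_match('/[-.;?!,:]/', '#'); // 0: a leading "-" is literal
var_dump($midRange, $atStart);
```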

Example: live demo

Sample text

However, I now have words like these:

Word-another-word
more_words_like_these

and I would like to be able to retain the integrity of these words as they are (connected) but my current preg_match breaks them down into individual words.
Sample matches

The other words are captured as before, but the hyphenated words are now captured as well.

Omitted Match 1-9 for brevity 

MATCH 10
1.  [39-56] `Word-another-word`

MATCH 11
1.  [57-78] `more_words_like_these`

Omitted Match 12+ for brevity 
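The offsets in the sample matches can be reproduced with PREG_OFFSET_CAPTURE. This sketch assumes the sample text is exactly the lines shown above, joined with "\n", and uses a pattern with the hyphen inside the word class, since that is what yields Word-another-word as a single match; with (\w+|[-.;?!,:]), match 10 would instead be Word at offset 39.

```php
<?php
// The sample text, assumed to be the lines above joined with "\n".
$sample = "However, I now have words like these:\n\n"
        . "Word-another-word\n"
        . "more_words_like_these\n\n"
        . "and I would like to be able to retain the integrity of these words "
        . "as they are (connected) but my current preg_match breaks them down "
        . "into individual words.";

preg_match_all('/[\w-]+|[.;?!,:]/', $sample, $matches, PREG_OFFSET_CAPTURE);

// Each entry of $matches[0] is [matched text, byte offset]; index 9 is MATCH 10.
foreach ([9, 10] as $i) {
    [$token, $offset] = $matches[0][$i];
    printf("MATCH %d: [%d-%d] %s\n", $i + 1, $offset, $offset + strlen($token), $token);
}
```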
Have you tried [\w.;?!,:]+ on A word like_this connected; with-relevant-punctuation?

… or even \S would do it: preg_match_all("/(\S+)/", $match, $matches)

Thanks for your answer and for taking the time to explain it; very useful. It worked.
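The two patterns suggested in the comments behave differently from the answer's on the same sentence. A short comparison, assuming the sentence from the desired result as input: both keep the semicolon attached to the preceding word, so ";" no longer comes back as its own element, and only \S+ keeps the hyphenated word whole.

```php
<?php
$sentence = 'A word like_this connected; with-relevant-punctuation';

// Suggestion 1: a single class holding word characters and punctuation.
preg_match_all('/[\w.;?!,:]+/', $sentence, $classRuns);

// Suggestion 2: any run of non-whitespace characters.
preg_match_all('/(\S+)/', $sentence, $nonSpace);

print_r($classRuns[0]); // "connected;" stays glued; the hyphenated word is split
print_r($nonSpace[1]);  // "connected;" stays glued; the hyphenated word is kept
```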
Explanation
NODE                     EXPLANATION
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    [-.;?!,:]                any character of: '-', '.', ';', '?',
                             '!', ',', ':'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
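The explanation above can also live next to the code by writing the same pattern with PHP's x (extended) modifier, which ignores whitespace outside character classes and treats # as the start of a comment. A sketch, with comments paraphrasing the table; the test string is arbitrary.

```php
<?php
// The answer's pattern in extended (x) mode, annotated like the table above.
$pattern = '/
    (               # group and capture to \1
        \w+         #   word characters (a-z, A-Z, 0-9, _), one or more
      |             # OR
        [-.;?!,:]   #   one of - . ; ? ! , :  (the leading "-" is literal)
    )               # end of \1
/x';

preg_match_all($pattern, 'Hi, well-known: yes!', $extended);
preg_match_all('/(\w+|[-.;?!,:])/', 'Hi, well-known: yes!', $compact);

// Both forms of the pattern produce the same tokens.
var_dump($extended[1] === $compact[1]);
```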