PHP regex for words joined by hyphens and underscores while keeping punctuation
I've been reading, searching and trying out different ways of writing the regex, such as \p{L}, [a-z] and \w, but I can't seem to get the result I'm after.

Problem

I have an array of full sentences with punctuation, which I parse with the following preg_match_all call. It works well at retaining both the words and the punctuation:
preg_match_all('/(\w+|[.;?!,:])/', $match, $matches)
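However, I now have words like these:
- Word-another-word
- more_words_like_these
I would like to keep these words intact, since they are connected, but my current preg_match_all breaks them down into individual words. The result I am after looks like this: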
Array ( [0] A, [1] word, [2] like_this, [3] connected, [4] ; ,[5] with-relevant-punctuation)
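Ideally, I would also be able to account for special characters, since some of these words may have accents.

Simply insert the hyphen into the character class. Note, however, that the hyphen must appear at the start or the end of the character class; otherwise it is treated as a range operator.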
(\w+|[-.;?!,:])
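It may be worth checking where the hyphen ends up. As a minimal sketch, the snippet below runs the pattern as written above next to a variant that moves the hyphen into the word class. The sample sentence and the variant `([\w-]+|[.;?!,:])` are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical sample sentence, adapted from the desired output in the question.
$text = 'A word like_this connected; with-relevant-punctuation';

// Pattern as written above: the hyphen sits in the punctuation class,
// so every hyphen is returned as its own token.
preg_match_all('/(\w+|[-.;?!,:])/', $text, $split);

// Assumed variant: the hyphen is part of the word class. It is placed last
// in the class so that it is a literal hyphen rather than a range operator.
preg_match_all('/([\w-]+|[.;?!,:])/', $text, $joined);

print_r($split[1]);   // hyphenated word broken into with, -, relevant, -, punctuation
print_r($joined[1]);  // hyphenated word kept as with-relevant-punctuation
```

Underscores need no special handling because `\w` already includes `_`. One trade-off of the variant is that a free-standing dash between two words would also be returned as a "word", so it only fits input where hyphens appear inside words.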
Example
Sample Text
However, I now have words like these:
Word-another-word
more_words_like_these
and I would like to be able to retain the integrity of these words as they are (connected) but my current preg_match breaks them down into individual words.
Sample Matches
The other words are captured as before, but the hyphenated words are now captured as well.
Omitted Match 1-9 for brevity
MATCH 10
1. [39-56] `Word-another-word`
MATCH 11
1. [57-78] `more_words_like_these`
Omitted Match 12+ for brevity
Explanation
Have you tried [\w.;?!,:]+ ? Is the input "A word like_this connected; with-relevant-punctuation" or "A word like_this, connected; with-relevant-punctuation"?
...or even \S+ would do it: preg_match_all("/(\S+)/", $match, $matches)
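To see how the two suggestions in the comments behave, here is a small sketch that runs both against the same hypothetical sentence. The input string is an assumption, and the patterns are used without a capture group, so the results are read from index 0.

```php
<?php
// Hypothetical sample sentence, adapted from the desired output in the question.
$text = 'A word like_this connected; with-relevant-punctuation';

// Suggestion 1: a single class that mixes word characters and punctuation.
// The hyphen is not in the class, so hyphenated words are still split,
// and punctuation stays glued to the preceding word.
preg_match_all('/[\w.;?!,:]+/', $text, $mixed);

// Suggestion 2: any run of non-whitespace characters.
// Hyphenated words stay whole, but punctuation is again glued to the word.
preg_match_all('/\S+/', $text, $nonSpace);

print_r($mixed[0]);
print_r($nonSpace[0]);
```

Both variants return `connected;` as one token, so neither yields the punctuation as a separate array element the way the original alternation does.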
Thanks for your answer and for taking the time to explain it; very useful. It works.
NODE EXPLANATION
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
[-.;?!,:] any character of: '-', '.', ';', '?',
'!', ',', ':'
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
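The question also mentions accented words, which the explanation above does not cover. A possible approach, sketched under the assumption that the input is UTF-8, is to replace `\w` with Unicode property classes and add the `u` modifier. The sample string and the exact class below are illustrative choices, not something taken from the original answer.

```php
<?php
// Hypothetical input with accented words; this file is assumed to be saved as UTF-8.
$text = 'Café-crème, déjà_vu; naïve!';

// \p{L} matches any letter and \p{N} any number. The u modifier tells PCRE
// to treat both the pattern and the subject as UTF-8.
// The hyphen is last in the class so it is read as a literal.
preg_match_all('/([\p{L}\p{N}_-]+|[.;?!,:])/u', $text, $matches);

print_r($matches[1]);
```

If the text stores accents as separate combining marks (decomposed form), adding `\p{M}` to the class would probably be needed as well.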