PHP preg_split: splitting text without losing , : etc.
I am trying to split text with preg_split, but I can't get the regex right. For example:
I search 1, regex to: no. Or... yes!
Should give:
Array
(
[0] => I
[1] => search
[2] => 1
[3] => ,
[4] => regex
[5] => to
[6] => :
[7] => no
[8] => .
[9] => Or
[10] => ...
[11] => yes
[12] => !
)
I tried the following code:
preg_split("/([\s]+)/", "I search 1, regex to: no. Or... yes!")
which results in:
Array
(
[0] => I
[1] => search
[2] => 1,
[3] => regex
[4] => to:
[5] => no.
[6] => Or...
[7] => yes!
)
Edit: OK, the original problem is solved, but I forgot something in my example.
New example:
I search 1, regex (regular expression) to: That's it is! Und über den Wolken müssen wir...
Should give:
array (
0 => 'I',
1 => 'search',
2 => '1',
3 => ',',
4 => 'regex',
5 => '(',
6 => 'regular',
7 => 'expression',
8 => ')',
9 => 'to',
10 => ':',
11 => 'That',
12 => '\'s',
13 => 'it',
14 => 'is',
15 => '!',
16 => 'Und',
17 => 'über',
18 => 'den',
19 => 'Wolken',
20 => 'müssen',
21 => 'wir',
22 => '...',
)
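One problem is that the opening ( does not get matched by the first solution. The other is that German characters inside words are not matched either.
I hope it is OK to update the question (instead of opening a new one).
My attempt is below, but it does not match: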
\s+|(?<!(A-Za-z1-0ÄÖÜäöüß)+)(?=(A-Za-z1-0ÄÖÜäöüß)+)
You can use this lookahead-based regex:
$str = 'I search 1, regex to: no. Or... yes!';
$tok = preg_split('/\h+|(?<!\W)(?=\W)/', $str);
print_r($tok);
Array
(
[0] => I
[1] => search
[2] => 1
[3] => ,
[4] => regex
[5] => to
[6] => :
[7] => no
[8] => .
[9] => Or
[10] => ...
[11] => yes
[12] => !
)
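To spell out what the two alternatives in that pattern do, here is a minimal sketch of the same regex written with the x flag so each branch can carry a comment. The function name split_keep_punctuation is made up for illustration, and PREG_SPLIT_NO_EMPTY is added only as a safety net; neither comes from the answer itself.

```php
<?php
// Hypothetical helper name; the pattern is the one from the answer above,
// only reformatted with the x (extended) flag so it can be commented.
function split_keep_punctuation(string $text): array
{
    $pattern = '/
        \h+              # 1) a run of horizontal whitespace: consumed, so it is dropped
      | (?<!\W)(?=\W)    # 2) zero-width spot: not preceded by a non-word char, followed by one
    /x';

    // PREG_SPLIT_NO_EMPTY is an added precaution against empty pieces.
    return preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(split_keep_punctuation('I search 1, regex to: no. Or... yes!'));
```

The second branch only fires at the start of a punctuation run, which is why "..." stays in one piece while "1," becomes two. Without the u flag, \W treats the bytes of ü and similar characters as non-word, so this version is meant for ASCII input only.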
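I think that, apart from the one bit you seem to want kept as a whole (which does not make much sense to me, since for other punctuation characters such as ! or , you want separate parts), you can achieve this by simply splitting at any whitespace or word boundary: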
preg_split(
'#\s|\b#u',
"I search 1, regex (regular expression) to: That's it is! Und über den Wolken müssen wir...",
-1,
PREG_SPLIT_NO_EMPTY
);
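If the 's in the expected output really has to stay in one piece, one possible variation is to match the tokens instead of splitting between them. This swaps preg_split for preg_match_all and is not what the answer above does; the function name tokenize_text and the rule "an apostrophe sticks to the letters that follow it" are assumptions made for this sketch.

```php
<?php
// Hypothetical helper; it matches tokens rather than splitting between them.
function tokenize_text(string $text): array
{
    // u makes \p{L}, \p{N} and \s Unicode-aware; x allows the comments below.
    $pattern = "/
        '?[\\p{L}\\p{N}]+     # letters or digits, optionally led by an apostrophe
      | [^\\s\\p{L}\\p{N}]+   # a run of anything else that is not whitespace
    /xu";

    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

print_r(tokenize_text(
    "I search 1, regex (regular expression) to: That's it is! Und über den Wolken müssen wir..."
));
```

Because the second branch grabs whole runs of punctuation, adjacent marks such as ")," would come back as a single token, and a word in single quotes would keep its opening quote attached. Whether that is acceptable depends on the input.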
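I tried it, but what are you getting now, and with which code?
Sorry, I don't know how to get a positive lookahead to work. I only have "/[\s]+/" working, and that leaves everything else out :(
That's a start! Include your current code in your question, so we can see what you tried and better show you where you went wrong.
I made a simple test script on:
Your code should be in the question, not behind an off-site link.
Perhaps you would like to explain what your regex does.
I just added that to my answer.
You're welcome. I have added some details about lookaheads.
OK, one thing I forgot: a text with ( and ). I will try to find a solution myself. If I can't, I will add a new comment :)
OK, I can't figure it out :( I hope someone can take a look at my updated question.
Thanks, I can live with that now :)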