PHP regular expression to extract specific URLs?

php regex

I've tried my best, but regular expressions just aren't my thing :(

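I need to extract specific URLs that end with a specific file extension. For example, I want to be able to parse a large paragraph and pull out every URL that ends in *.txt. For example: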

Lorem ipsum dolor sit amet, consectetur adipiscing elit http://www.somesite.com/somefolder/blahblah/etc/something.txt iaculis dictum. Quisque nisi neque, quisque pellentesque blandit, faucibus eget nisl

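I need to be able to pull http://www.somesite.com/somefolder/blahblah/etc/something.txt out of the paragraph above, but the number of URLs to extract will vary, because it changes dynamically with whatever the user enters. There could be 3 links that end in *.txt and 3 links that don't. I only need to extract the ones that end in *.txt. Can anyone give me the code I need?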

You can do it with

/(?

Assuming these are all valid URLs, they won't contain any spaces. We can use that fact to keep the regex very simple:
preg_match_all("/([^ ]+\.(txt|doc))/i", $text, $matches);
//   (          Start capture group 1.
//   [^ ]+      One or more characters that are not a space.
//   \.         A literal period.
//   (txt|doc)  Capture group 2: the word "txt" or "doc".
//   )/i        Close group 1; the i modifier makes it case-insensitive (so TXT and TxT also match).
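
To see what the pattern above actually returns, here is a small sketch on made-up sample text (the $text string and its example.com URLs are assumptions, not from the question). It shows that key 1 holds the whole URL and key 2 only the extension, and it also shows one limitation: because only the space character is excluded, anything glued to the front of a URL is captured along with it.

```php
<?php
// Hypothetical sample text: two .txt URLs, one .DOC URL and one .pdf URL.
$text = 'See http://example.com/a/notes.txt and http://example.com/b/report.DOC '
      . 'or http://example.com/c/manual.pdf plus http://example.com/d/readme.txt';

// Same pattern as in the answer above.
preg_match_all("/([^ ]+\.(txt|doc))/i", $text, $matches);

// $matches[0] holds the full matches, $matches[1] the outer group (the URL)
// and $matches[2] the inner group (only the extension).
print_r($matches[1]);
print_r($matches[2]);

// Caveat: only the space character is excluded, so text glued to the
// front of a URL becomes part of the match.
preg_match_all("/([^ ]+\.(txt|doc))/i", 'Link:http://example.com/x.txt', $glued);
print_r($glued[1]); // the match includes the "Link:" prefix
```

The .pdf URL is skipped, and the /i modifier lets report.DOC through.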

If you don't need to match more than one file extension, you can change "(txt|doc)" to "txt".
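
Since the question says the input changes with what the user enters, the extension list may need to come from a variable rather than being hard-coded. A minimal sketch, assuming PHP 7.4+ (for the arrow function): the helper name extractUrlsByExtension and the sample text are hypothetical. It escapes each extension with preg_quote, uses \S instead of [^ ] so tabs and newlines also end a URL, and borrows the trailing \b from the other answer so that "four.txtold" is not cut down to "four.txt".

```php
<?php
/**
 * Hypothetical helper: returns every whitespace-free run of characters that
 * ends in one of the given extensions (case-insensitive).
 */
function extractUrlsByExtension(string $text, array $extensions): array
{
    // An empty alternation would match almost anything, so bail out early.
    if ($extensions === []) {
        return [];
    }

    // Escape each extension so characters such as "." or "+" are matched literally.
    $quoted = array_map(fn (string $ext): string => preg_quote($ext, '/'), $extensions);

    // \S+ : one or more non-whitespace characters (the URL itself)
    // \.  : the dot before the extension
    // \b  : the extension must end at a word boundary, so "file.txtold" is skipped
    $pattern = '/\S+\.(?:' . implode('|', $quoted) . ')\b/i';

    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$text = 'a http://example.com/one.txt, b http://example.com/two.TXT. '
      . 'c http://example.com/three.doc d http://example.com/four.txtold';
print_r(extractUrlsByExtension($text, ['txt']));
print_r(extractUrlsByExtension($text, ['txt', 'doc']));
```

Passing ['txt'] gives the single-extension behaviour described above. The trailing comma and period after the first two URLs are left out of the matches because \b only requires the extension to end there.
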
$matches will contain several arrays; the one you want is under key 0 or 1. To make the array easier to read, you can give the group a name:
preg_match_all("/(?P<matched_urls>[^ ]+\.(txt|doc))/i", $text, $matches);

That way it is obvious which key you need: $matches['matched_urls'].
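
A short check of the named-group version on hypothetical sample text: PHP stores a named group under both its name and its number, so $matches['matched_urls'] and $matches[1] contain the same list.

```php
<?php
// Hypothetical sample text with one .txt and one .doc URL.
$text = 'Get http://example.com/a.txt and http://example.com/b.doc today';

// Same pattern as above: the outer group is named "matched_urls".
preg_match_all("/(?P<matched_urls>[^ ]+\.(txt|doc))/i", $text, $matches);

// Named groups appear under their name *and* their number.
print_r($matches['matched_urls']);
```
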
What about:
$str = 'Lorem ipsum dolor sit amet. Donec eu nunc nec nibh http://www.somesite.com/somefolder/blahblah/etc/something.txt. Lorem ipsum dolor sit amet. Donec eu nunc nec nibh http://www.somesite.com/somefolder/blahblah/etc/something.doc.';
preg_match_all('#\b(http://\S+\.txt)\b#', $str, $m);

Explanation:
#             : regex delimiter
\b            : word boundary
(             : begin capture group
http://       : literal http://
\S+           : one or more non-space characters
\.            : a dot
txt           : literal txt
)             : end capture group
\b            : word boundary
#             : regex delimiter
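
Note that this pattern only accepts the http:// scheme and is case-sensitive. A sketch comparing it with a hypothetical variant that also accepts https:// (the sample string is made up for illustration):

```php
<?php
// Hypothetical sample: one https:// URL and one http:// URL.
$str = 'Mirror: https://example.com/files/list.txt. Old copy: http://example.com/files/list.txt';

// The answer's pattern: only the literal "http://" scheme is recognised.
preg_match_all('#\b(http://\S+\.txt)\b#', $str, $m);
print_r($m[1]);

// Hypothetical variant: "s?" also accepts "https://", and the trailing "i"
// modifier makes the match case-insensitive (so ".TXT" would match too).
preg_match_all('#\b(https?://\S+\.txt)\b#i', $str, $m2);
print_r($m2[1]);
```

In both cases the final \b keeps the sentence-ending period out of the match.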

Could you post what you've tried so far? Are there any restrictions on the URLs? Can they contain query parameters or fragment identifiers?
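
If the URLs can carry query strings or fragments (for example notes.txt?v=2#top), a pattern that requires the text to end in .txt will miss them. One possible approach, sketched here under the assumption that only http/https URLs matter: grab every whitespace-free run that starts with a scheme, trim trailing sentence punctuation, and check the extension of the path returned by parse_url. The function name and the punctuation list are hypothetical choices; note that trimming ")" also strips a closing parenthesis that genuinely belongs to a URL.

```php
<?php
/**
 * Hypothetical helper: finds http(s) URLs in free text and keeps those whose
 * *path* ends in the given extension, so "notes.txt?v=2#top" still counts.
 */
function extractUrlsByPathExtension(string $text, string $extension): array
{
    // Candidate URLs: a scheme followed by any run of non-whitespace characters.
    preg_match_all('#\bhttps?://\S+#i', $text, $m);

    $result = [];
    foreach ($m[0] as $candidate) {
        // Drop sentence punctuation that usually trails a URL in prose.
        $url = rtrim($candidate, '.,;:!?)');
        $path = parse_url($url, PHP_URL_PATH);
        if (!is_string($path)) {
            continue; // malformed URL or no path component
        }
        // Compare the path's extension case-insensitively.
        if (strcasecmp(pathinfo($path, PATHINFO_EXTENSION), $extension) === 0) {
            $result[] = $url;
        }
    }
    return $result;
}

$text = 'Read http://example.com/notes.txt?v=2#top, then http://example.com/page.html '
      . 'and https://example.com/dir/README.TXT.';
print_r(extractUrlsByPathExtension($text, 'txt'));
```
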