PHP: Extracting a specific part of a URL from a string

php regex

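I only need to extract part of a URL with PHP, but I am struggling to define the point where the extraction should stop. I use a regular expression to extract the whole URL from a longer string, like this: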

$regex = '/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i';
preg_match_all($regex, $href, $matches);

The result is the following string:

http://www.cambridgeenglish.org/test-your-english/&amp;sa=U&amp;ei=a4rbU8agB-zY0QWS_IGYDw&amp;ved=0CFEQFjAL&amp;usg=AFQjCNGU4FMUPB2ZuVM45OoqQ39rJbfveg

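Now I only want to extract this part: `http://www.cambridgeenglish.org/test-your-english/`. Basically, I need to get rid of everything from `&amp;` onward.

Does anyone know how to do this? Do I need to run another regex, or can I add it to the initial regex?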

The regex below removes `&amp` and everything after it from the string. Your PHP code would be:

<?php
// Strip everything from the first "&amp" to the end of the string
echo preg_replace('~&amp.*$~', '', 'http://www.cambridgeenglish.org/test-your-english/&amp;sa=U&amp;ei=a4rbU8agB-zY0QWS_IGYDw&amp;ved=0CFEQFjAL&amp;usg=AFQjCNGU4FMUPB2ZuVM45OoqQ39rJbfveg');
//=> http://www.cambridgeenglish.org/test-your-english/
?>

Explanation:

`&amp` matches the literal string `&amp`
`.*` matches any character zero or more times
`$` matches the end of the line
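
The question also asks whether the cut-off can be folded into the initial regex. One possible sketch, not taken from the original answers, guards each character with a negative lookahead so the match stops right before the first literal `&amp;`. The surrounding text in `$href` is a made-up example; the character classes are the ones from the question.

```php
<?php
// Hypothetical input: a longer string that contains the HTML-escaped URL.
$href = 'Visit http://www.cambridgeenglish.org/test-your-english/&amp;sa=U&amp;ved=0CFEQFjAL for details';

// Same character classes as the question's regex, but a character is only
// consumed if it does not start the literal "&amp;" (negative lookahead).
$regex = '/\b(?:https?|ftp|file):\/\/'
       . '(?:(?!&amp;)[-A-Z0-9+&@#\/%?=~_|$!:,.;])*'
       . '(?:(?!&amp;)[A-Z0-9+&@#\/%=~_|$])/i';

preg_match_all($regex, $href, $matches);

// $matches[0] holds the full matches, one entry per URL found.
echo $matches[0][0], PHP_EOL;
// prints http://www.cambridgeenglish.org/test-your-english/
```

A plain `&` that is not followed by `amp;` is still accepted, so this only helps when the input is HTML-escaped as in the question. Running `preg_replace` as a second step over each match remains the simpler option.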

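$parsed = parse_url($url);
// parse_url() stores the host under the 'host' key; there is no 'hostname' key.
// This assumes '&amp' occurs in the path; otherwise strpos() returns false and the path is dropped.
$my_url = $parsed['scheme'] . '://' . $parsed['host'] . substr($parsed['path'], 0, strpos($parsed['path'], '&amp'));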
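
A slightly more defensive variant of the `parse_url` approach might look like the sketch below. The helper name `baseUrlBeforeAmp` and its return-`null`-on-failure behaviour are illustrative assumptions, not part of the original answer; the nullable return type assumes PHP 7.1 or later.

```php
<?php
/**
 * Hypothetical helper: rebuild scheme://host/path from a URL,
 * cutting the path at the first "&amp" if one is present.
 */
function baseUrlBeforeAmp(string $url): ?string
{
    $parsed = parse_url($url);
    // parse_url() returns false for seriously malformed URLs
    // and simply omits components that are missing.
    if ($parsed === false || !isset($parsed['scheme'], $parsed['host'])) {
        return null;
    }
    $path = $parsed['path'] ?? '';
    $cut = strpos($path, '&amp');
    if ($cut !== false) {
        // Keep only the part of the path before "&amp".
        $path = substr($path, 0, $cut);
    }
    return $parsed['scheme'] . '://' . $parsed['host'] . $path;
}

$url = 'http://www.cambridgeenglish.org/test-your-english/&amp;sa=U&amp;ei=a4rbU8agB-zY0QWS_IGYDw&amp;ved=0CFEQFjAL&amp;usg=AFQjCNGU4FMUPB2ZuVM45OoqQ39rJbfveg';
echo baseUrlBeforeAmp($url), PHP_EOL;
// prints http://www.cambridgeenglish.org/test-your-english/
```

Unlike the one-liner, this leaves URLs without `&amp` in the path unchanged instead of dropping the path.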
