Regex pattern doesn't remove special characters from a website's links


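So, I'm currently taking user input in the form of a URL, parsing the page, and then printing the other pages that the website links to. The package I'm using is:

LWP::Simple
I get the link from the command line as user input and store it in a variable; I use $ARGV[0] to obtain it. I then create another variable and call get on the variable that stores the website. Next, I create an array variable and apply this regex

/\shref="?([^\s>"]+)/gi;
to the variable that holds the result of calling get on the website string. Finally, I run a foreach loop over the array to print the results.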
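The steps described above can be sketched roughly as follows. This is only an illustration, not the asker's actual program: the $html string is a made-up stand-in for what get($ARGV[0]) from LWP::Simple would return, so the sketch runs without network access.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the page source; the question would obtain it
# with get($ARGV[0]) from LWP::Simple instead.
my $html = <<'HTML';
<a href="/blabalbla">Some page</a>
<a href="/">Home</a>
<a href="#">Top</a>
<a href=http://example.com/about>About</a>
HTML

# The pattern from the question: in list context with /g it returns
# every captured href value.
my @links = $html =~ /\shref="?([^\s>"]+)/gi;

# Print one captured value per line.
print "$_\n" for @links;
```

With this sample input, href values that consist of a single / or # are captured just like any other value, because the pattern only requires one or more characters that are not whitespace, > or a double quote.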

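However, although it prints the links and so on, it also prints standalone special characters that have nothing after them, such as
/
#

So if there is something like
/blabalbla
it gets printed. But if there are only standalone special characters such as
/
\
\
it prints those as well. Is there any way to modify the regex so that special characters are not printed unless they are followed by a string? I'm new to Perl and not very good with regex.

I can't help you with your specific problem without further information, but in the meantime I suggest you take a look at HTML::LinkExtor, which was written for this purpose.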

Here is some example code along with its output. It lists only <a> elements that have an href attribute.

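use strict;
use warnings;
use 5.010;    # enables say

use LWP;
use HTML::LinkExtor;

# Fetch the page
my $ua = LWP::UserAgent->new;
my $resp = $ua->get('http://www.bbc.co.uk/');

# No callback is passed (undef), so the links are collected for the
# links method; the base URL is used to make relative links absolute
my $extor = HTML::LinkExtor->new(undef, $resp->base);
$extor->parse($resp->decoded_content);

for my $link ($extor->links) {
  # Each entry is an array reference: [ tag, attribute => URL, ... ]
  my ($tag, %attr) = @$link;
  next unless $tag eq 'a' and $attr{href};
  say $attr{href};
}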
Output

http://m.bbc.co.uk
http://www.bbc.co.uk/
http://www.bbc.co.uk/#h4discoveryzone
http://www.bbc.co.uk/accessibility/
https://ssl.bbc.co.uk/id/status
http://www.bbc.co.uk/news/
http://www.bbc.com/news/
http://www.bbc.co.uk/sport/
http://www.bbc.co.uk/weather/
http://shop.bbc.com/
http://www.bbc.com/earth/
http://www.bbc.com/travel/
http://www.bbc.com/capital/
http://www.bbc.co.uk/iplayer/
http://www.bbc.com/culture/
http://www.bbc.com/autos/
http://www.bbc.com/future/
http://www.bbc.co.uk/tv/
http://www.bbc.co.uk/radio/
http://www.bbc.co.uk/cbbc/
http://www.bbc.co.uk/cbeebies/
http://www.bbc.co.uk/arts/
http://www.bbc.co.uk/ww1/
http://www.bbc.co.uk/food/
http://www.bbc.co.uk/history/
http://www.bbc.co.uk/learning/
http://www.bbc.co.uk/music/
http://www.bbc.co.uk/science/
http://www.bbc.co.uk/nature/
http://www.bbc.com/earth/
http://www.bbc.co.uk/local/
http://www.bbc.co.uk/travel/
http://www.bbc.co.uk/a-z/
http://www.bbc.co.uk/#orb-footer
http://search.bbc.co.uk/search
http://www.bbc.co.uk/privacy/cookies/managing/cookie-settings.html
http://www.bbc.co.uk/locator/default/desktop/en-GB?ptrt=%2F
http://www.bbc.co.uk/#
http://www.bbc.co.uk/#
http://www.bbc.co.uk/weather/2643743?day=0
http://www.bbc.co.uk/weather/2643743?day=0
http://www.bbc.co.uk/weather/2643743?day=1
http://www.bbc.co.uk/weather/2643743?day=1
http://www.bbc.co.uk/weather/2643743?day=2
http://www.bbc.co.uk/weather/2643743?day=2
http://www.bbc.co.uk/locator/default/desktop/en-GB?ptrt=%2F
http://www.bbc.co.uk/weather/2643743
http://www.bbc.co.uk/news/science-environment-30311816
http://www.bbc.co.uk/news/science-environment-30311822
http://www.bbc.co.uk/news/science-environment-30311818
http://www.bbc.co.uk/news/magazine-30282261
http://www.bbc.co.uk/news/science-environment-30311816
http://www.bbc.co.uk/news/uk-politics-30291460
http://www.bbc.co.uk/news/
http://www.bbc.co.uk/news/uk-england-kent-30319549
http://www.bbc.co.uk/news/world-europe-30306106
http://www.bbc.co.uk/news/world-europe-30306992
http://www.bbc.co.uk/news/uk-30306145
http://www.bbc.co.uk/news/local/
http://www.bbc.co.uk/news/england/london/
http://www.bbc.co.uk/news/uk-england-london-30308694
http://www.bbc.co.uk/news/uk-england-london-30315650
http://www.bbc.co.uk/news/uk-england-london-30321504
http://www.bbc.co.uk/sport/live/football/29959148
http://www.bbc.co.uk/sport/0/
http://www.bbc.co.uk/sport/live/snooker/29618359
http://www.bbc.co.uk/sport/football/30204433
http://www.bbc.co.uk/sport/cricket/30308980
http://www.bbc.co.uk/sport/football/30204434
http://www.bbc.co.uk/sport/0/football/
http://www.bbc.co.uk/sport/football/30204459
http://www.bbc.co.uk/sport/football/30204511
http://www.bbc.co.uk/sport/football/28647040
http://www.bbc.co.uk/?dzf=sport
http://www.bbc.co.uk/?dzf=entertainment
http://www.bbc.co.uk/?dzf=bbcnow
http://www.bbc.co.uk/?dzf=entertainment
http://www.bbc.co.uk/?dzf=news
http://www.bbc.co.uk/?dzf=lifestyle
http://www.bbc.co.uk/?dzf=knowledge
http://www.bbc.co.uk/?dzf=sport
http://www.bbc.co.uk/news/
http://www.bbc.com/news/
http://www.bbc.co.uk/sport/
http://www.bbc.co.uk/weather/
http://shop.bbc.com/
http://www.bbc.com/earth/
http://www.bbc.com/travel/
http://www.bbc.com/capital/
http://www.bbc.co.uk/iplayer/
http://www.bbc.com/culture/
http://www.bbc.com/autos/
http://www.bbc.com/future/
http://www.bbc.co.uk/tv/
http://www.bbc.co.uk/radio/
http://www.bbc.co.uk/cbbc/
http://www.bbc.co.uk/cbeebies/
http://www.bbc.co.uk/arts/
http://www.bbc.co.uk/ww1/
http://www.bbc.co.uk/food/
http://www.bbc.co.uk/history/
http://www.bbc.co.uk/learning/
http://www.bbc.co.uk/music/
http://www.bbc.co.uk/science/
http://www.bbc.co.uk/nature/
http://www.bbc.com/earth/
http://www.bbc.co.uk/local/
http://www.bbc.co.uk/travel/
http://www.bbc.co.uk/a-z/
http://www.bbc.co.uk/
http://www.bbc.co.uk/terms/
http://www.bbc.co.uk/aboutthebbc/
http://www.bbc.co.uk/privacy/
http://www.bbc.co.uk/privacy/cookies/about
http://www.bbc.co.uk/accessibility/
http://www.bbc.co.uk/guidance/
http://www.bbc.co.uk/contact/
http://www.bbc.co.uk/bbctrust/
http://www.bbc.co.uk/complaints/
http://www.bbc.co.uk/help/web/links/
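
The listing above contains repeated links as well as links that end in a bare #, such as http://www.bbc.co.uk/#. If those are unwanted, one possible post-processing step is sketched below. It assumes that an empty fragment counts as noise and that each link should appear only once; neither assumption comes from the answer itself, and the @links array here is just a handful of entries copied from the output.

```perl
use strict;
use warnings;

# A few links copied from the output above (not the full list).
my @links = (
    'http://www.bbc.co.uk/news/',
    'http://www.bbc.co.uk/#',
    'http://www.bbc.co.uk/#',
    'http://www.bbc.co.uk/#orb-footer',
    'http://www.bbc.co.uk/news/',
    'http://www.bbc.co.uk/weather/2643743?day=0',
);

# Drop links that end in an empty fragment ("#" as the last character)
# and keep only the first occurrence of every remaining link.
my %seen;
my @wanted = grep { $_ !~ /#\z/ && !$seen{$_}++ } @links;

print "$_\n" for @wanted;
```

In the loop from the answer, the same test could be applied to $attr{href} before calling say.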

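Unless you show your code, a real example of a URL and the corresponding output, I can't help. Your regex certainly doesn't match isolated characters like that, and I think it's more likely that you are misusing the regex.

What does "followed by a string" mean???

@Borodin - Here \ There were more links in the output, but I removed them to fit the comment. That was using google.com. See the end.

@ikegami - So if there is a link like that, it should be printed, but if it is just, say, a forward slash or backslash character, it shouldn't print those characters. So the special character has to be followed by a string in order to be printed. That's what I want my regex to do.

Thank you, but that isn't very readable. Please edit your question to include your program and the URL you entered.

Thanks, I know I can always count on you to solve my problems. I've also added more details in the comments above, sorry I left them out at first :)

How would I use yours to take user input? As in, the user has to enter the URL themselves.

@user2128074: In the usual way: use chomp(my $URL = <STDIN>) to get the URL from the terminal, then use it in my $resp = $ua->get($URL). Are you not interested in getting your original program working? I'm sure that understanding Perl regexes would be useful to you, and I'm fairly certain the problem is in your Perl code rather than in the regex, if you would be willing to show it.

This is a class assignment and I don't want to compromise my code, because the teachers also look at Stack Overflow. Is there a more private platform where we can talk without it being public? I would have no hesitation in showing you the code :)

@user2128074 I'm sure your teacher won't mind you getting help with a specific difficulty in your assignment. What he or she would mind is if you copied a solution verbatim from somewhere on the web and submitted it as your own work. You can avoid that by posting your original code, stating the specific problem you are having, and saying in the question that it is an assignment, so you don't need a complete solution, just some help in the right direction.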
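
As an illustration of the requirement spelled out in the comments (print a link only when the special character comes with some text), here is a minimal sketch of one way the question's pattern could be tightened. It assumes that "a string" means at least one word character somewhere in the href value. The extract_links helper and the sample HTML are hypothetical and are not taken from the asker's program, which was never posted.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original question or answer.
sub extract_links {
    my ($html) = @_;

    # Same shape as the question's pattern, but the captured value must
    # contain at least one word character (\w), so a lone "/", "#" or "\"
    # never matches.
    return $html =~ /\shref="?([^\s>"]*\w[^\s>"]*)/gi;
}

# Made-up sample input with three "bare" href values and one real path.
my $html = '<a href="/">Home</a> <a href="#">Top</a> '
         . '<a href="\\">Odd</a> <a href="/blabalbla">Page</a>';

my @links = extract_links($html);
print "$_\n" for @links;
```

This only changes which href values are captured. As noted in the comments, the real cause may lie elsewhere in the program, and a parser such as HTML::LinkExtor remains the more robust way to pull links out of HTML.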