Help with regex/Ruby


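Hey guys, I'm writing a script to record the words/results from this site (). I already have the HTTP POST request working, and so on.

The only thing I still need is to extract the words. The HTML source I'm working with looks like this: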

<html>
<head>
<title>Text Twist Unscrambler</title>
<META NAME="keywords" CONTENT="Text,Twist,Text Twist,Unscramble,Free,Source,php">
</head>
<body>

<font face="arial,helvetica" size="3">
<p>
<b>3 letter words</b><br>sae &nbsp; sac &nbsp; ess &nbsp; aas &nbsp; ass &nbsp; sea &nbsp; ace &nbsp; sec &nbsp; <p>

<b>4 letter words</b><br>cess &nbsp; secs &nbsp; seas &nbsp; ceca &nbsp; sacs &nbsp; case &nbsp; asea &nbsp; casa &nbsp; aces &nbsp; caca &nbsp; <p>

<b>5 letter words</b><br>cacas &nbsp; casas &nbsp; caeca &nbsp; cases &nbsp; <p>
<b>6 letter words</b><br>access &nbsp; <br><br>
Found 23 words in 0.22962 seconds


<form action="texttwist.php" method="post">

enter scrambled letters and I'll return all word combinations<br>
<input type="text" name="l" value="asceacas" size="20" maxlength="20">

<input type="submit" name="button" value="unscramble">
<input type="button" name="clear" value="clear" onClick="this.form.l.value='';">
</form><p>

<a href=texttwist.phps>php source</a>
- it's kinda ugly, but it's fast<p>

<a href=/>back to my page</a>

</body>

</html>

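I'm trying to pull out the words, such as "sae", "sac", "secs", "seas", "casas", etc.

Any help?

This is the furthest I've gotten, and I don't know where to go from here.


Any suggestions? Help?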

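Use an HTML parser, such as .

If you want any kind of robustness, you really need a parser. As Adrian mentioned, is the most popular solution.

If you insist on doing without one, be aware that the following approach is likely to get you into trouble as the page becomes more complex:

Search for the lines that match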

/^<b>\d+ letter words/
Then you can dig out the pieces like this:

a = line.split(/<br>/)[1] # the second half
a.gsub!('<p>', '') # take out the trailing <p>
res = a.split(' &nbsp; ')# this is your data
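
Putting the two steps together (find the matching lines, then split out the words), a minimal sketch might look like the following. It assumes the response body is already in a string. `extract_words` and the `HTML` constant are hypothetical names made up for illustration, and the sample is a trimmed copy of the page from the question. Unlike the three-line snippet, it also strips whitespace and drops empty pieces, so the trailing `&nbsp;` does not leave a stray entry.

```ruby
# Trimmed sample of the response body, copied from the question's HTML.
HTML = <<~PAGE
  <p>
  <b>3 letter words</b><br>sae &nbsp; sac &nbsp; ess &nbsp; <p>

  <b>4 letter words</b><br>cess &nbsp; secs &nbsp; <p>
  <b>6 letter words</b><br>access &nbsp; <br><br>
  Found 23 words in 0.22962 seconds
PAGE

# extract_words is a hypothetical helper, not part of any library.
def extract_words(html)
  html.each_line.flat_map do |line|
    # Only the result lines start with "<b>N letter words".
    next [] unless line =~ /^<b>\d+ letter words/

    rest = line.split(/<br>/)[1].to_s  # text after the first <br>
    rest = rest.gsub('<p>', '')        # take out the trailing <p>
    # Split on the entity itself, then trim spaces and drop blanks.
    rest.split('&nbsp;').map(&:strip).reject(&:empty?)
  end
end

words = extract_words(HTML)
p words  # => ["sae", "sac", "ess", "cess", "secs", "access"]
```

This still depends on each word list sitting on a single line that starts with `<b>`, so it will break if the page layout changes.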

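That said, this is not the kind of code you want in production. You'd be surprised how much learning a parser changes the way you look at this problem.

You need to take a look at this question: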