Web scraping Hacker News: how to extract the comment hierarchy

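I am trying to parse a comment thread on the forum news.ycombinator.com. After looking at the HTML, however, the markup does not seem to nest comments in a way that reflects their hierarchy, which makes parsing very difficult. For example, here is a parent comment and its child comment: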

<!-- This part below draws the upvote/downvote images -->
<table border=0><tr><td><table border=0><tr><td><img src="http://ycombinator.com/images/s.gif" height=1 width=0></td><td valign=top><center><a id=up_4241971 href="vote?for=4241971&dir=up&whence=%69%74%65%6d%3f%69%64%3d%34%32%34%31%37%38%34"><img src="http://ycombinator.com/images/grayarrow.gif" border=0 vspace=3 hspace=2></a><span id=down_4241971></span></center></td><td class="default"><div style="margin-top:2px; margin-bottom:-10px; ">


<!-- This part below is user/time and permalink info for a parent comment -->
<span class="comhead"><a href="user?id=JshWright">JshWright</a> 7 hours ago  | <a href="item?id=4241971">link</a></span></div><br>


<!-- This part below is actual Comment -->
<span class="comment"><font color=#000000>I just got my Verizon Galaxy S3, and ordered the 20-pack of NFC tags offered by <a href="http://tagsfordroid.com" rel="nofollow">http://tagsfordroid.com</a><p>I think I know what my Dad felt like when he got his first label printer... Within days it seemed like every object in his office was labeled...<p>I've got a tag in my car to automatically send my wife a "Headed home" SMS, a tag on my night stand to toggle between 'night' (silent) and 'day' (loud) volume settings, a tag by my back door to launch CardioTrainer when I go out for a run (this one may have crossed the "I've run out of ideas" line...). I'm using the keychain tag to dial a response number for the fire department I'm a member of.</font></span><p><font size=1><u><a href="reply?id=4241971&whence=%69%74%65%6d%3f%69%64%3d%34%32%34%31%37%38%34">reply</a></u></font></td></tr></table></td></tr>


<!-- This part below is upvote/downvote arrow for child of parent -->
<tr><td><table border=0><tr><td><img src="http://ycombinator.com/images/s.gif" height=1 width=40></td><td valign=top><center><a id=up_4242025 href="vote?for=4242025&dir=up&whence=%69%74%65%6d%3f%69%64%3d%34%32%34%31%37%38%34"><img src="http://ycombinator.com/images/grayarrow.gif" border=0 vspace=3 hspace=2></a><span id=down_4242025></span></center></td><td class="default"><div style="margin-top:2px; margin-bottom:-10px; ">

<!-- This part has user/time/permalink for child comment -->
<span class="comhead"><a href="user?id=msbmsb">msbmsb</a> 7 hours ago  | <a href="item?id=4242025">link</a></span></div><br>

<!-- This part is the content of the  child comment -->
<span class="comment"><font color=#000000>I did the same thing. Tag next to the entry-way light switch for changing to an "at-home" profile, tag next to the bed for switching between night mode and morning mode, tag at work, keychain tag for switching between car mode and quiet mode.<p>And profile switching is just the basics. You can have a tag that connects guests' NFC-enabled phones to your wifi without having to hand out the password, for instance.<p>NFC task launcher + tasker is an amazing combination that opens up all kinds of possibilities.</font></span><p><font size=1><u><a href="reply?id=4242025&whence=%69%74%65%6d%3f%69%64%3d%34%32%34%31%37%38%34">reply</a></u></font></td></tr></table></td></tr><tr><td>

So how does Hacker News store the comment hierarchy, and how can I reproduce it when scraping their data?

Within the table, the indentation is done with an image tag:

...<td><img src="http://ycombinator.com/images/s.gif" height=1 width=0></td>...
...<td><img src="http://ycombinator.com/images/s.gif" height=1 width=40></td>...

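Presumably you will read and parse these. In the sample above the parent comment has width=0 and its child has width=40, so a larger width means deeper nesting. By keeping an internal stack of the width values, you can reconstruct the actual thread being represented.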
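The stack idea can be sketched roughly as follows. This is only a minimal illustration, not a full scraper: it assumes every comment row contains one s.gif spacer image followed by an item?id=... permalink, as in the snippet from the question, and the names ROW_RE and build_tree are made up for this example. A real page may need a proper HTML parser instead of a regular expression.

```python
import re

# Assumption: in each comment row the spacer image (s.gif) comes first and its
# width encodes the nesting depth; the first "item?id=NNN" after it is that
# comment's permalink. This mirrors the HTML shown in the question.
ROW_RE = re.compile(
    r'/s\.gif"?\s+height=1\s+width=(\d+).*?item\?id=(\d+)',
    re.DOTALL,
)


def build_tree(html):
    """Return the top-level comments as nested {"id", "width", "children"} dicts."""
    roots = []
    stack = []  # comments on the path from a top-level comment to the current one
    for width, comment_id in ROW_RE.findall(html):
        node = {"id": comment_id, "width": int(width), "children": []}
        # Drop everything that is indented as much as, or more than, this
        # comment; whatever remains on top of the stack is its parent.
        while stack and stack[-1]["width"] >= node["width"]:
            stack.pop()
        if stack:
            stack[-1]["children"].append(node)
        else:
            roots.append(node)
        stack.append(node)
    return roots
```

Run against the two comments from the question, this would put comment 4242025 (width=40) under comment 4241971 (width=0). Comparing widths rather than dividing by a fixed step means the sketch does not depend on the indentation always being a multiple of 40.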

Wow! I missed that. Thanks.