Web Scrapy - How to loop through the title hyperlinks in a <form> or <table summary> tag

I have a question about how to loop through the HTML tags "form" or "table summary".

Site: https://mobile.uwants.com/forumdisplay.php?fid=631

I tried the code below, without luck:

start_urls = ['https://mobile.uwants.com/forumdisplay.php?fid=631']

def parse(self, response):
    # Select the thread-list <form> (XPath copied from the browser's dev tools)
    resp = response.xpath("//*[@id='mainbody']/tbody/tr/td/div/table[2]/tbody/tr/td[1]/div[2]/form")
    for r in resp:
        # Take the first href whose element id contains "thread_197"
        r = response.xpath('//*[contains(@id,"thread_197")]/a/@href').extract_first()
        yield response.follow(r, self.parse_items)

The first image shows my starting table; I want to loop through each thread listed in it.

The second image is a sample of the comments I want to scrape.

HTML code:

<form method="post" name="moderate" action="topicadmin.php?action=moderate&amp;fid=631">
    <input type="hidden" name="formhash" value="df27712a" />
    <table summary="forum_631"  cellspacing="0" cellpadding="0">
        <thead class="category">
            <tr>
                <td class="folder">&nbsp;</td>
                <td class="icon">&nbsp;</td>
                <th>標題</th>
                <td class="author">作者</td>
                <td class="nums">回覆/查看</td>
                <td class="lastpost">最後發表</td>
            </tr>
        </thead>

                    <tbody>
            <tr>
                <td class="folder"><img src="https://n2.hk/images/default/folder_common.gif" alt="announcement" /></td>
                <td class="icon">&nbsp;</td>
                <th class="tsubject">論壇公告: <a href="http://game.uwants.com/viewthread.php?tid=19414641" target="_blank">開戰準備!全新版區《Gundam Fan Club》開放!</a></th>
                <td class="author">
                    <cite><a href="space.php?action=viewpro&amp;uid=5779750">mhmimi</a></cite>
                    <em>2017-6-9</em>
                </td>
                <td class="nums">-</td>
                <td class="lastpost">-</td>
            </tr>
        </tbody>
                                <!-- Text T4 - Modified by Ivan - start-->
                    <tbody>
            <tr>
                <td colspan="6" height="35"><!-- Ad space:Uwants_Web_630_T4 --><script src="https://lv.l.networld.hk/lview?loc=_adb_20_10002834&callback=crystal2.addStaticSlot"></script>                  

                </td>
            </tr>
        </tbody>
                    <!-- Text T4 - Modified by Ivan - end-->


            <tbody id="stickthread_19434311"  class="forumdisplay_thread" data-tid="19434311">
        <tr>
            <td class="folder"><a href="viewthread.php?tid=19434311&amp;extra=page%3D1&amp;tr_h=18846759255b93707b8382d9_31521831" onclick="return ga_trackEvent(this,'divert-to-fid-631','click')" title="新窗口打開" target="_blank"><img src="https://n2.hk/images/default/folder_lock.gif" /></a></td>
            <td class="icon">
                                &nbsp;                              </td>
            <th class="lock">
                <label>
                                                            <img src="https://n2.hk/images/default/pin_2.gif" alt="分類置頂" />
                                                        <!-- By Rex Heat Thread -->
                                    <!-- By Rex Heat Thread -->
                &nbsp;</label>
                                                                                                                    <span id="thread_19434311" class="tsubject"><a href="viewthread.php?tid=19434311&amp;extra=page%3D1&amp;tr_h=18846759255b93707b8382d9_31521831" style="font-weight: bold;color: red" onclick="return ga_trackEvent(this,'divert-to-fid-631','click')"><!-- google_ad_section_start -->請各會員注意,本版新措施(已生效)<!-- google_ad_section_end --></a></span>
                                                                                                                </th>
            <td class="author">
                <cite>
                                        <a href="space.php?action=viewpro&amp;uid=2923242">Yue33695874  </a>
                                    </cite>
                <em></em>
            </td>
            <td class="nums">
                <strong>0</strong> / <em>41262</em>
                                </td>
            <td class="lastpost">
                <em><a href="redirect.php?tid=19434311&amp;goto=lastpost#lastpost"></a></em>
                <cite>by <a href="space.php?action=viewpro&amp;username=Yue33695874">Yue33695874 </a></cite>
            </td>
        </tr>
    </tbody>                <tbody id="stickthread_16031523"  class="forumdisplay_thread" data-tid="16031523">
        <tr>
            <td class="folder"><a href="viewthread.php?tid=16031523&amp;extra=page%3D1&amp;tr_h=18846759255b93707b8382d9_31521831" onclick="return ga_trackEvent(this,'divert-to-fid-631','click')" title="新窗口打開" target="_blank"><img src="https://n2.hk/images/default/folder_lock.gif" /></a></td>
            <td class="icon">
                                &nbsp;                              </td>
            <th class="lock">
                <label>
                                                            <img src="https://n2.hk/images/default/pin_2.gif" alt="分類置頂" />
                                                        <!-- By Rex Heat Thread -->
                                    <!-- By Rex Heat Thread -->
                &nbsp;</label>
                                                                                                                    <span id="thread_16031523" class="tsubject"><a href="viewthread.php?tid=16031523&amp;extra=page%3D1&amp;tr_h=18846759255b93707b8382d9_31521831" style="font-weight: bold;color: red" onclick="return ga_trackEvent(this,'divert-to-fid-631','click')"><!-- google_ad_section_start -->==手機網絡 版版規== 本版嚴禁一切問價及報價, 違者發帖將被移走及不作通知!<!-- google_ad_section_end --></a></span>
                                                                                                                </th>
            <td class="author">
                <cite>
                                        <a href="space.php?action=viewpro&amp;uid=979277">quimboy1  </a>
                                    </cite>
                <em></em>
            </td>
            <td class="nums">
                <strong>0</strong> / <em>61033</em>
                                </td>
            <td class="lastpost">
                <em><a href="redirect.php?tid=16031523&amp;goto=lastpost#lastpost"></a></em>
                <cite>by <a href="space.php?action=viewpro&amp;username=quimboy1">quimboy1 </a></cite>
            </td>
        </tr>
    </tbody>                <tbody id="stickthread_16776292"  class="forumdisplay_thread" data-tid="16776292">
        <tr>
            <td class="folder"><a href="viewthread.php?tid=16776292&amp;extra=page%3D1&amp;tr_h=18846759255b93707b8382d9_31521831" onclick="return ga_trackEvent(this,'divert-to-fid-631','click')" title="新窗口打開" target="_blank"><img src="https://n2.hk/images/default/folder_lock.gif" /></a></td>
            <td class="icon">
                                &nbsp;                              </td>
            <th class="lock">
                <label>
                                                            <img src="https://n2.hk/images/default/pin_1.gif" alt="本版置頂" />
                                                        <!-- By Rex Heat Thread -->
                                    <!-- By Rex Heat Thread -->
                &nbsp;</label>
                                                                                                                    <span id="thread_16776292" class="tsubject"><a href="viewthread.php?tid=16776292&amp;extra=page%3D1&amp;tr_h=18846759255b93707b8382d9_31521831" style="font-weight: bold;text-decoration: underline;color: purple" onclick="return ga_trackEvent(this,'divert-to-fid-631','click')"><!-- google_ad_section_start -->溫馨提示 : 小心網上流動手提電話公司sales, 已經有騙案個案及已轉交警方處理<!-- google_ad_section_end --></a></span>
                                                                                                                </th>
            <td class="author">
                <cite>
                                        <a href="space.php?action=viewpro&amp;uid=111995">chungsm  </a>
                                    </cite>
                <em></em>
            </td>
            <td class="nums">
                <strong>2</strong> / <em>65809</em>
                                </td>
            <td class="lastpost">
                <em><a href="redirect.php?tid=16776292&amp;goto=lastpost#lastpost"></a></em>
                <cite>by <a href="space.php?action=viewpro&amp;username=chungsm">chungsm </a></cite>
            </td>
        </tr>
    </tbody><!--
    </table>
    <table summary="forum_631" id="forum_631" cellspacing="0" cellpadding="0">
                            <!--td class="folder" colspan="2">&nbsp;</td-->
            <td class="folder" ><a href="viewthread.php?tid=19782731&amp;extra=page%3D1" onclick="return ga_trackEvent(this,'divert-to-fid-631','click')" title="新窗口打開" target="_blank"><img src="https://n2.hk/images/r09/hot_u.gif" /></a></td>
            <td class="icon">&nbsp;             </td>
                            <th class="" >
                <label>
                                                                            <!-- By Rex Heat Thread -->
                                    <!-- By Rex Heat Thread -->
                &nbsp;</label>
                                                        <span id="thread_ht_1_19782731" class="tsubject"><a href="viewthread.php?tid=19782731&amp;extra=page%3D1"  onclick="return ga_trackEvent(this,'divert-to-fid-631','click')">問 : 中國移動4.5G 網絡 地鐵接收如何</a></span>
                                                                                                                        <a href="redirect.php?tid=19782731&amp;goto=newpost#newpost"  class="new">New</a>



Thanks in advance for your help.

Try changing this line:

r = response.xpath('//*[contains(@id,"thread_197")]/a/@href').extract_first()
to this:

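r = r.xpath('.//*[contains(@id,"thread_197")]/a/@href').extract_first()
I think this is what you meant: search inside each `r` instead of the whole `response`. Note the leading `.`: an XPath that starts with `//` is absolute even when called on a sub-selector, so without it `r.xpath(...)` would still search the entire page.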

If you want all the hrefs contained in the table, you can do this:

response.css('.tsubject a::attr(href)').extract()