How do I iterate over XML child nodes in Scrapy using Python?

python scrapy instagram

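I want to scrape the comments, but I can't figure out how to iterate over the child nodes of the node that wraps each comment and extract the data points.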

Here is part of the HTML:

        <div class="comment">
            <div class="comment-user">
                <div class="comment-user-avatar">
                    <a href="https://www.picuki.com/profile/alexandera_300">
                        <img src="https://scontent-yyz1-1.cdninstagram.com/v/t51.2885-19/s150x150/98342975_2815537605343770_6875611169034338304_n.jpg?_nc_ht=scontent-yyz1-1.cdninstagram.com&amp;_nc_ohc=VjMtcOxXuaQAX_ZCqee&amp;oh=4cf78fecbadcb57a81672c6edecc15a2&amp;oe=5F02D580" alt="alexandera_300">
                    </a>
                </div>
                <div class="comment-user-nickname">
                    <a href="https://www.picuki.com/profile/alexandera_300">@alexandera_300</a>
                </div>
            </div>
            <div class="comment-text">
                #followforfollowback
            </div>
        </div>
        <div class="comment">
            <div class="comment-user">
                <div class="comment-user-avatar">
                    <a href="https://www.picuki.com/profile/coxlogan2008">
                        <img src="https://scontent-yyz1-1.cdninstagram.com/v/t51.2885-19/s150x150/101229634_275138197009045_1475918829270859776_n.jpg?_nc_ht=scontent-yyz1-1.cdninstagram.com&amp;_nc_ohc=e4gTZqQGpEAAX_7U-Q0&amp;oh=36b7f5d1a0d7069f2447f4a318edec7d&amp;oe=5F004A54" alt="coxlogan2008">
                    </a>
                </div>
                <div class="comment-user-nickname">
                    <a href="https://www.picuki.com/profile/coxlogan2008">@coxlogan2008</a>
                </div>
            </div>
            <div class="comment-text">
            </div>
        </div>

I think your problem comes from the XPath you use for the comments. By selecting only the text, you are not selecting the nodes, so the loop has no elements to query.
The following changes make it work for me:

# Likes and the comment count occur once per page, so read them outside the loop.
likes = response.xpath('.//span[@class="icon-thumbs-up-alt"]/text()').get()
num_of_comments = response.xpath('.//span[@id="commentsCount"]/text()').get()
# Select the comment elements themselves (no trailing /text()) so each can be queried.
comments = response.xpath('//div[@id="commantsPlace"]/*[@class="comment"]')
for comment in comments:
    # The leading "." keeps each query relative to the current comment node.
    comment_user_name = comment.xpath('.//*[@class="comment-user-nickname"]/a/text()').get()
    comment_text = comment.xpath('.//*[@class="comment-text"]/text()').get()
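
To try the same pattern without a running spider, here is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors. This is an assumption made for illustration: ElementTree supports only a subset of XPath and needs well-formed markup, so the sample below is a simplified copy of the comment block (avatar images omitted, and the second comment's text is a made-up placeholder). The helper name extract_comments is hypothetical. The point it shows is the one above: select the comment elements first, then run relative queries on each one.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the comment markup (assumption:
# avatar <img> tags dropped, second comment text is a placeholder).
SAMPLE = """
<div id="commantsPlace">
  <div class="comment">
    <div class="comment-user">
      <div class="comment-user-nickname">
        <a href="https://www.picuki.com/profile/alexandera_300">@alexandera_300</a>
      </div>
    </div>
    <div class="comment-text">
      #followforfollowback
    </div>
  </div>
  <div class="comment">
    <div class="comment-user">
      <div class="comment-user-nickname">
        <a href="https://www.picuki.com/profile/coxlogan2008">@coxlogan2008</a>
      </div>
    </div>
    <div class="comment-text">
      placeholder text
    </div>
  </div>
</div>
"""


def extract_comments(markup):
    """Return one dict per comment element (hypothetical helper)."""
    root = ET.fromstring(markup)
    rows = []
    # Select the comment *elements*, not their text, so that each one
    # can be queried again with a path relative to itself.
    for comment in root.findall("./div[@class='comment']"):
        user = comment.findtext(".//div[@class='comment-user-nickname']/a", default="")
        text = comment.findtext(".//div[@class='comment-text']", default="")
        rows.append({"user": user.strip(), "text": text.strip()})
    return rows


if __name__ == "__main__":
    for row in extract_comments(SAMPLE):
        print(row)
```

In a spider, the equivalent loop would typically build a dict like this for each comment and yield it; the whitespace stripping is worth keeping because the comment text in the page is surrounded by indentation.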