Crawling YouTube user information with Python (Python, Youtube, Gdata)

I'm trying to crawl YouTube to retrieve information about a group of users (about 200 people). I'm interested in the relationships between these users:

contacts
subscribers
subscriptions
which videos they commented on
etc.

I have managed to get the contact information with the following code:

import gdata.youtube
import gdata.youtube.service
from gdata.service import RequestError
# KEY (the developer key) and NAME_REGEX come from the author's own pub_author module
from pub_author import KEY, NAME_REGEX

def get_details(name):
    """Return the titles of the entries in a user's contacts feed."""
    yt_service = gdata.youtube.service.YouTubeService()
    yt_service.developer_key = KEY
    contact_feed = yt_service.GetYouTubeContactFeed(username=name)
    contacts = [e.title.text for e in contact_feed.entry]
    return contacts

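I can't seem to get the other information I need. The documentation says I can get an XML feed for this (for an arbitrary user). However, if I try to get another user's subscriptions, I get a 403 error with the following message: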

User must be logged in to access these subscriptions

If I use the GData API instead:

sub_feed = yt_service.GetYouTubeSubscriptionFeed(username=name)
sub = [e.title.text for e in sub_feed.entry]

then I get the same error.
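When looping over all ~200 users, one way to keep a single 403 from aborting the whole run is to catch the error per user and record who failed. This is a minimal, hypothetical sketch: `crawl`, `fake_fetch` and `FakeRequestError` are illustrative names, and the fetch function (for example `get_details` above) and the exception class (for example `gdata.service.RequestError`) are passed in, so the sketch itself runs without gdata. It uses Python 3 syntax; drop the type hints to use it alongside the Python 2 gdata code above.

```python
from typing import Callable, Dict, Iterable, List, Tuple, Type


def crawl(
    usernames: Iterable[str],
    fetch: Callable[[str], List[str]],
    error_type: Type[Exception],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Call fetch(name) for each user, keeping successes and failures apart.

    A user whose request raises error_type (for example a 403 because the
    feed requires login) is recorded in `failed` instead of stopping the crawl.
    """
    results: Dict[str, List[str]] = {}
    failed: List[str] = []
    for name in usernames:
        try:
            results[name] = fetch(name)
        except error_type:
            failed.append(name)
    return results, failed


# Offline demo with a stand-in exception and fetch function (both hypothetical).
class FakeRequestError(Exception):
    pass


def fake_fetch(name: str) -> List[str]:
    if name == "private_user":
        raise FakeRequestError("User must be logged in to access these subscriptions")
    return [name + "_contact"]


results, failed = crawl(["alice", "private_user"], fake_fetch, FakeRequestError)
print(results)  # {'alice': ['alice_contact']}
print(failed)   # ['private_user']
```

The list of failed users then tells you exactly whose data has to come from somewhere other than the API.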

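How can I get these subscriptions without logging in? It should be possible, since you can view this information on the YouTube website without logging in.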

Also, there doesn't seem to be a feed for a particular user's subscribers. Is this information available through the API?
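Without a subscribers feed, one possible workaround for a fixed group like these ~200 users is to invert the subscriptions you can collect: if A subscribes to B, then A is one of B's subscribers. This is a minimal sketch, assuming the subscriptions are already in a dict of username → list of channel names; `invert_subscriptions` is a hypothetical helper. It only recovers subscribers who are themselves in the crawled set and whose subscriptions were visible.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def invert_subscriptions(subscriptions: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each channel to the set of crawled users who subscribe to it."""
    subscribers: Dict[str, Set[str]] = defaultdict(set)
    for user, channels in subscriptions.items():
        for channel in channels:
            subscribers[channel].add(user)
    return dict(subscribers)


subs = {"alice": ["bob", "carol"], "bob": ["carol"], "carol": []}
result = invert_subscriptions(subs)
print({k: sorted(v) for k, v in result.items()})  # {'bob': ['alice'], 'carol': ['alice', 'bob']}
```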

EDIT

So it seems this can't be done through the API. I had to do it the quick and dirty way:

mkdir -p subscriptions
for f in $(cat users.txt); do wget "www.youtube.com/profile?user=$f&view=subscriptions" --output-document "subscriptions/$f.html"; done
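For readers who prefer to stay in Python, the wget loop can be sketched in Python 3 as below. It assumes the same `profile?user=...&view=subscriptions` URL pattern and the same `users.txt` / `subscriptions/` layout; `profile_url` and `download_all` are illustrative names, and that legacy profile URL may no longer serve this page. `urlencode` also escapes characters in a username that are not URL-safe.

```python
import os
from urllib.parse import urlencode
from urllib.request import urlretrieve


def profile_url(username: str) -> str:
    """Build the subscriptions-page URL used by the wget loop."""
    query = urlencode({"user": username, "view": "subscriptions"})
    return "http://www.youtube.com/profile?" + query


def download_all(user_file: str = "users.txt", out_dir: str = "subscriptions") -> None:
    """Save each user's subscriptions page to out_dir/<user>.html."""
    os.makedirs(out_dir, exist_ok=True)
    with open(user_file) as f:
        for line in f:
            name = line.strip()
            if name:  # skip blank lines
                urlretrieve(profile_url(name), os.path.join(out_dir, name + ".html"))


print(profile_url("foo"))  # http://www.youtube.com/profile?user=foo&view=subscriptions
```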

and then used this script to extract the usernames from the downloaded HTML files:

"""Extract usernames from a Youtube profile using regex"""
import re
import sys

# Each user is linked twice in the HTML file: once from an image
# thumbnail and once from a text link, so names are collected in a set.
USER_LINK = re.compile(r'<a href="/user/(?P<name>[^"]+)" onmousedown')

def main():
    users = set()
    with open(sys.argv[1]) as html:
        for line in html:
            match = USER_LINK.search(line)
            if match:
                users.add(match.group('name'))
    print(sorted(users))

if __name__ == '__main__':
    main()

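To access a user's subscriptions feed without logging in, that user must have checked the "Subscribe to a channel" checkbox in their own account.
Currently, there is no direct way to get a channel's subscribers through the gdata
API. In fact, there is an outstanding feature request for it that has been open for more than three years! See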