How to decode [email protected] when web scraping with Python

python web-scraping

When I try to extract the email address from the markup below using Python's lxml.html, I get [email protected] instead of the real address. Can anyone help me decode it?

<a href="/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="4420366a373021283e2136042921202d27212a30262520212a6a272b29">[email&#160;protected]</a>

In the end I found the answer myself:

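    fp = '4420366a373021283e2136042921202d27212a30262520212a6a272b29'  # taken from the data-cfemail HTML attribute, which holds the obfuscated email

    def deCFEmail(fp):
        try:
            # The first two hex digits are the XOR key
            r = int(fp[:2], 16)
            # XOR each following byte with the key to recover one character
            email = ''.join([chr(int(fp[i:i+2], 16) ^ r) for i in range(2, len(fp), 2)])
            return email
        except ValueError:
            # Input was not valid hex; the function returns None
            pass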

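With the code above, we can decode Cloudflare's data-cfemail value back to plain text. The value is a hex string, not base58: the first byte is an XOR key, and every following byte is one character of the address XORed with that key.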
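To make the scheme easier to check, here is a minimal round-trip sketch. The helper `encode_cfemail` is hypothetical (it is not part of Cloudflare or of the answer above) and exists only to produce test input; `decode_cfemail` is a bytes-based variant of the decoder that assumes the payload is UTF-8 text.

```python
def encode_cfemail(email, key=0x44):
    """Hypothetical encoder for testing: key byte first, then each byte XORed with it."""
    data = email.encode("utf-8")
    return format(key, "02x") + "".join(format(b ^ key, "02x") for b in data)


def decode_cfemail(encoded):
    """Decode a data-cfemail style hex string (assumes a UTF-8 payload)."""
    raw = bytes.fromhex(encoded)        # raises ValueError on non-hex input
    key, payload = raw[0], raw[1:]      # the first byte is the XOR key
    return bytes(b ^ key for b in payload).decode("utf-8")


# "user@example.com" is a placeholder address, not taken from the question.
sample = encode_cfemail("user@example.com")
print(sample[:2])               # 44 (the key byte in hex)
print(decode_cfemail(sample))   # user@example.com
```

Because XOR is its own inverse, applying the same key twice returns the original bytes, which is why decoding needs nothing beyond the attribute value itself.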

For example:

    s = '4420366a373021283e2136042921202d27212a30262520212a6a272b29'

    print(deCFEmail(s))
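In a scraper, the encoded value first has to be pulled out of the page. The sketch below shows one way to do that. It uses the standard library's `html.parser` rather than lxml so that it runs without extra dependencies, and the names `CFEmailCollector` and `decode_cfemail` are illustrative. The sample markup is made up: its attribute value encodes the placeholder address a@b.co with key 0x10.

```python
from html.parser import HTMLParser


class CFEmailCollector(HTMLParser):
    """Collects every data-cfemail attribute value found in the markup."""

    def __init__(self):
        super().__init__()
        self.encoded = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names
        for name, value in attrs:
            if name == "data-cfemail" and value:
                self.encoded.append(value)


def decode_cfemail(encoded):
    # Same XOR scheme as the answer: first byte is the key
    key = int(encoded[:2], 16)
    return "".join(
        chr(int(encoded[i:i + 2], 16) ^ key) for i in range(2, len(encoded), 2)
    )


# Hypothetical sample markup, modeled on the tag in the question.
markup = (
    '<a href="/cdn-cgi/l/email-protection" class="__cf_email__" '
    'data-cfemail="107150723e737f">[email&#160;protected]</a>'
)

collector = CFEmailCollector()
collector.feed(markup)
emails = [decode_cfemail(value) for value in collector.encoded]
print(emails)  # ['a@b.co']
```

With lxml, the same attribute values could be read with an XPath expression such as `//@data-cfemail` on the parsed document and passed to the decoder in the same way.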

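I think this is Cloudflare's anti-scraping email protection at work. There should be no easy way to do what you want.

@rodrigo, thanks for your comment. I found the answer; please see my answer.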