Twitter web scraping returns an empty list in Python
I'm trying to scrape some results from Twitter, and it throws the error below:
import requests
from bs4 import BeautifulSoup

url = u'https://twitter.com/search?q='
query = u'cruise&src=typed_query'
r = requests.get(url + query)
soup = BeautifulSoup(r.text, 'html.parser')
tweets = []
# Collect the text of every <span> carrying these class names
for item in soup.find_all('span', attrs={"class": "css-901oao css-16my406 r-poiln3 r-bcqeeo r-qvutc0"}):
    tweets.append(item.get_text(strip=True, separator=" "))
# Save the raw HTML response
with open('search.csv', 'w') as f:
    f.write(r.text)
When I try to print(tweets), it gives me an empty list, and f.write(r.text) gives me the following error:
'charmap' codec can't encode character '\U0001f602' in position 17391: character maps to <undefined>
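This is the most common problem with modern pages: Twitter uses JavaScript to add elements to the HTML, but requests/BeautifulSoup cannot run JavaScript. You may need to control a real web browser, which can run JavaScript. Alternatively, use the Twitter API to get the data without scraping at all.

As for the error, check print('\U0001f602'): it gives you the emoji 😂. In the end you should take r.content (the bytes before decoding) instead of r.text (the string after decoding) and save it in bytes mode: open(..., 'wb').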
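A minimal sketch of both fixes, assuming the goal is only to save the downloaded page. A literal string stands in for the response so the sketch runs offline; with requests you would pass `r.content` to `save_raw` or `r.text` to `save_text`. The helper names are illustrative, not from the thread. Without an `encoding` argument, `open()` in text mode uses the locale's preferred encoding, which on many Windows systems is a code page such as cp1252; that codec has no mapping for U+1F602 (😂), which is what the 'charmap' error reports.

```python
import tempfile
from pathlib import Path

# U+1F602 "face with tears of joy" (😂), the character from the error message.
SAMPLE = "cruise \U0001f602"


def save_raw(content: bytes, path: Path) -> None:
    # Bytes mode: the bytes (e.g. r.content) are written as received,
    # so no text codec is involved.
    with open(path, "wb") as f:
        f.write(content)


def save_text(text: str, path: Path) -> None:
    # Text mode with an explicit encoding instead of the platform default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def reproduce_charmap_error(text: str) -> str:
    # cp1252 stands in for a typical Windows default; it cannot encode the emoji.
    try:
        text.encode("cp1252")
    except UnicodeEncodeError as exc:
        return str(exc)
    return ""


if __name__ == "__main__":
    out_dir = Path(tempfile.mkdtemp())
    save_raw(SAMPLE.encode("utf-8"), out_dir / "search_raw.html")
    save_text(SAMPLE, out_dir / "search_text.html")
    print(reproduce_charmap_error(SAMPLE))
```

On any OS, both `save_*` calls succeed, while the last line prints the same kind of 'charmap' message as in the question.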
Thanks. I tried it with Selenium and it works fine.
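A minimal sketch of the Selenium route, under several assumptions: Selenium 4 with Chrome installed, a search page that loads without logging in (it may not), and tweet text living in elements matching the CSS selector `div[data-testid="tweetText"]`. That selector is a guess about Twitter's markup, not something confirmed in this thread, and it can change at any time, just like generated `css-...` class names. The browser runs the page's JavaScript, and `driver.page_source` then returns the rendered DOM, which BeautifulSoup parses as before. The function names are hypothetical.

```python
from urllib.parse import urlencode

from bs4 import BeautifulSoup

# ASSUMPTION: selector for tweet text; verify it in the browser's dev tools.
TWEET_SELECTOR = 'div[data-testid="tweetText"]'


def search_url(term: str) -> str:
    # Builds e.g. https://twitter.com/search?q=cruise&src=typed_query
    return "https://twitter.com/search?" + urlencode({"q": term, "src": "typed_query"})


def fetch_rendered_html(url: str, timeout: float = 15) -> str:
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait until the page's JavaScript has inserted at least one tweet.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, TWEET_SELECTOR))
        )
        return driver.page_source
    finally:
        driver.quit()


def extract_tweets(html: str) -> list[str]:
    # Same BeautifulSoup parsing as in the question, but on the rendered HTML.
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(" ", strip=True) for el in soup.select(TWEET_SELECTOR)]
```

Usage: `extract_tweets(fetch_rendered_html(search_url("cruise")))` returns the tweet texts; write them with `open(..., "w", encoding="utf-8")` to avoid the charmap error. Keeping the Selenium import inside `fetch_rendered_html` lets the parsing step be tested on saved HTML without a browser.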