Web scraping: why is b'' included in the Excel file after web scraping?

I'm learning web scraping and managed to scrape data from a website into an Excel file. However, as you can see, the file also contains b'' rather than just the plain strings (YouTube channel name, uploads, views). Any idea where this comes from?
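(For context, a minimal sketch reproducing the symptom, assuming each field was encoded with .encode('utf-8') before writing, as in the tutorial pattern mentioned in the comments below. csv.writer converts non-string values with str(), so a bytes object is written with its repr, b'' prefix included; io.StringIO stands in for the output file here:)

import csv
import io

buf = io.StringIO()  # stands in for the CSV file
writer = csv.writer(buf)
writer.writerow(['T-Series'.encode('utf-8')])  # bytes, not str
print(buf.getvalue())  # b'T-Series'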


This is caused by the way you're encoding. You're better off defining the encoding once, when you open the file:

file = open('topyoutubers.csv', 'w',  encoding='utf-8')
New code

from bs4 import BeautifulSoup
import csv
import requests


headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'} # Need to use this otherwise it returns error 403. 
url = requests.get('https://socialblade.com/youtube/top/50/mostviewed', headers=headers)
#print(url)

soup = BeautifulSoup(url.text, 'lxml')
rows = soup.find('div', attrs={'style': 'float: right; width: 900px;'}).find_all('div', recursive=False)[4:]  # If the site's inspector shows a class instead of an inline style, pass class_='...' instead of attrs. The first 4 rows aren't data, hence [4:]

file = open('/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/My_Projects/Web_scraping/topyoutubers.csv', 'w',  encoding='utf-8')
writer = csv.writer(file)

# write the header row

writer.writerow(['Username', 'Uploads', 'Views'])


for row in rows:
    username = row.find('a').text.strip()
    numbers = row.find_all('span', attrs = {'style': 'color:#555;'})
    uploads = numbers[0].text.strip()
    views = numbers[1].text.strip()

    print(username + ' ' + uploads + ' ' + views)
    writer.writerow([username, uploads, views])

file.close()
Output

    Username                    Uploads     Views
1   T-Series                    15,029      143,032,749,708
2   Cocomelon - Nursery Rhymes  605         93,057,513,422
3   SET India                   48,505      78,282,384,002
4   Zee TV                      97,302      59,037,594,757

Thank you so much! It worked. Do you know why this happens? I was following a YouTube tutorial, and it worked for them even though they used writer.writerow([username.encode('utf-8'), uploads.encode('utf-8'), views.encode('utf-8')]).

I don't know the tutorial, so many things could be the cause... But in any case, it's probably clearer if you define the encoding when opening the file and don't encode each variable separately while the file may have a different encoding.

Glad to help, and welcome to Stack Overflow. If this or any other answer solved your problem, please mark it as accepted. Thanks!
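For reference, a minimal sketch of that pattern as a standalone example (the filename comes from the answer above, the sample row from the output table, and newline='' is what the csv docs recommend to avoid blank rows on Windows):

import csv

# define the encoding once at open(); write plain str values
with open('topyoutubers.csv', 'w', encoding='utf-8', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Username', 'Uploads', 'Views'])
    writer.writerow(['T-Series', '15,029', '143,032,749,708'])  # no per-field .encode() needed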