
Python: Can't figure out how to output my data correctly

Tags: python, python-3.x

I'm new to Python, but I still managed to build a scraper for Instagram. Now I'd like to take it one step further and output the 5 most commonly used hashtags from an IG profile into my CSV output file.

Current output:

I've managed to isolate the 5 most commonly used hashtags, but I get this result in my csv:

[('#striveforgreatness', 3), ('#jamesgang', 3), ('#thekidfromakron', 2), ('#togetherwecanchangetheworld', 1), ('#halloweenchronicles', 1)]

Desired output:

I'd like to end up with 5 columns at the end of my .CSV, outputting the Xth most commonly used value.

So something along these lines:

I've been googling for a while now and have managed to isolate them, but I always end up with '('#thekidfromakron', 2)' as an output. I seem to be missing a part of the puzzle :(

Here is what I'm currently working with:

import csv
import requests
from bs4 import BeautifulSoup
import json
import re
import time
from collections import Counter
ts = time.gmtime()


def get_csv_header(top_numb):
        fieldnames = ['USER','MEDIA COUNT','FOLLOWERCOUNT','TOTAL LIKES','TOTAL COMMENTS','ER','ER IN %', 'BIO', 'ALL CAPTION TEXT','HASHTAGS COUNTED','MOST COMMON HASHTAGS']
        return fieldnames


def write_csv_header(filename, headers):
        with open(filename, 'w', newline='') as f_out:
            writer = csv.DictWriter(f_out, fieldnames=headers)
            writer.writeheader()
        return

def read_user_name(t_file):
        with open(t_file) as f:
            user_list = f.read().splitlines()
        return user_list
if __name__ == '__main__':

    # HERE YOU CAN SPECIFY YOUR USERLIST FILE NAME,
    # Which contains a list of usernames's BY DEFAULT <current working directory>/userlist.txt
    USER_FILE = 'userlist.txt'

    # HERE YOU CAN SPECIFY YOUR DATA FILE NAME, BY DEFAULT (data.csv)', Where your final result stays
    DATA_FILE = 'users_with_er.csv'
    MAX_POST = 12  # MAX POST

    print('Starting the engagement calculations... Please wait until it finishes!')


    users = read_user_name(USER_FILE)
    """ Writing data to csv file """
    csv_headers = get_csv_header(MAX_POST)
    write_csv_header(DATA_FILE, csv_headers)

    for user  in users:

        post_info = {'USER': user}
        url = 'https://www.instagram.com/' + user + '/'

        #for troubleshooting, un-comment the next two lines:
        #print(user)
        #print(url)

        try: 
            r = requests.get(url)
            if r.status_code != 200: 
                print(timestamp,' user {0} not found or page unavailable! Skipping...'.format(user))
                continue
            soup = BeautifulSoup(r.content, "html.parser")
            scripts = soup.find_all('script', type="text/javascript", text=re.compile('window._sharedData'))
            stringified_json = scripts[0].get_text().replace('window._sharedData = ', '')[:-1]

            j = json.loads(stringified_json)['entry_data']['ProfilePage'][0]
            timestamp = time.strftime("%d-%m-%Y %H:%M:%S", ts)
        except ValueError:
            print(timestamp,'ValueError for username {0}...Skipping...'.format(user))
            continue
        except IndexError as error:
        # Output expected IndexErrors.
            print(timestamp, error)
            continue
        if j['graphql']['user']['edge_followed_by']['count'] <=0:
            print(timestamp,'user {0} has no followers! Skipping...'.format(user))
            continue
        if j['graphql']['user']['edge_owner_to_timeline_media']['count'] <12:
            print(timestamp,'user {0} has less than 12 posts! Skipping...'.format(user))
            continue
        if j['graphql']['user']['is_private'] is True:
            print(timestamp,'user {0} has a private profile! Skipping...'.format(user))
            continue
        media_count = j['graphql']['user']['edge_owner_to_timeline_media']['count']
        accountname = j['graphql']['user']['username']
        followercount = j['graphql']['user']['edge_followed_by']['count']
        bio = j['graphql']['user']['biography']
        i = 0
        total_likes = 0
        total_comments = 0
        all_captiontext = ''
        while i <= 11: 
                total_likes += j['graphql']['user']['edge_owner_to_timeline_media']['edges'][i]['node']['edge_liked_by']['count']
                total_comments += j['graphql']['user']['edge_owner_to_timeline_media']['edges'][i]['node']['edge_media_to_comment']['count']
                captions = j['graphql']['user']['edge_owner_to_timeline_media']['edges'][i]['node']['edge_media_to_caption']
                caption_detail = captions['edges'][0]['node']['text']
                all_captiontext += caption_detail
                i += 1
        engagement_rate_percentage = '{0:.4f}'.format((((total_likes + total_comments) / followercount)/12)*100) + '%'
        engagement_rate = (((total_likes + total_comments) / followercount)/12*100)

        #isolate and count hashtags
        hashtags = re.findall(r'#\w*', all_captiontext)
        hashtags_counted = Counter(hashtags)
        most_common = hashtags_counted.most_common(5)

        with open('users_with_er.csv', 'a', newline='',  encoding='utf-8') as data_out:

            print(timestamp,'Writing Data for user {0}...'.format(user))            
            post_info["USER"] = accountname
            post_info["FOLLOWERCOUNT"] = followercount
            post_info["MEDIA COUNT"] = media_count
            post_info["TOTAL LIKES"] = total_likes
            post_info["TOTAL COMMENTS"] = total_comments
            post_info["ER"] = engagement_rate
            post_info["ER IN %"] = engagement_rate_percentage
            post_info["BIO"] = bio
            post_info["ALL CAPTION TEXT"] = all_captiontext
            post_info["HASHTAGS COUNTED"] = hashtags_counted
            csv_writer = csv.DictWriter(data_out, fieldnames=csv_headers)
            csv_writer.writerow(post_info)

""" Done with the script """
print('ALL DONE !!!! ')
most_common is the output of the call hashtags_counted.most_common; I looked at the documentation for it.

If formatted, its output is:

[(key, value), (key, value), ...]

and it is sorted in decreasing order of the number of occurrences.
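For instance, counting hashtags in a made-up caption string (the caption text here is hypothetical) shows the shape of that output:

```python
from collections import Counter
import re

# Made-up caption text standing in for all_captiontext
text = "#jamesgang rules #striveforgreatness #jamesgang"
hashtags = re.findall(r'#\w*', text)
print(Counter(hashtags).most_common(5))
# [('#jamesgang', 2), ('#striveforgreatness', 1)]
```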

So, to get only the names and not the number of occurrences, you should replace:

post_info["MOST COMMON HASHTAGS"] = most_common

with something like:

post_info["MOST COMMON HASHTAGS"] = [tag for tag, count in most_common]

You have a list of tuples. This statement dynamically builds the list of the first element of each tuple, keeping the sort order.
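A quick sanity check of pulling out just the first element of each tuple (the input data is made up):

```python
# Hypothetical most_common output
most_common = [('#jamesgang', 3), ('#thekidfromakron', 2)]

# Keep only the tag names, preserving the sort order
names = [tag for tag, count in most_common]
print(names)  # ['#jamesgang', '#thekidfromakron']
```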

Replace the line

post_info["MOST COMMON HASHTAGS"] = most_common

with:

for i, counter_tuple in enumerate(most_common):
  tag_name = counter_tuple[0].replace('#','')
  label = "Top %d" % (i + 1)
  post_info[label] = tag_name
csv_headers = post_info.keys()

There is also a bit of code missing. For example, your code does not contain the

csv_headers

variable, which I guess should be

csv_headers = post_info.keys()

It also seems you open a file just to write a single line. I don't think that is intended, so what you want to do is collect the results into a list of dictionaries. An even cleaner solution would be to use pandas DataFrames, which you can write to CSV directly.

Hey! Thanks for the feedback. Your solution got me close, but I ran into some errors with the column creation. I have now included my full code, because showing only excerpts made it unclear how the parts work together. The code itself is a collection of other existing code I found online, so I'm pretty sure there are unnecessary or inefficient parts in it :)

Yes, there is a lot of confusion in the code, so once you refactor it properly I'm sure you will fix those errors. I'd suggest reading some software design guidelines, or finding a mentor to review your coding practices. :)

With your suggested code and some rework to make it fit my mess, I was able to solve it! Thanks :D
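As a minimal sketch of the collect-then-write approach suggested in the answer (the usernames and column values are invented; the same idea works with a pandas DataFrame and to_csv):

```python
import csv

# Hypothetical per-user dicts; in the script above, each post_info
# would be appended to rows instead of being written immediately
rows = [
    {'USER': 'user_a', 'Top 1': 'jamesgang', 'Top 2': 'striveforgreatness'},
    {'USER': 'user_b', 'Top 1': 'halloweenchronicles', 'Top 2': ''},
]

# Open the output file once and write everything in a single pass
with open('users_with_er.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```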