Aggregating JSON values in Python

I'd like to know whether it is possible to aggregate JSON data into new values in Python.

For example, a single JSON value looks like this:

{"time": {"Friday": {"20:00": 2, "19:00": 1, "22:00": 10, "21:00": 5, "23:00": 14, "0:00": 2, "18:00": 2},
          "Thursday": {"23:00": 1, "0:00": 1, "19:00": 1, "18:00": 1, "16:00": 2, "22:00": 2},
          "Wednesday": {"17:00": 2, "23:00": 3, "16:00": 1, "22:00": 1, "19:00": 1, "21:00": 1},
          "Sunday": {"16:00": 2, "17:00": 2, "19:00": 1, "22:00": 4, "21:00": 4, "0:00": 3, "1:00": 2},
          "Saturday": {"21:00": 4, "20:00": 3, "23:00": 10, "22:00": 7, "18:00": 1, "15:00": 2, "16:00": 1, "17:00": 1, "0:00": 8, "1:00": 1},
          "Tuesday": {"19:00": 1, "17:00": 1, "1:00": 2, "21:00": 1, "23:00": 3},
          "Monday": {"18:00": 2, "23:00": 1, "22:00": 2}}}
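I want to group the counts into four categories based on the time of day.

The four categories are:

6 AM to 12 PM: morning

12 PM to 5 PM: afternoon

5 PM to 11 PM: evening

11 PM to 6 AM: night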

For example:

If this is the current value:

"Friday": {"20:00": 5, "21:00": 10}
then the output should be:

"Friday": {"morning": 0, "afternoon": 0, "evening": 15, "night": 0}
So in general the output should be

"Day": {"morning": count, "afternoon": count, "evening": count, "night":count}
for each of the several hundred JSON values.

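My idea is to create four bins, one for each time period. I would then use two for loops to go through each day's values. If a time slot falls within a bin's range, I add its count to that bin. I would then store the day in a dictionary whose value is also a dictionary. The inner dictionary would consist of the four time periods, with the counts as values. Then I would return that day and start over with the next one.

This is what I have so far; I still need to implement the aggregate function: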

import json
from datetime import datetime

def cleanStr4SQL(s):
    return s.replace("'","`").replace("\n"," ")

def parseCheckinData():
    #write code to parse yelp_checkin.JSON
    with open('yelp_checkin.JSON') as f:
        outfile = open('checkin.txt', 'w')
        line = f.readline()
        count_line = 0
        while line:
            data = json.loads(line)
            outfile.write(cleanStr4SQL(str(data['business_id'])) + '\t')
            outfile.write(aggregate(cleanStr4SQL(str(data['time']))))

            line = f.readline()
            count_line+=1
    print(count_line)
    outfile.close()
    f.close()

def aggregate(line):
    morning = []
    afternoon = []
    evening = []
    night = []
    for l in line:
        print(l)
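I'd like to know the best way to solve this in Python.

Any suggestions would be much appreciated. I know there isn't much code, but it would be great if someone could point me in the right direction.

Thanks for reading.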

Here is one possible approach. I only tried it with a single JSON string, so you may need to extend it to handle multiple records.

import json
import pandas as pd

jsontxt = '{"time": {"Friday": {"20:00": 2, "19:00": 1, "22:00": 10, "21:00": 5, "23:00": 14, "0:00": 2, "18:00": 2}, "Thursday": {"23:00": 1, "0:00": 1, "19:00": 1, "18:00": 1, "16:00": 2, "22:00": 2}, "Wednesday": {"17:00": 2, "23:00": 3, "16:00": 1, "22:00": 1, "19:00": 1, "21:00": 1}, "Sunday": {"16:00": 2, "17:00": 2, "19:00": 1, "22:00": 4, "21:00": 4, "0:00": 3, "1:00": 2}, "Saturday": {"21:00": 4, "20:00": 3, "23:00": 10, "22:00": 7, "18:00": 1, "15:00": 2, "16:00": 1, "17:00": 1, "0:00": 8, "1:00": 1}, "Tuesday": {"19:00": 1, "17:00": 1, "1:00": 2, "21:00": 1, "23:00": 3}, "Monday": {"18:00": 2, "23:00": 1, "22:00": 2}}}'

# Parse the json and convert to a dictionary object
jsondict = json.loads(jsontxt)

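# Convert the "time" element to a pandas DataFrame (one column per day, one row per time slot).
# Slots missing for a day come out as NaN, so fill them with 0 and restore integer counts.
df = pd.DataFrame(jsondict['time']).fillna(0).astype(int)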

# Define a function to convert the time slots to the categories
def cat(time_slot):
    if '06:00' <= time_slot < '12:00':
        return 'Morning'
    elif '12:00' <= time_slot < '17:00':
        return 'Afternoon'
    elif '17:00' <= time_slot < '23:00':
        return 'Evening'
    else:
        return 'Night'

# Add a new column "Time" to the DataFrame and set the values after left padding the values in the index
df['Time'] = df.index.str.rjust(5,'0')

# Add a new column "Category" and set the values based on the time slot
df['Category'] = df['Time'].apply(cat)

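# Create a pivot table that sums the day columns per category; naming the day
# columns in `values` keeps the helper "Time" column out of the aggregation
pt = df.pivot_table(index='Category', values=list(jsondict['time']), aggfunc='sum', fill_value=0)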

# Convert the pivot table to a dictionary to get the json output you want
jsonoutput = pt.to_dict()
print(jsonoutput)
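
For comparison, the plan described in the question (four bins and two nested loops) can also be sketched in plain Python without pandas. This is only a sketch under stated assumptions: the helper `category` and the generator `aggregate_lines` are names made up for illustration, the boundaries are read as 06:00 to 12:00, 12:00 to 17:00, 17:00 to 23:00 and 23:00 to 06:00, and each input line is assumed to be a JSON object with `business_id` and `time` keys, as in the question's `parseCheckinData`.

```python
import json


def category(time_slot):
    """Map a slot such as "20:00" or "0:00" to one of the four categories.

    Hypothetical helper; the boundaries follow the question's description.
    """
    hour = int(time_slot.split(":")[0])
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 23:
        return "evening"
    return "night"  # 23:00 up to, but not including, 06:00


def aggregate(time_by_day):
    """Turn {day: {slot: count}} into {day: {category: total}}."""
    result = {}
    for day, slots in time_by_day.items():
        # Start every day with all four bins so empty categories show up as 0.
        totals = {"morning": 0, "afternoon": 0, "evening": 0, "night": 0}
        for slot, count in slots.items():
            totals[category(slot)] += count
        result[day] = totals
    return result


def aggregate_lines(lines):
    """Yield (business_id, aggregated_time) for each non-empty JSON line.

    Assumes one JSON object per line with "business_id" and "time" keys.
    """
    for line in lines:
        if line.strip():
            data = json.loads(line)
            yield data["business_id"], aggregate(data["time"])


record = json.loads('{"time": {"Friday": {"20:00": 5, "21:00": 10}}}')
print(aggregate(record["time"]))
# {'Friday': {'morning': 0, 'afternoon': 0, 'evening': 15, 'night': 0}}
```

Unlike the pivot table, this version always emits all four lowercase keys for every day, which matches the output format requested in the question. Because `aggregate_lines` accepts any iterable of strings, an open file object can be passed to it directly.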
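
You are dealing with time-series-like data. Try looking for a package built for that kind of data. For example, mongodb has native methods for data aggregation.

Interesting, I'll look into it. I was just wondering, do you know the best way to iterate over this JSON value? Thanks!

At first glance, I would iterate day by day and build the new categories (morning, evening, etc.).

Awesome! I hadn't thought of using the pandas library. Thanks for your help!

Thanks for the kind words. Glad I could help.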