Python: JSON from Kafka cannot be converted to a pandas DataFrame


python json python-3.x pandas apache-kafka


Hey, I have this code to consume data from Kafka:

 from kafka import KafkaConsumer  # assuming the kafka-python package

 bootstrap_servers = ['localhost:9092']
 topicName = 'testapp5'
 consumer = KafkaConsumer(topicName, group_id='group1', bootstrap_servers=bootstrap_servers)
 for msg in consumer:
     print("Topic Name=%s,Message=%s" % (msg.topic, msg.value))

Then I want to parse each message with:

  message = json.loads(msg.value)

Output:

 {'request_id': 'f84c55fd-c730-49ba-83b2-47b04643b706',
'data': {'age': 24,
'workclass': 'Self-emp-not-inc',
'fnlwgt': 188274,
'education': 'Bachelors',
'marital_status': 'Never-married',
'occupation': 'Sales',
'relationship': 'Not-in-family',
'race': 'White',
'gender': 'Male',
'capital_gain': 0,
'capital_loss': 0,
'hours_per_week': 50,
'native_country': 'United-States',
'income_bracket': '<=50K.'}}

And the output:

 {'request_id': 'f84c55fd-c730-49ba-83b2-47b04643b706',
'data': {'age': 24,
'workclass': 'Self-emp-not-inc',
'fnlwgt': 188274,
'education': 'Bachelors',
'marital_status': 'Never-married',
'occupation': 'Sales',
'relationship': 'Not-in-family',
'race': 'White',
'gender': 'Male',
'capital_gain': 0,
'capital_loss': 0,
'hours_per_week': 50,
'native_country': 'United-States',
'income_bracket': '<=50K.'}}

What should I do so that the JSON coming from Kafka can be accessed as a pandas DataFrame?

Thank you.

The simplest way is to use pd.json_normalize. If you only need the data, you can select it by its dict key and pass it to pd.DataFrame.

import pandas as pd

js = {'request_id': 'f84c55fd-c730-49ba-83b2-47b04643b706',
'data': {'age': 24,
'workclass': 'Self-emp-not-inc',
'fnlwgt': 188274,
'education': 'Bachelors',
'marital_status': 'Never-married',
'occupation': 'Sales',
'relationship': 'Not-in-family',
'race': 'White',
'gender': 'Male',
'capital_gain': 0,
'capital_loss': 0,
'hours_per_week': 50,
'native_country': 'United-States',
'income_bracket': '<=50K.'}}
# simplest: flatten the whole message, nested keys included
pd.json_normalize(js)
# if request_id is not needed: build a one-row frame from the payload only
pd.DataFrame(js["data"], index=[0])
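
As a small sketch of what those two calls return (assuming pandas 1.0 or later, where `pd.json_normalize` is available at the top level): `json_normalize` flattens the nested `data` dict into dotted column names, while selecting `js["data"]` first gives plain column names. The shortened `js` dict below is only an illustrative subset of the message from the question.

```python
import pandas as pd

# Shortened version of the message from the question (illustrative subset of keys).
js = {
    "request_id": "f84c55fd-c730-49ba-83b2-47b04643b706",
    "data": {"age": 24, "workclass": "Self-emp-not-inc", "hours_per_week": 50},
}

# Flattens nested dicts; nested keys become "data.<key>" columns.
flat = pd.json_normalize(js)
print(list(flat.columns))

# Only the payload: a list holding one dict gives a one-row frame.
payload = pd.DataFrame([js["data"]])
print(list(payload.columns))

# json_normalize also accepts a separator if dots in column names are inconvenient.
flat_us = pd.json_normalize(js, sep="_")
print(list(flat_us.columns))
```

Passing `[js["data"]]` as a list is an alternative to `index=[0]`; both produce a single row.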

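Use pd.json_normalize(message).

Your DataFrame only has one row, so why use pandas?

What would you suggest using instead? @onecricketeer

You only need json.loads for a single message, as you stated in the question, and you can pass it to the consumer as value_deserializer. I'm not clear why you want pandas when you have no batch processing. If you really do need batches of records and DataFrames, PySpark with Structured Streaming can do that; otherwise you need to periodically accumulate records into a list and then flush that list into a DataFrame.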
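
The last comment could be sketched roughly as follows, assuming the kafka-python client used in the question. `decode_json`, `batches_to_frames`, `consume`, and `batch_size` are hypothetical names chosen for illustration; only `value_deserializer` is an actual `KafkaConsumer` parameter. The batching helper accepts any iterable of already-decoded dicts, so it can be tried without a broker.

```python
import json
from itertools import islice

import pandas as pd


def decode_json(raw: bytes) -> dict:
    """Turn the raw bytes of a Kafka message value into a dict."""
    return json.loads(raw.decode("utf-8"))


def batches_to_frames(records, batch_size=100):
    """Yield one DataFrame per batch of up to batch_size decoded records."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield pd.json_normalize(batch)


def consume(topic="testapp5", servers=("localhost:9092",), batch_size=100):
    """Sketch only: needs kafka-python installed and a reachable broker."""
    from kafka import KafkaConsumer  # imported here so the helpers work without it

    consumer = KafkaConsumer(
        topic,
        group_id="group1",
        bootstrap_servers=list(servers),
        value_deserializer=decode_json,  # msg.value then arrives as a dict
    )
    values = (msg.value for msg in consumer)
    for frame in batches_to_frames(values, batch_size):
        print(frame.shape)


# Broker-free check with three hand-made messages (not real data).
raw_messages = [
    b'{"request_id": "a", "data": {"age": 24, "gender": "Male"}}',
    b'{"request_id": "b", "data": {"age": 31, "gender": "Female"}}',
    b'{"request_id": "c", "data": {"age": 45, "gender": "Male"}}',
]
frames = list(batches_to_frames(map(decode_json, raw_messages), batch_size=2))
print([len(f) for f in frames])
```

With a live consumer this version only flushes once a full batch has arrived, so a slow topic would delay the DataFrame; a time-based flush would need a different loop, for example one built around the consumer's `poll` method with a timeout.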