
Kinesis Firehose does not insert comma delimiters between JSON objects when putting them in S3


Before sending the data, I run it through JSON.stringify, so it looks like this:

{"data": [{"key1": value1, "key2": value2}, {"key1": value1, "key2": value2}]}
But once it goes through AWS API Gateway and Kinesis Firehose puts it into S3, it looks like this:

    {
     "key1": value1, 
     "key2": value2
    }{
     "key1": value1, 
     "key2": value2
    }
The comma delimiters between the JSON objects are gone, but I need them to process the data properly.
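To illustrate the problem, here is a hypothetical TypeScript sketch (not part of the actual pipeline): a single JSON.parse fails on concatenated objects, while newline-delimited objects can be split and parsed line by line.

// Two JSON objects concatenated without a delimiter, as Firehose writes them by default.
const concatenated =
  '{"key1": "value1", "key2": "value2"}{"key1": "value1", "key2": "value2"}';

try {
  JSON.parse(concatenated); // throws: unexpected token after the first object
} catch (err) {
  console.error('Cannot parse concatenated objects:', err);
}

// With a newline between records, each line can be parsed independently.
const delimited = '{"key1": "value1"}\n{"key2": "value2"}\n';
const objects = delimited
  .split('\n')
  .filter((line) => line.trim().length > 0)
  .map((line) => JSON.parse(line));
console.log(objects.length); // 2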

Template in API Gateway:

#set($root = $input.path('$'))
{
    "DeliveryStreamName": "some-delivery-stream",
    "Records": [
#foreach($r in $root.data)
#set($data = "{
    ""key1"": ""$r.value1"",
    ""key2"": ""$r.value2""
}")
    {
        "Data": "$util.base64Encode($data)"
    }#if($foreach.hasNext),#end
#end
    ]
}
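For reference, the usual producer-side fix is to append a newline to each record before it is base64-encoded, so that Firehose writes newline-delimited JSON to S3. Here is a minimal TypeScript sketch of what the template does per record, with that fix applied (the record shape is the sample one from the question, not the real payload):

// Build the record payload, append '\n', then base64-encode it,
// mirroring what $util.base64Encode($data) does in the template.
const record = { key1: 'value1', key2: 'value2' };
const data = JSON.stringify(record) + '\n'; // trailing '\n' keeps objects line-delimited in S3
const encoded = Buffer.from(data).toString('base64');
console.log(encoded);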

I ran into this same problem recently, and the only answers I could find basically said to add a newline character ("\n") to the end of every JSON message whenever you post it to the Kinesis stream, or to use some kind of raw JSON decoder method that can process concatenated JSON objects without a delimiter.

I posted a Python code solution, which can be found in a related Stack Overflow post:

It is entirely possible to read the individual JSON objects from a file once AWS Firehose has dumped them into S3.

In Python, you can use the raw_decode method from the json package.

from json import JSONDecoder, JSONDecodeError
import re
import json
import boto3

NOT_WHITESPACE = re.compile(r'[^\s]')

def decode_stacked(document, pos=0, decoder=JSONDecoder()):
    while True:
        match = NOT_WHITESPACE.search(document, pos)
        if not match:
            return
        pos = match.start()

        try:
            obj, pos = decoder.raw_decode(document, pos)
        except JSONDecodeError:
            # do something sensible if there's some error
            raise
        yield obj

s3 = boto3.resource('s3')

s3_object = s3.Object("my-bucket", "my-firehose-json-key.json")
# read() returns bytes, so decode to str before scanning with the regex
file_content = s3_object.get()['Body'].read().decode('utf-8')
for obj in decode_stacked(file_content):
    print(json.dumps(obj))
    #  {"key1": value1, "key2": value2}
    #  {"key1": value1, "key2": value2}
Source:
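If you are consuming the objects from Node.js rather than Python, the same raw-decoding idea can be sketched in TypeScript. This is a minimal, hypothetical version that only handles top-level objects (no top-level arrays or scalars):

// Split concatenated JSON objects by tracking brace depth,
// ignoring braces that appear inside string literals.
function decodeStacked(document: string): unknown[] {
  const objects: unknown[] = [];
  let depth = 0;
  let start = -1;
  let inString = false;
  let escaped = false;
  for (let i = 0; i < document.length; i++) {
    const ch = document[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') {
      if (depth === 0) start = i;
      depth += 1;
    } else if (ch === '}') {
      depth -= 1;
      if (depth === 0) objects.push(JSON.parse(document.slice(start, i + 1)));
    }
  }
  return objects;
}

// Example: yields both objects from a stacked string.
console.log(decodeStacked('{"key1": "value1"}{"key2": "value2"}'));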

Using Glue / PySpark, you can use:

import json

# In a Glue job, `sc` is the provided SparkContext. Each line of the file is
# parsed as one JSON object, which assumes the records are newline-delimited.
rdd = sc.textFile("s3a://my-bucket/my-firehose-file-containing-json-objects")
df = rdd.map(lambda x: json.loads(x)).toDF()
df.show()

Source:

One approach you could consider is to configure data processing for your Kinesis Firehose delivery stream by adding a Lambda function as its data processor, which is executed before the data is finally delivered to the S3 bucket, for example in CloudFormation:
DeliveryStream:
  ...
  Type: AWS::KinesisFirehose::DeliveryStream
  Properties:
    DeliveryStreamType: DirectPut
    ExtendedS3DestinationConfiguration:
      ...
      BucketARN: !GetAtt MyDeliveryBucket.Arn
      ProcessingConfiguration:
        Enabled: true
        Processors:
          - Parameters:
              - ParameterName: LambdaArn
                ParameterValue: !GetAtt MyTransformDataLambdaFunction.Arn
            Type: Lambda
    ...
In the Lambda function, make sure that '\n' is appended to the record's JSON string; see the Node.js Lambda function myTransformData.ts below:

import {
  FirehoseTransformationEvent,
  FirehoseTransformationEventRecord,
  FirehoseTransformationHandler,
  FirehoseTransformationResult,
  FirehoseTransformationResultRecord,
} from 'aws-lambda';

const createDroppedRecord = (
  recordId: string
): FirehoseTransformationResultRecord => {
  return {
    recordId,
    result: 'Dropped',
    data: Buffer.from('').toString('base64'),
  };
};

const processData = (
  payloadStr: string,
  record: FirehoseTransformationEventRecord
) => {
  let jsonRecord;
  // ...
  // Process the original payload
  // and create the record in JSON.
  // As a minimal placeholder, simply parse the payload;
  // return undefined on invalid JSON so the record gets dropped.
  try {
    jsonRecord = JSON.parse(payloadStr);
  } catch {
    jsonRecord = undefined;
  }
  return jsonRecord;
};

const transformRecord = (
  record: FirehoseTransformationEventRecord
): FirehoseTransformationResultRecord => {
  try {
    const payloadStr = Buffer.from(record.data, 'base64').toString();
    const jsonRecord = processData(payloadStr, record);
    if (!jsonRecord) {
      console.error('Error creating json record');
      return createDroppedRecord(record.recordId);
    }
    return {
      recordId: record.recordId,
      result: 'Ok',
      // Ensure that '\n' is appended to the record's JSON string.
      data: Buffer.from(JSON.stringify(jsonRecord) + '\n').toString('base64'),
    };
  } catch (error) {
    console.error(`Error processing record ${record.recordId}: `, error);
    return createDroppedRecord(record.recordId);
  }
};

const transformRecords = (
  event: FirehoseTransformationEvent
): FirehoseTransformationResult => {
  let records: FirehoseTransformationResultRecord[] = [];
  for (const record of event.records) {
    const transformed = transformRecord(record);
    records.push(transformed);
  }
  return { records };
};

export const handler: FirehoseTransformationHandler = async (
  event,
  _context
) => {
  const transformed = transformRecords(event);
  return transformed;
};
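For a quick local sanity check, a hypothetical invocation could look like this (the event fields and payload below are made-up sample values, and the minimal processData above simply parses the payload):

// Hypothetical local test; all values are made-up samples.
const sampleEvent: FirehoseTransformationEvent = {
  invocationId: 'test-invocation-id',
  deliveryStreamArn:
    'arn:aws:firehose:us-east-1:000000000000:deliverystream/some-delivery-stream',
  region: 'us-east-1',
  records: [
    {
      recordId: 'record-1',
      approximateArrivalTimestamp: Date.now(),
      data: Buffer.from('{"key1": "value1", "key2": "value2"}').toString('base64'),
    },
  ],
};

void (async () => {
  const result = (await handler(
    sampleEvent,
    {} as any,
    () => undefined
  )) as FirehoseTransformationResult;
  const decoded = Buffer.from(result.records[0].data, 'base64').toString();
  console.log(JSON.stringify(decoded)); // "{\"key1\":\"value1\",\"key2\":\"value2\"}\n" – note the trailing \n
})();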
Once the newline delimiters are in place, AWS services such as Athena will be able to process the JSON record data in the S3 bucket properly.