Google Cloud Platform: Avro to BigTable - schema issue?

Tags: google-cloud-platform, apache-beam, gcloud, avro, google-cloud-bigtable

I am trying to ingest an Avro file (generated by Spark 3.0) into BigTable using the Dataflow template [1], and I run into the error below.

Note: the file can be read in both Spark and the Python avro library without any apparent issue.

Any ideas?

Thanks for your support.

Error (short)

Caused by: org.apache.avro.AvroTypeException: Found topLevelRecord, expecting com.google.cloud.teleport.bigtable.BigtableRow, missing required field key

Avro schema (extract)

{"type": "record", "name": "topLevelRecord", "fields": [{"name": "a_a", "type": ["string", "null"]}, ...]}

Error (full)

java.io.IOException: Failed to start reading from source: gs://myfolder/myfile.avro range [0, 15197631)
    at org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start (WorkerCustomSources.java:610)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start (ReadOperation.java:361)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop (ReadOperation.java:194)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start (ReadOperation.java:159)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute (MapTaskExecutor.java:77)
    at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork (BatchDataflowWorker.java:417)
    at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork (BatchDataflowWorker.java:386)
    at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork (BatchDataflowWorker.java:311)
    at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork (DataflowBatchWorkerHarness.java:140)
    at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call (DataflowBatchWorkerHarness.java:120)
    at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call (DataflowBatchWorkerHarness.java:107)
    at java.util.concurrent.FutureTask.run (FutureTask.java:264)
    at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1128)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:628)
    at java.lang.Thread.run (Thread.java:834)
Caused by: org.apache.avro.AvroTypeException: Found topLevelRecord, expecting com.google.cloud.teleport.bigtable.BigtableRow, missing required field key
    at org.apache.avro.io.ResolvingDecoder.doAction (ResolvingDecoder.java:292)
    at org.apache.avro.io.parsing.Parser.advance (Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readFieldOrder (ResolvingDecoder.java:130)
    at org.apache.avro.generic.GenericDatumReader.readRecord (GenericDatumReader.java:215)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion (GenericDatumReader.java:175)
    at org.apache.avro.generic.GenericDatumReader.read (GenericDatumReader.java:153)
    at org.apache.avro.generic.GenericDatumReader.read (GenericDatumReader.java:145)
    at org.apache.beam.sdk.io.AvroSource$AvroBlock.readNextRecord (AvroSource.java:644)
    at org.apache.beam.sdk.io.BlockBasedSource$BlockBasedReader.readNextRecord (BlockBasedSource.java:210)
    at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.advanceImpl (FileBasedSource.java:484)
    at org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl (FileBasedSource.java:479)
    at org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start (OffsetBasedSource.java:249)
    at org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start (WorkerCustomSources.java:607)

References:

[1] Dataflow template "Avro Files on Cloud Storage to Cloud Bigtable"

BigTable is a scalable NoSQL database service, which means it is schemaless, whereas Spark SQL has a schema, as you pointed out in the question.

As the error shows, the template expects records of type com.google.cloud.teleport.bigtable.BigtableRow, not your topLevelRecord.

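For context, the template reads the input with its own reader schema. Based on the class names in the stack trace (a BigtableRow record with a required key field), that schema presumably has roughly the shape sketched below; this is an inference, so verify it against the .avsc definition actually shipped with the template:

    {
      "type": "record",
      "name": "BigtableRow",
      "namespace": "com.google.cloud.teleport.bigtable",
      "fields": [
        {"name": "key", "type": "bytes"},
        {"name": "cells", "type": {"type": "array", "items": {
          "type": "record",
          "name": "BigtableCell",
          "fields": [
            {"name": "family", "type": "string"},
            {"name": "qualifier", "type": "bytes"},
            {"name": "timestamp", "type": "long"},
            {"name": "value", "type": "bytes"}
          ]
        }}}
      ]
    }

A flat topLevelRecord of string fields cannot be resolved against such a schema, which is exactly the "missing required field key" in the error.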
So you will need to create a BigTable schema design following the steps in Bigtable's schema design documentation; a sketch of rewrapping the Avro file into that shape is shown below.

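One way to make the file ingestible is to rewrap each record into the expected row shape before running the template. Here is a minimal, hypothetical sketch using the Python avro library the question already mentions; the row key, column family, and file names are placeholders, and the reader schema is the sketch from above (saved as bigtable_row.avsc), which you should replace with the template's real one:

    import time
    import avro.schema
    from avro.datafile import DataFileReader, DataFileWriter
    from avro.io import DatumReader, DatumWriter

    # Assumed target schema, saved from the sketch above; verify against
    # the .avsc actually bundled with the template.
    with open("bigtable_row.avsc") as f:
        BIGTABLE_ROW = avro.schema.parse(f.read())

    NOW_US = int(time.time() * 1_000_000)  # cell timestamp in microseconds

    with open("myfile.avro", "rb") as src, open("bigtable_rows.avro", "wb") as dst:
        reader = DataFileReader(src, DatumReader())
        writer = DataFileWriter(dst, DatumWriter(), BIGTABLE_ROW)
        for i, record in enumerate(reader):
            writer.append({
                "key": str(i).encode("utf-8"),  # placeholder: derive a real, unique row key
                "cells": [
                    {
                        "family": "cf1",  # placeholder column family
                        "qualifier": name.encode("utf-8"),
                        "timestamp": NOW_US,
                        "value": str(value).encode("utf-8"),
                    }
                    for name, value in record.items()
                    if value is not None  # fields are ["string", "null"] unions
                ],
            })
        writer.close()
        reader.close()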
Since HBase is also schemaless, if you have the flexibility to use Spark 2.4.0, you could instead write from Spark directly into BigTable through the Apache HBase Spark connector (see the sketch below).

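If you go the connector route, the write path could look roughly like this. This is a sketch only: the format string and option names are those of the Apache hbase-spark connector and vary across versions, the table and column family names are placeholders, and reaching Bigtable assumes the bigtable-hbase client (which implements the HBase API) is on the classpath:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("avro-to-bigtable").getOrCreate()

    # Read the Avro file Spark produced (requires the spark-avro package).
    df = spark.read.format("avro").load("gs://myfolder/myfile.avro")

    # Bigtable needs a row key; derive one from the data (placeholder here).
    df = df.withColumn("key", F.monotonically_increasing_id().cast("string"))

    # Option names follow the Apache hbase-spark connector and are an
    # assumption; check the docs of the connector version you deploy.
    (df.write
       .format("org.apache.hadoop.hbase.spark")
       .option("hbase.table", "my_table")  # placeholder table name
       .option("hbase.columns.mapping",
               "key STRING :key, a_a STRING cf1:a_a")  # placeholder mapping
       .save())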

For the use case above, this looks like a valid feature request; I will file it with the product team and update you with the report number.

Thanks @Ismail. So you suggest defining upfront a BigTable schema that maps the Avro schema, or writing from Spark directly into BigTable with the connector mentioned?

@py-r I would start with the Connector.

Thanks. Tried my chance, but facing ...