Running a Google Dataflow pipeline from a Google App Engine app?
I am creating a Dataflow job using DataflowPipelineRunner. I tried the following scenarios: (1) without specifying any machineType, (2) with a g1-small machine, and (3) with n1-highmem-2.
In all of the above scenarios, the input is a very small file (KB in size) from GCS and the output is a BigQuery table.
I got an out-of-memory error in every scenario.
My compiled code is 94 MB. I am only trying the word count example, and it never reads any input (it fails before the job starts). Please help me understand why I am getting this error.
Note: I am using App Engine to start the job.
Note: the same code works with the beta version.
com.google.api.client.http.HttpRequest execute: exception thrown while executing request
com.google.appengine.api.urlfetch.RequestPayloadTooLargeException: The request to https://www.googleapis.com/upload/storage/v1/b/pwccloudedw-stagging-bucket/o?name=appengine-api-L4wtoWwoElWmstI1Ia93cg.jar&uploadType=resumable&upload_id=AEnB2Uo6HCfw6Usa3aXlcOzg0g3RawrvuAxWuOUtQxwQdxoyA0cf22LKqno0Gu-hjKGLqXIo8MF2FHR63zTxrSmQ9Yk9HdCdZQ exceeded the 10 MiB limit.
at com.google.appengine.api.urlfetch.URLFetchServiceImpl.convertApplicationException(URLFetchServiceImpl.java:157)
at com.google.appengine.api.urlfetch.URLFetchServiceImpl.fetch(URLFetchServiceImpl.java:45)
at com.google.apphosting.utils.security.urlfetch.URLFetchServiceStreamHandler$Connection.fetchResponse(URLFetchServiceStreamHandler.java:543)
at com.google.apphosting.utils.security.urlfetch.URLFetchServiceStreamHandler$Connection.getInputStream(URLFetchServiceStreamHandler.java:422)
at com.google.apphosting.utils.security.urlfetch.URLFetchServiceStreamHandler$Connection.getResponseCode(URLFetchServiceStreamHandler.java:275)
at com.google.api.client.http.javanet.NetHttpResponse.<init>(NetHttpResponse.java:36)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:94)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:965)
at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequestWithoutGZip(MediaHttpUploader.java:545)
at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequest(MediaHttpUploader.java:562)
at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:419)
at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:357)
at java.util.concurrent.FutureTask.run(FutureTask.java:260)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1168)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:605)
at com.google.apphosting.runtime.ApiProxyImpl$CurrentRequestThreadFactory$1$1.run(ApiProxyImpl.java:1152)
at java.security.AccessController.doPrivileged(Native Method)
at com.google.apphosting.runtime.ApiProxyImpl$CurrentRequestThreadFactory$1.run(ApiProxyImpl.java:1146)
at java.lang.Thread.run(Thread.java:745)
at com.google.apphosting.runtime.ApiProxyImpl$CurrentRequestThreadFactory$2$1.run(ApiProxyImpl.java:1195)
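I tried uploading the jar file appengine-api-1.0-sdk-1.9.20.jar directly, but it still tries to upload this jar: appengine-api-L4wtoWwoElWmstI1Ia93cg.jar.
I don't know which jar that is. Do you have any idea what this jar is?
Please help me resolve this issue.

The short answer is that if you run App Engine in an environment that is not subject to the sandbox, you will not hit the App Engine sandbox limits (OOM errors, execution time limits, whitelisted JRE classes). If you really want to run inside the App Engine sandbox, then your use of the Dataflow SDK must conform to the sandbox limits. Below I explain the common problems and what people have done to stay within those limits.

The Dataflow SDK requires an App Engine instance class with enough memory to execute the user's application: it has to construct the pipeline, stage any resources, and send the job description to the Dataflow service. We have typically seen that users need an instance class with more than 128 MB of memory to avoid OOM errors.

If the resources your application needs are already staged, constructing a pipeline and submitting it to the Dataflow service usually takes less than a couple of seconds. Uploading your JARs and any other resources to GCS can take longer than 60 seconds. You can work around this manually by pre-staging your JARs to GCS beforehand (the Dataflow SDK skips staging a JAR again if it detects that it is already there), or by running the submission in a context that gets a 10-minute limit (note that for large applications, 10 minutes may not be enough to stage all resources).

Finally, inside the App Engine sandbox environment, you and all of your dependencies are limited to the whitelisted classes in the JRE; otherwise you will get an exception like: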
java.lang.SecurityException:
java.lang.IllegalAccessException: YYY is not allowed on ZZZ
...
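As a rough illustration of the pre-staging advice, the sketch below lists the classpath entries that are larger than the 10 MiB request limit quoted in the URL Fetch error. These are the likely candidates to upload to GCS ahead of time. The class name `StagingSizeCheck` and the method `oversized` are hypothetical helpers, not part of the Dataflow SDK, and the sketch assumes it is run on a development machine before deploying rather than inside the App Engine sandbox.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of the Dataflow SDK or the App Engine API.
public class StagingSizeCheck {
    // 10 MiB, the request size limit quoted in the URL Fetch error message.
    public static final long URL_FETCH_LIMIT_BYTES = 10L * 1024 * 1024;

    /** Returns the regular files whose size in bytes is strictly greater than limitBytes. */
    public static List<Path> oversized(List<Path> files, long limitBytes) throws IOException {
        List<Path> result = new ArrayList<>();
        for (Path file : files) {
            // Directories and missing entries are skipped; only plain files are measured.
            if (Files.isRegularFile(file) && Files.size(file) > limitBytes) {
                result.add(file);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Collect every entry on the current JVM classpath.
        List<Path> classpath = new ArrayList<>();
        for (String entry : System.getProperty("java.class.path").split(File.pathSeparator)) {
            if (!entry.isEmpty()) {
                classpath.add(Paths.get(entry));
            }
        }
        // Report the entries that would not fit in a single 10 MiB request.
        for (Path jar : oversized(classpath, URL_FETCH_LIMIT_BYTES)) {
            System.out.println(jar + " (" + Files.size(jar) + " bytes) is a candidate for pre-staging");
        }
    }
}
```

Running it with the same classpath as the application gives a list of files to copy to the staging bucket by hand. Whether the SDK then skips them depends on the already-staged detection described in the answer.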
Edit 1