What causes Flume with a GCS sink to throw an OutOfMemoryException?


docker google-cloud-storage


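I am using Flume to write to Google Cloud Storage. Flume listens for HTTP on port 9000. It took me some time to get it working (adding the GCS libraries, using a credentials file...), but it now seems to communicate over the network.
I am sending very small HTTP requests for my tests, and I have plenty of RAM available: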
curl -X POST -d '[{ "headers" : { timestamp=1417444588182, env=dev, tenant=myTenant, type=myType }, "body" : "some body ONE"  }]' localhost:9000

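I get an out-of-memory exception on the very first request (after which, of course, it stops working).
Is there something wrong with the way I configured Flume + GCS, or is this a bug?
Where should I look to gather more data?
PS: I am running flume-ng inside Docker.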

My flume.conf file:
# Name the components on this agent
a1.sources = http
a1.sinks = hdfs_sink
a1.channels = mem

# Describe/configure the source
a1.sources.http.type =  org.apache.flume.source.http.HTTPSource
a1.sources.http.port = 9000

# Describe the sink
a1.sinks.hdfs_sink.type = hdfs
a1.sinks.hdfs_sink.hdfs.path = gs://my_bucket/%{env}/%{tenant}/%{type}/%Y-%m-%d
a1.sinks.hdfs_sink.hdfs.filePrefix = %H-%M-%S
a1.sinks.hdfs_sink.hdfs.fileSuffix = .json
a1.sinks.hdfs_sink.hdfs.round = true
a1.sinks.hdfs_sink.hdfs.roundValue = 10
a1.sinks.hdfs_sink.hdfs.roundUnit = minute

# Use a channel which buffers events in memory
a1.channels.mem.type = memory
a1.channels.mem.capacity = 10000
a1.channels.mem.transactionCapacity = 1000

# Bind the source and sink to the channel
a1.sources.http.channels = mem
a1.sinks.hdfs_sink.channel = mem


Related questions from my Flume/GCS journey:
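When uploading files, the GCS Hadoop FileSystem implementation sets aside a fairly large (64 MB) write buffer for each FSDataOutputStream (that is, each file open for writing). This can be changed by setting the write buffer size to a smaller value, in bytes, in core-site.xml. I imagine 1 MB would suffice for low-volume log collection.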
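As a rough sketch of that change, the snippet below writes a minimal core-site.xml that sets the write buffer to 1 MB. The property name `fs.gs.io.buffersize.write` is an assumption (it is the key older GCS connector releases used and is not stated above), as is the output path `./core-site.xml`; check the documentation for your connector version before relying on either.

```shell
# Sketch: generate a minimal core-site.xml that shrinks the GCS write buffer.
# ASSUMPTION: the property name fs.gs.io.buffersize.write and the output
# location ./core-site.xml are illustrative and not taken from the text above.
BUFFER_BYTES=$((1024 * 1024))   # 1 MB, expressed in bytes

cat > core-site.xml <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.gs.io.buffersize.write</name>
    <value>${BUFFER_BYTES}</value>
  </property>
</configuration>
EOF
```

Hadoop loads core-site.xml from the classpath, so the generated file would need to sit in a directory that is on the Flume agent's classpath.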
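In addition, check what the maximum heap size is set to when the JVM for Flume is launched. The flume-ng script sets a default JAVA_OPTS value of -Xmx20m, which limits the heap to 20 MB; a single 64 MB write buffer cannot fit in a heap that small, which would explain a failure on the first request. The limit can be raised in flume-env.sh (see conf/flume-env.sh.template in the Flume tarball distribution for details).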
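A minimal sketch of what that override could look like in conf/flume-env.sh follows. The 256 MB figure is an illustrative assumption, not a value from the answer; it only needs to be comfortably larger than the write buffers of the files the sink keeps open at once.

```shell
# conf/flume-env.sh (sketch): raise the agent's maximum heap above the
# 20 MB default that the flume-ng script applies.
# ASSUMPTION: 256 MB is an illustrative value chosen for this example.
export JAVA_OPTS="-Xmx256m"
```

When running inside Docker, the file has to be present in the configuration directory that the container's flume-ng invocation actually uses.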
Too bad we can't pass the JVM size to flume-ng.
gs://my_bucket/dev/myTenant/myType/2014-12-01/14-36-28.1417445234193.json.tmp
