Sending messages to a Flume Avro source from Python
I want to write a Python program that converts a stream of JSON documents to Avro and streams them into Flume, which will then forward them to Solr and Parquet.

I was looking at an example that uses the Python avro library, which claims to implement the Avro RPC protocol. But when I point the example at a Flume Avro source, the server appears to simply close the connection. For example:
$ ./atest.py jnkjn kjnkjn e3e3
Have requester
About to request... REQUEST>Úòs±3ô8RÍsÊT¿ÌQÚòs±3ô8RÍsÊT¿Ìsend
jnkjn
kjnkje3e3<
RESPONSE><
Traceback (most recent call last):
File "atest.py", line 35, in <module>
print("Result: " + requestor.request('send', params))
...
File "/usr/lib64/python2.6/httplib.py", line 991, in getresponse
response.begin()
File "/usr/lib64/python2.6/httplib.py", line 392, in begin
version, status, reason = self._read_status()
File "/usr/lib64/python2.6/httplib.py", line 356, in _read_status
raise BadStatusLine(line)
httplib.BadStatusLine
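For context on what "converting JSON documents to Avro" means at the byte level, here is a minimal hand-rolled sketch of Avro's binary encoding for a simple two-field record. The schema and field names are made up for illustration; a real program would use the avro library's DatumWriter rather than encoding by hand:

```python
import io

def zigzag(n):
    # Avro maps signed longs to unsigned so small magnitudes encode short
    return (n << 1) ^ (n >> 63)

def write_long(buf, n):
    # variable-length encoding: 7 bits per byte, low-order bits first
    z = zigzag(n)
    while z > 0x7F:
        buf.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    buf.write(bytes([z]))

def write_string(buf, s):
    data = s.encode("utf-8")
    write_long(buf, len(data))  # a string is a long length plus UTF-8 bytes
    buf.write(data)

def encode_event(name, event_id):
    # Record fields are concatenated in schema order with no tags or
    # delimiters; the schema {name: string, id: long} is hypothetical.
    buf = io.BytesIO()
    write_string(buf, name)
    write_long(buf, event_id)
    return buf.getvalue()

print(encode_event("foo", 1))  # b'\x06foo\x02'
```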
When I ran into this problem, it was because the Flume channel's buffer size was insufficient. Adjusting the channel's buffer size resolved it.
For the memory channel, the relevant properties are byteCapacity and byteCapacityBufferPercentage.

@MatthewMoisen Yes, Flume, like morphlines, is a pile of garbage. In the end I wrote a Python program that talks to the Solr JSON API directly, converts the documents to Avro, and drops them onto HDFS via HttpFS. I then pull them into the Hive metastore through an external table, where I can work with them in Impala and convert them to Parquet with an Impala SELECT/INSERT. The nice thing about using the Solr JSON API directly is that it will allow submitting nested documents once CDH includes a newer version of Solr, which morphlines will never support.
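If you do want to try raising the channel buffer, a memory-channel configuration might look like the following sketch. The agent and channel names are hypothetical; byteCapacity defaults to a share of the JVM heap and byteCapacityBufferPercentage defaults to 20:

```properties
# hypothetical agent "a1" with one memory channel "c1"
a1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000                    # max events held in the channel
a1.channels.c1.transactionCapacity = 1000          # max events per transaction
a1.channels.c1.byteCapacity = 800000               # max bytes of event bodies
a1.channels.c1.byteCapacityBufferPercentage = 20   # headroom reserved for event headers
```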
2014-09-16 16:35:15,745 INFO org.apache.avro.ipc.NettyServer: [id: 0x0633c6d1, /192.168.150.84:38516 => /172.31.1.204:19999] OPEN
2014-09-16 16:35:15,745 INFO org.apache.avro.ipc.NettyServer: [id: 0x0633c6d1, /192.168.150.84:38516 => /172.31.1.204:19999] BOUND: /172.31.1.204:19999
2014-09-16 16:35:15,746 INFO org.apache.avro.ipc.NettyServer: [id: 0x0633c6d1, /192.168.150.84:38516 => /172.31.1.204:19999] CONNECTED: /192.168.150.84:38516
2014-09-16 16:35:15,747 INFO org.apache.avro.ipc.NettyServer: [id: 0x0633c6d1, /192.168.150.84:38516 :> /172.31.1.204:19999] DISCONNECTED
2014-09-16 16:35:15,747 INFO org.apache.avro.ipc.NettyServer: [id: 0x0633c6d1, /192.168.150.84:38516 :> /172.31.1.204:19999] UNBOUND
2014-09-16 16:35:15,747 INFO org.apache.avro.ipc.NettyServer: [id: 0x0633c6d1, /192.168.150.84:38516 :> /172.31.1.204:19999] CLOSED
2014-09-16 16:35:15,747 INFO org.apache.avro.ipc.NettyServer: Connection to /192.168.150.84:38516 disconnected.
2014-09-16 16:35:15,747 WARN org.apache.avro.ipc.NettyServer: Unexpected exception from downstream.
org.apache.avro.AvroRuntimeException: Excessively large list allocation request detected: 539959368 items! Connection closed.
at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decodePackHeader(NettyTransportCodec.java:167)
at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decode(NettyTransportCodec.java:139)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:422)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:84)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:471)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:332)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
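One plausible reading of the 539959368 figure in the log: Python's httplib-based Avro client starts its request with the ASCII line `POST / HTTP/1.1`, while Flume's Netty transport expects a binary header (here assumed to be a 4-byte serial followed by a 4-byte frame count). Interpreting bytes 4-7 of that request line, `" / H"`, as a big-endian int yields exactly the number in the log, which would point to a transport mismatch (HTTP RPC vs. Netty framing) rather than a data problem. A quick check:

```python
import struct

# First bytes on the wire from an httplib-based Avro HTTP client:
request_line = b"POST / HTTP/1.1\r\n"

# Assumed Netty-style header layout: 4-byte serial, then 4-byte frame count.
serial, frame_count = struct.unpack(">II", request_line[:8])

print(frame_count)  # 539959368, the "list allocation" size from the log
```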