Flume collector example in Cloudera's User Guide not working as expected


The part of the User Guide that shows how to set up a collector and write to it uses the following configuration:

host : console | agentSink("localhost",35853) ;
collector : collectorSource(35853) | console ;
I changed this to the following (leaving out the ports, so both sides fall back to the default collector port, 35853):

dataSource : console | agentSink("localhost") ;
dataCollector : collectorSource() | console ;
I start the nodes with:

flume node_nowatch -n dataSource
flume node_nowatch -n dataCollector
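
For reference, this is roughly how those two mappings get submitted to the master via the flume shell — a sketch only, assuming a master running on localhost with default ports (the question does not spell out how the configs were loaded):

    flume shell -c localhost -e "exec config dataSource 'console' 'agentSink(\"localhost\")'"
    flume shell -c localhost -e "exec config dataCollector 'collectorSource()' 'console'"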
I have tried this on two systems:

  • Cloudera's own demo VM running in VirtualBox with 2GB RAM. It ships with Flume 0.9.4-cdh3u2

  • Ubuntu LTS (Lucid) with the Debian packages and OpenJDK (and no Hadoop packages installed), running as a VM in VirtualBox with 2GB RAM, following the steps here

Here is what I did:

    flume dump 'collectorSource()'
leads to:

    $ sudo netstat -anp | grep 35853
    tcp6       0      0 :::35853                :::*                    LISTEN      3520/java
    $ ps aux | grep java | grep 3520
    1000      3520  0.8  2.3 1050508 44676 pts/0   Sl+  15:38   0:02 java -Dflume.log.dir=/usr/lib/flume/logs -Dflume.log.file=flume.log -Dflume.root.logger=INFO,console -Dzookeeper.root.logger=ERROR,console -Dwatchdog.root.logger=INFO,console -Djava.library.path=/usr/lib/flume/lib::/usr/lib/hadoop/lib/native/Linux-amd64-64 com.cloudera.flume.agent.FlumeNode -1 -s -n dump -c dump: collectorSource() | console;
    
    
My assumption is that:

    flume dump 'collectorSource()'
    
is the same as running the configuration:

    dump : collectorSource() | console ;
    
started with:

    flume node -1 -n dump -c "dump: collectorSource() | console;" -s 
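(The ps output above seems to bear this out: the dump process was launched as FlumeNode -1 -s -n dump -c "dump: collectorSource() | console;".)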
    
    dataSource : console | agentSink("localhost")
leads to:

    $ sudo netstat -anp | grep 35853
    tcp6       0      0 :::35853                :::*                    LISTEN      3520/java       
    tcp6       0      0 127.0.0.1:44878         127.0.0.1:35853         ESTABLISHED 3593/java       
    tcp6       0      0 127.0.0.1:35853         127.0.0.1:44878         ESTABLISHED 3520/java 
    
    $ ps aux | grep java | grep 3593
    1000      3593  1.2  3.0 1130956 57644 pts/1   Sl+  15:41   0:07 java -Dflume.log.dir=/usr/lib/flume/logs -Dflume.log.file=flume.log -Dflume.root.logger=INFO,console -Dzookeeper.root.logger=ERROR,console -Dwatchdog.root.logger=INFO,console -Djava.library.path=/usr/lib/flume/lib::/usr/lib/hadoop/lib/native/Linux-amd64-64 com.cloudera.flume.agent.FlumeNode -n dataSource
    
The observed behavior is exactly the same in both VMs:

An unending stream of this at the dataSource:

    2011-12-15 15:27:58,253 [Roll-TriggerThread-1] INFO
    durability.NaiveFileWALManager: File lives in
    /tmp/flume-cloudera/agent/dataSource/writing/20111215-152748172-0500.1116926245855.00000034
    2011-12-15 15:27:58,253 [Roll-TriggerThread-1] INFO
    hdfs.SeqfileEventSink: constructed new seqfile event sink:
    file=/tmp/flume-cloudera/agent/dataSource/writing/20111215-152758253-0500.1127006668855.00000034
    2011-12-15 15:27:58,254 [naive file wal consumer-35] INFO
    durability.NaiveFileWALManager: opening log file
    20111215-152748172-0500.1116926245855.00000034
    2011-12-15 15:27:58,254 [Roll-TriggerThread-1] INFO
    endtoend.AckListener$Empty: Empty Ack Listener began
    20111215-152758253-0500.1127006668855.00000034
    2011-12-15 15:27:58,256 [naive file wal consumer-35] INFO
    agent.WALAckManager: Ack for
    20111215-152748172-0500.1116926245855.00000034 is queued to be checked
    2011-12-15 15:27:58,257 [naive file wal consumer-35] INFO
    durability.WALSource: end of file NaiveFileWALManager
    (dir=/tmp/flume-cloudera/agent/dataSource )
    2011-12-15 15:28:07,874 [Heartbeat] INFO agent.WALAckManager:
    Retransmitting 20111215-152657736-0500.1066489868855.00000034 after
    being stale for 60048ms
    2011-12-15 15:28:07,875 [naive file wal consumer-35] INFO
    durability.NaiveFileWALManager: opening log file
    20111215-152657736-0500.1066489868855.00000034
    2011-12-15 15:28:07,877 [naive file wal consumer-35] INFO
    agent.WALAckManager: Ack for
    20111215-152657736-0500.1066489868855.00000034 is queued to be checked
    2011-12-15 15:28:07,877 [naive file wal consumer-35] INFO
    durability.WALSource: end of file NaiveFileWALManager
    (dir=/tmp/flume-cloudera/agent/dataSource )
    2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO
    hdfs.SeqfileEventSink: closed
    /tmp/flume-cloudera/agent/dataSource/writing/20111215-152758253-0500.1127006668855.00000034
    2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO
    endtoend.AckListener$Empty: Empty Ack Listener ended
    20111215-152758253-0500.1127006668855.00000034
    
    2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO
    durability.NaiveFileWALManager: File lives in
    /tmp/flume-cloudera/agent/dataSource/writing/20111215-152758253-0500.1127006668855.00000034
    2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO
    hdfs.SeqfileEventSink: constructed new seqfile event sink:
    file=/tmp/flume-cloudera/agent/dataSource/writing/20111215-152808335-0500.1137089135855.00000034
    2011-12-15 15:28:08,336 [naive file wal consumer-35] INFO
    durability.NaiveFileWALManager: opening log file
    20111215-152758253-0500.1127006668855.00000034
    2011-12-15 15:28:08,337 [Roll-TriggerThread-1] INFO
    endtoend.AckListener$Empty: Empty Ack Listener began
    20111215-152808335-0500.1137089135855.00000034
    2011-12-15 15:28:08,339 [naive file wal consumer-35] INFO
    agent.WALAckManager: Ack for
    20111215-152758253-0500.1127006668855.00000034 is queued to be checked
    2011-12-15 15:28:08,339 [naive file wal consumer-35] INFO
    durability.WALSource: end of file NaiveFileWALManager
    (dir=/tmp/flume-cloudera/agent/dataSource )
    2011-12-15 15:28:18,421 [Roll-TriggerThread-1] INFO
    hdfs.SeqfileEventSink: closed
    /tmp/flume-cloudera/agent/dataSource/writing/20111215-152808335-0500.1137089135855.00000034
    2011-12-15 15:28:18,421 [Roll-TriggerThread-1] INFO
    endtoend.AckListener$Empty: Empty Ack Listener ended
    20111215-152808335-0500.1137089135855.00000034
    
    ..
    
    2011-12-15 15:35:24,763 [Heartbeat] INFO agent.WALAckManager:
    Retransmitting 20111215-152707823-0500.1076576334855.00000034 after
    being stale for 60277ms
    2011-12-15 15:35:24,763 [Heartbeat] INFO
    durability.NaiveFileWALManager: Attempt to retry chunk
    '20111215-152707823-0500.1076576334855.00000034'  in LOGGED state.
    There is no need for state transition.
    
An unending stream of this at the dataCollector:

    localhost [INFO Thu Dec 15 15:31:09 EST 2011] {
    AckChecksum : (long)1323981059821  (string) ' 4Ck��' (double)6.54133557402E-312 } { AckTag : 20111215-153059819-0500.1308572847855.00000034 } { AckType : end }
    

How do I get console-to-console communication through the collector working properly again?

I'm not quite sure what behavior you expected, but it looks like you may only be binding to the IPv6 interface. I know that with Hadoop you have to work around this:

    # Ubuntu wants us to use IPv6. Hadoop doesn't support that, but nevertheless binds to :::50010. Let's tell it we don't agree.
    export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
    
You probably need a similar option for Flume. To start with, why not set the hostname and port explicitly, and then back them out one at a time?
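
Concretely, the explicit form is just the User Guide's original configuration with the node names from the question (no new syntax, only the default port spelled out):

    dataSource : console | agentSink("localhost", 35853) ;
    dataCollector : collectorSource(35853) | console ;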

As for the IPv6 binding itself:

• Go to /usr/lib/flume/bin
• Rename the file flume-env.sh.template to flume-env.sh
• Add this line at the end of the file:
  export UOPTS=-Djava.net.preferIPv4Stack=true
• Restart your Flume instances
=> You will now be listening on IPv4 addresses only
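
To verify, the same netstat check as above should now show an IPv4 listener instead of the tcp6 one (what I would expect to see; <pid> stands in for your actual process id):

    $ sudo netstat -anp | grep 35853
    tcp        0      0 0.0.0.0:35853           0.0.0.0:*               LISTEN      <pid>/java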