
Python: how to continuously send data to Kafka?


I am trying to continuously send data (packets sniffed using tshark) to a Kafka broker/consumer.

Here are the steps I followed:

1. Started ZooKeeper:

2. Started the Kafka server:

3. Started a Kafka consumer:

4. Wrote the following Python script to send the sniffed data to the consumer:

from kafka import KafkaProducer
import subprocess
producer = KafkaProducer(bootstrap_servers='localhost:9092')
producer.send('my-topic', subprocess.check_output(['tshark','-i','wlan0']))
But this gets stuck in the producer terminal, with this output:

Capturing on 'wlan0'
605
^C
Nothing gets passed to the consumer.
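The likely cause: subprocess.check_output() waits for the child process to exit, and tshark -i wlan0 keeps capturing until it is interrupted, so producer.send() is never reached. A minimal sketch of one fix (broker and topic assumed from the script above) reads tshark's stdout line by line with Popen and publishes each line as its own message:

from kafka import KafkaProducer
import subprocess

producer = KafkaProducer(bootstrap_servers='localhost:9092')
# -l asks tshark to flush its output after each packet
proc = subprocess.Popen(['tshark', '-i', 'wlan0', '-l'],
                        stdout=subprocess.PIPE)
for line in iter(proc.stdout.readline, b''):
    # one Kafka message per printed packet summary
    producer.send('my-topic', line.strip())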

I know I can use pyshark to run tshark from Python:

import pyshark
capture = pyshark.LiveCapture(interface='eth0')
capture.sniff(timeout=5)   # capture packets for 5 seconds
capture1 = capture[0]      # first captured packet
print capture1
But I don't know how to continuously send the captured packets from the producer to the consumer. Any suggestions?
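One way to do this, as a minimal sketch (broker and topic assumed from the snippets above): pyshark's LiveCapture.sniff_continuously() is a generator that yields packets as they arrive, so each packet can be published as its own Kafka message:

import pyshark
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers='localhost:9092')
capture = pyshark.LiveCapture(interface='wlan0')

for packet in capture.sniff_continuously():
    # str(packet) is the full text dissection; send selected fields
    # instead if the consumer only needs part of each packet
    producer.send('my-topic', str(packet))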


Thanks, everyone!

Check the following link:

Implementing the Kafka Producer
Here we define the main pieces of the Kafka producer code used to test the cluster. In the main class we set up the data pipes and the threads:

LOGGER.debug("Setting up streams");
PipedInputStream send = new PipedInputStream(BUFFER_LEN);
PipedOutputStream input = new PipedOutputStream(send);

LOGGER.debug("Setting up connections");
LOGGER.debug("Setting up file reader");
BufferedFileReader reader = new BufferedFileReader(filename, input);
LOGGER.debug("Setting up kafka producer");
KafkaProducer kafkaProducer = new KafkaProducer(topic, send);

LOGGER.debug("Spinning up threads");
Thread source = new Thread(reader);
Thread kafka = new Thread(kafkaProducer);

source.start();
kafka.start();

LOGGER.debug("Joining");
kafka.join();
The BufferedFileReader, running in its own thread, reads the data off disk:
rd = new BufferedReader(new FileReader(this.fileToRead));
wd = new BufferedWriter(new OutputStreamWriter(this.outputStream, ENC));
int b = -1;
while ((b = rd.read()) != -1)
{
    wd.write(b);
}
Finally, the KafkaProducer sends asynchronous messages to the Kafka Cluster:
rd = new BufferedReader(new InputStreamReader(this.inputStream, ENC));
String line = null;
producer = new Producer<Integer, String>(conf);
while ((line = rd.readLine()) != null)
{
    producer.send(new KeyedMessage<Integer, String>(this.topic, line));
}
Doing these operations on separate threads keeps disk reads from blocking the Kafka producer (and vice versa), with throughput tunable via the size of the buffer.
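The same decoupling can be sketched in Python for the question's use case (a hypothetical adaptation, not part of the original answer): a bounded queue plays the role of the piped streams, so a slow broker backpressures the capture thread instead of losing data:

import threading
import Queue  # renamed to queue in Python 3
import subprocess
from kafka import KafkaProducer

buf = Queue.Queue(maxsize=10000)  # maxsize plays the role of BUFFER_LEN

def read_source():
    # capture thread: write packet lines into the queue
    proc = subprocess.Popen(['tshark', '-i', 'wlan0', '-l'],
                            stdout=subprocess.PIPE)
    for line in iter(proc.stdout.readline, b''):
        buf.put(line)  # blocks while the queue is full (backpressure)

def produce():
    # producer thread: drain the queue into Kafka
    producer = KafkaProducer(bootstrap_servers='localhost:9092')
    while True:
        producer.send('my-topic', buf.get())

threading.Thread(target=read_source).start()
threading.Thread(target=produce).start()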
Implementing the Storm Topology
Topology Definition
Moving onward to Storm, here we define the topology and how the bolts communicate with each other:
TopologyBuilder topology = new TopologyBuilder();

topology.setSpout("kafka_spout", new KafkaSpout(kafkaConf), 4);

topology.setBolt("twitter_filter", new TwitterFilterBolt(), 4)
        .shuffleGrouping("kafka_spout");

topology.setBolt("text_filter", new TextFilterBolt(), 4)
        .shuffleGrouping("twitter_filter");

topology.setBolt("stemming", new StemmingBolt(), 4)
        .shuffleGrouping("text_filter");

topology.setBolt("positive", new PositiveSentimentBolt(), 4)
        .shuffleGrouping("stemming");
topology.setBolt("negative", new NegativeSentimentBolt(), 4)
        .shuffleGrouping("stemming");

topology.setBolt("join", new JoinSentimentsBolt(), 4)
        .fieldsGrouping("positive", new Fields("tweet_id"))
        .fieldsGrouping("negative", new Fields("tweet_id"));

topology.setBolt("score", new SentimentScoringBolt(), 4)
        .shuffleGrouping("join");

topology.setBolt("hdfs", new HDFSBolt(), 4)
        .shuffleGrouping("score");
topology.setBolt("nodejs", new NodeNotifierBolt(), 4)
        .shuffleGrouping("score");

Of note here: the data is shuffled to each bolt except when joining, since it is important that the same tweet is delivered to the same instance of the joining bolt. That is why the join uses fieldsGrouping on "tweet_id" (tuples are routed by the hash of that field) rather than shuffleGrouping (tuples are distributed randomly).

This is also my question.
What about your previous question and its answers was unsatisfactory enough that you need to ask a completely different one? Or, why is this question sufficiently different?
The previous question was more general and concerned a script that could already be used with the producer; here I am trying to implement it in Python. Also, in this question I am more specific about the tools and techniques I have used.