
PHP: Dealing with duplication in a message queue


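I've been arguing with my programmer over the best way to go about this. We have data that comes in at a rate of about 10000 objects per second. This needs to be processed asynchronously, but loose ordering is sufficient, so each object is inserted round-robin into one of several message queues (there are also several producers and consumers). Each object is about 300 bytes. It also needs to be durable, so the MQs are configured to persist to disk.

The problem is that these objects are often duplicated (that is, they are inevitably duplicated in the data that comes in to the producer). They do have 10-byte unique ids. It's not catastrophic if objects are duplicated in the queue, but it is if they're duplicated in the processing after being taken from the queue. What's the best way to get as close as possible to linear scalability while ensuring there's no duplication in the processing of the objects? And perhaps linked to that, should the whole object be stored in the message queue, or only the id, with the body stored in something like Cassandra?

Thank you!


Edit: Confirmed where the duplication occurs. Also, so far I've had 2 recommendations for Redis. I had previously been considering RabbitMQ. What are the pros and cons of each with regard to my requirements?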

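Without knowing how the messages are created within the system, the mechanism the producer uses to publish to the queue, and which queue system is in use, it's hard to diagnose what's going on.

I've seen this scenario happen in a number of different ways: timed-out workers causing a message to become visible in the queue again (and thus be processed a second time; this is common with Kestrel), misconfigured brokers (HA ActiveMQ comes to mind), misconfigured clients (Spring plus Camel routing comes to mind), clients double-submitting, and so on. There are simply many ways this kind of issue can come up.

Since I can't really diagnose the problem, I'll plug Redis here. You could easily combine something like SPOP (which is O(1), as is SADD) with pub/sub for an incredibly fast, constant-time, duplicate-free queue (sets must contain unique elements). There is also a Ruby project that may be able to help; it's at least worth a look.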
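A minimal sketch of the set-as-queue idea, in Python rather than PHP for brevity. The key name `pending_ids`, the helper names, and the `InMemorySets` class are all assumptions made for illustration; `InMemorySets` is only a stand-in that mimics how Redis SADD (returns the number of newly added members) and SPOP (removes and returns an arbitrary member) behave, so that the example runs without a server.

```python
class InMemorySets:
    """Stand-in for a Redis client; mimics the SADD/SPOP calls used below."""

    def __init__(self):
        self._sets = {}

    def sadd(self, key, *members):
        target = self._sets.setdefault(key, set())
        before = len(target)
        target.update(members)
        return len(target) - before  # like Redis: count of members that were new

    def spop(self, key):
        target = self._sets.get(key)
        return target.pop() if target else None  # None when the set is empty


def enqueue(client, object_id):
    """Producer side: True if the id was not already pending."""
    return client.sadd("pending_ids", object_id) == 1


def dequeue(client):
    """Consumer side: remove and return an arbitrary pending id, or None."""
    return client.spop("pending_ids")


store = InMemorySets()
results = [enqueue(store, oid) for oid in ["a1", "b2", "a1", "c3", "b2"]]

drained = []
while (oid := dequeue(store)) is not None:
    drained.append(oid)
```

A real client such as redis-py exposes `sadd` and `spop` with the same shape, although it returns bytes unless configured to decode responses. Note that a set only collapses duplicates while the id is still pending: an id that arrives again after it has been popped would be queued again, and a set gives no ordering at all.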


Best of luck.

P.S.: This is the first time in my life that the Redis website has had problems, but I bet they will have fixed it by the time you visit.

> We have data that comes in at a rate
> of about 10000 objects per second.
> This needs to be processed
> asynchronously, but loose ordering is
> sufficient, so each object is inserted
> round-robin-ly into one of several
> message queues (there are also several
> producers and consumers)
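
My first suggestion is to take a look at Redis, because it is insanely fast, and I bet you could handle all your messages with just a single message queue.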

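First, I'd like to show you some information about my laptop (I like it, but a big server would be a lot faster ;)). My dad (who was a little impressed :)) recently bought a new PC, and it beats my laptop hard (8 CPUs instead of 2).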

Benchmarking on my machine using redis-benchmark, without even doing much Redis optimization:

alfred@alfred-laptop:~/database/redis-2.2.0-rc4/src$ ./redis-benchmark 
====== PING (inline) ======
  10000 requests completed in 0.22 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

94.84% <= 1 milliseconds
98.74% <= 2 milliseconds
99.65% <= 3 milliseconds
100.00% <= 4 milliseconds
46296.30 requests per second

====== PING ======
  10000 requests completed in 0.22 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

91.30% <= 1 milliseconds
98.21% <= 2 milliseconds
99.29% <= 3 milliseconds
99.52% <= 4 milliseconds
100.00% <= 4 milliseconds
45662.10 requests per second

====== MSET (10 keys) ======
  10000 requests completed in 0.32 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

3.45% <= 1 milliseconds
88.55% <= 2 milliseconds
97.86% <= 3 milliseconds
98.92% <= 4 milliseconds
99.80% <= 5 milliseconds
99.94% <= 6 milliseconds
99.95% <= 9 milliseconds
99.96% <= 10 milliseconds
100.00% <= 10 milliseconds
30864.20 requests per second

====== SET ======
  10000 requests completed in 0.21 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

92.45% <= 1 milliseconds
98.78% <= 2 milliseconds
99.00% <= 3 milliseconds
99.01% <= 4 milliseconds
99.53% <= 5 milliseconds
100.00% <= 5 milliseconds
47169.81 requests per second

====== GET ======
  10000 requests completed in 0.21 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

94.50% <= 1 milliseconds
98.21% <= 2 milliseconds
99.50% <= 3 milliseconds
100.00% <= 3 milliseconds
47619.05 requests per second

====== INCR ======
  10000 requests completed in 0.23 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

91.90% <= 1 milliseconds
97.45% <= 2 milliseconds
98.59% <= 3 milliseconds
99.51% <= 10 milliseconds
99.78% <= 11 milliseconds
100.00% <= 11 milliseconds
44444.45 requests per second

====== LPUSH ======
  10000 requests completed in 0.21 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

95.02% <= 1 milliseconds
98.51% <= 2 milliseconds
99.23% <= 3 milliseconds
99.51% <= 5 milliseconds
99.52% <= 6 milliseconds
100.00% <= 6 milliseconds
47619.05 requests per second

====== LPOP ======
  10000 requests completed in 0.21 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

95.89% <= 1 milliseconds
98.69% <= 2 milliseconds
98.96% <= 3 milliseconds
99.51% <= 5 milliseconds
99.98% <= 6 milliseconds
100.00% <= 6 milliseconds
47619.05 requests per second

====== SADD ======
  10000 requests completed in 0.22 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

91.08% <= 1 milliseconds
97.79% <= 2 milliseconds
98.61% <= 3 milliseconds
99.25% <= 4 milliseconds
99.51% <= 5 milliseconds
99.81% <= 6 milliseconds
100.00% <= 6 milliseconds
45454.55 requests per second

====== SPOP ======
  10000 requests completed in 0.22 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

91.88% <= 1 milliseconds
98.64% <= 2 milliseconds
99.09% <= 3 milliseconds
99.40% <= 4 milliseconds
99.48% <= 5 milliseconds
99.60% <= 6 milliseconds
99.98% <= 11 milliseconds
100.00% <= 11 milliseconds
46296.30 requests per second

====== LPUSH (again, in order to bench LRANGE) ======
  10000 requests completed in 0.23 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

91.00% <= 1 milliseconds
97.82% <= 2 milliseconds
99.01% <= 3 milliseconds
99.56% <= 4 milliseconds
99.73% <= 5 milliseconds
99.77% <= 7 milliseconds
100.00% <= 7 milliseconds
44247.79 requests per second

====== LRANGE (first 100 elements) ======
  10000 requests completed in 0.39 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

6.24% <= 1 milliseconds
75.78% <= 2 milliseconds
93.69% <= 3 milliseconds
97.29% <= 4 milliseconds
98.74% <= 5 milliseconds
99.45% <= 6 milliseconds
99.52% <= 7 milliseconds
99.93% <= 8 milliseconds
100.00% <= 8 milliseconds
25906.74 requests per second

====== LRANGE (first 300 elements) ======
  10000 requests completed in 0.78 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

1.30% <= 1 milliseconds
5.07% <= 2 milliseconds
36.42% <= 3 milliseconds
72.75% <= 4 milliseconds
93.26% <= 5 milliseconds
97.36% <= 6 milliseconds
98.72% <= 7 milliseconds
99.35% <= 8 milliseconds
100.00% <= 8 milliseconds
12886.60 requests per second

====== LRANGE (first 450 elements) ======
  10000 requests completed in 1.10 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

0.67% <= 1 milliseconds
3.64% <= 2 milliseconds
8.01% <= 3 milliseconds
23.59% <= 4 milliseconds
56.69% <= 5 milliseconds
76.34% <= 6 milliseconds
90.00% <= 7 milliseconds
96.92% <= 8 milliseconds
98.55% <= 9 milliseconds
99.06% <= 10 milliseconds
99.53% <= 11 milliseconds
100.00% <= 11 milliseconds
9066.18 requests per second

====== LRANGE (first 600 elements) ======
  10000 requests completed in 1.48 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

0.85% <= 1 milliseconds
9.23% <= 2 milliseconds
11.03% <= 3 milliseconds
15.94% <= 4 milliseconds
27.55% <= 5 milliseconds
41.10% <= 6 milliseconds
56.23% <= 7 milliseconds
78.41% <= 8 milliseconds
87.37% <= 9 milliseconds
92.81% <= 10 milliseconds
95.10% <= 11 milliseconds
97.03% <= 12 milliseconds
98.46% <= 13 milliseconds
99.05% <= 14 milliseconds
99.37% <= 15 milliseconds
99.40% <= 17 milliseconds
99.67% <= 18 milliseconds
99.81% <= 19 milliseconds
99.97% <= 20 milliseconds
100.00% <= 20 milliseconds
6752.19 requests per second

> And it needs to be durable, so the MQs
> are configured to persist to disk.

Redis can also persist to disk.

> The problem is that often these
> objects are duplicated. They do have
> 10-byte unique ids. It's not
> catastrophic if objects are duplicated
> in the queue, but it is if they're
> duplicated in the processing after
> being taken from the queue. What's the
> best way to go about ensuring as close
> as possible to linear scalability
> whilst ensuring there's no duplication
> in the processing of the objects?
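
When using a single message queue (box), this problem does not exist, if I understand correctly. If it does, you could simply check whether the id is a member of your set of ids. When you process an id, you should remove it from the set. First of all, of course, you should add the members to the set using SADD.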
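As a hedged illustration of guarding the processing step, the sketch below uses a slightly different technique from the check-then-remove flow described above: it relies on SADD's return value (the number of members actually added) to claim an id in a single step, so that two consumers cannot both pass a separate membership check for the same id. The key name `seen_ids`, the function `process_once`, and the `InMemorySets` stand-in are assumptions for illustration only.

```python
class InMemorySets:
    """Stand-in for a Redis client so the sketch runs without a server."""

    def __init__(self):
        self._sets = {}

    def sadd(self, key, *members):
        target = self._sets.setdefault(key, set())
        before = len(target)
        target.update(members)
        return len(target) - before  # like Redis: count of newly added members


def process_once(client, object_id, handler):
    """Run handler(object_id) only the first time this id is claimed."""
    if client.sadd("seen_ids", object_id) == 0:
        return False  # an earlier delivery already claimed this id
    handler(object_id)
    return True


store = InMemorySets()
handled = []
outcomes = [process_once(store, oid, handled.append) for oid in ["a1", "b2", "a1"]]
```

Two trade-offs are worth keeping in mind with this variant: the set of seen ids grows for as long as ids are kept, and an id is marked as seen before its handler finishes, so a crash in the handler would need separate recovery.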

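If one box no longer scales, you should shard your keys over multiple boxes and check each key on the box it belongs to. To learn more about this, I think you should read up on sharding.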
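One simple way to shard keys, sketched under stated assumptions: hash the id and take it modulo the number of boxes, so every copy of the same id is checked on the same box. The shard count, the function names, and the use of in-process sets in place of real boxes are all hypothetical.

```python
import zlib


def shard_for(object_id: bytes, shard_count: int) -> int:
    """Map an id to a shard index; the same id always maps to the same shard."""
    return zlib.crc32(object_id) % shard_count


SHARD_COUNT = 4  # hypothetical number of boxes
shards = [set() for _ in range(SHARD_COUNT)]  # each set stands in for one box


def is_new(object_id: bytes) -> bool:
    """Record the id on its shard; True only the first time it is seen."""
    seen = shards[shard_for(object_id, SHARD_COUNT)]
    if object_id in seen:
        return False
    seen.add(object_id)
    return True


flags = [is_new(oid) for oid in [b"id-0000001", b"id-0000002", b"id-0000001"]]
```

Plain modulo sharding remaps most keys whenever the shard count changes; consistent hashing is the usual alternative when boxes are added or removed regularly.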

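> And perhaps linked to that, should the
> whole object be stored in the message
> queue, or only the id with the body
> stored in something like cassandra?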

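If possible, you should store all your information directly in memory, because nothing runs as fast as memory (okay, your cache memory is even faster, but it is really small and you can't access it from your code). Redis stores all your information in memory and makes snapshots to disk. I think you should be able to keep all your information in memory and skip using something like Cassandra altogether.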


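Suppose each object takes 400 bytes in total and objects arrive at a rate of 10000 per second: that is 4,000,000 bytes of objects per second, or about 4 MB/s if my calculation is correct. You could easily store that amount of information in memory. If you can't, you should really consider upgrading your memory if at all possible, because memory isn't that expensive anymore.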
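The arithmetic can be checked with a few lines. The 400-byte figure and the 10000 objects per second come from the text above; the `backlog_bytes` helper is an assumption added here to show how much memory an unprocessed backlog of a given length would occupy.

```python
OBJECT_BYTES = 400           # per-object size assumed in the text above
OBJECTS_PER_SECOND = 10_000  # arrival rate from the question

bytes_per_second = OBJECT_BYTES * OBJECTS_PER_SECOND
megabytes_per_second = bytes_per_second / 1_000_000  # decimal megabytes


def backlog_bytes(seconds: int) -> int:
    """Memory needed to hold everything that arrives in `seconds` seconds."""
    return bytes_per_second * seconds


one_minute_backlog = backlog_bytes(60)
```

How much memory is needed in practice depends on how long objects stay queued before consumers remove them, not on the arrival rate alone.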

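If you don't mind adding another component to the mix, you can use an EIP (Enterprise Integration Pattern) to help with this.

In addition, message groups can be used to group related messages, which makes duplicate checks easier while still maintaining high throughput.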

Redis is an extremely fast open-source advanced key-value store. It is often referred to as a data structure server, since keys can contain strings, hashes, lists, sets and sorted sets. Redis also has (very) active development and is fun to use, if you ask me. I have never played with RabbitMQ.

I'll add another plug for Redis. I recently developed a new project that originally started off with another storage method (which will remain unnamed). Halfway through the project I decided to switch to Redis. I've found it to be extremely fast, very reliable, simple to learn, and to have very useful capabilities for building both basic and complex messaging systems.

Thanks for the Redis recommendation, @bmatheny! Just checking: by combining SPOP and pub/sub, do you mean the producer adds to the set and then publishes to the consumers to tell them something has been added? Also, if I add the ids to the set, what is the best way to pass the rest of the object data?

Depends on how you want to implement it. If you don't mind having NOOP consumers, you can just use SPOP and SADD for the consumers and producers (a consumer may find no work; it's a polling model). You can also use BLPOP (which blocks until data is available), but that only works for lists, which can