Apache Storm: programmatically get Storm topology statistics
I'm building a monitoring service around my Storm topologies and would like to get the number of failed tuples for different time windows, similar to how the Storm UI shows failed tuple counts for its 10m, 3h and 1d windows.

My monitoring service is currently written in Python, so I'd appreciate an answer that involves a Python library or something language-agnostic, such as the CLI or a REST endpoint. I've looked through the Storm CLI and the documentation, but so far I've come up empty on where the Storm UI actually gets its information.

Edit:
- We're running Storm 0.8.2 (unfortunately not under my control), so until we upgrade, the Storm UI REST API (released in 0.9.2) is not an option.

Use the Storm UI REST API:

{"topologies":[{"id":"topology-1-143604781","encodedId":"topology-1-143604781","name":"topology-1","status":"ACTIVE","uptime":"40d 21h 51m 59s","tasksTotal":16,"workersTotal":1,"executorsTotal":10}]}

{"msgTimeout":30,"spouts":[{"executors":3,"emitted":22336820,"errorLapsedSecs":755996,"completeLatency":"232.052","transferred":22336820,"acked":22340300,"errorPort":6703,"spoutId":"KafkaSpout removed","tasks":3,"errorHost":"removed","lastError":"java.lang.RuntimeException: java.lang.NullPointerException\n\tat backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)\n\tat backtype.storm.utils.DisruptorQueue.consumeBatch(Di","errorWorkerLogLink":"http://host:port/log?file=worker-6703.log","failed":0,"encodedSpoutId":"KafkaSpout removed"}],"executorsTotal":8,"uptime":"67d 21h 15m 2s","encodedId":"topology-1-143604781","visualizationTable":[{":row":[{":stream":"default",":sani-stream":"default1544803905",":checked":true},{":stream":"__ack_init",":sani-stream":"s__ack_init973324006",":checked":false}, … removed
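As you can see, you can even capture the last error that occurred in a bolt or spout. I fetch this information with Python and restart the topology if the "failed" count gets too high: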
import json
import logging
import os
import time
import urllib.request

logger = logging.getLogger(__name__)
count = 0  # number of checks so far that tripped the failure threshold

def check_topology(host, port, topology_name, monitor_name):
    global count
    base = 'http://%s:%s/api/v1/topology/' % (host, port)
    # Look up the topology id by name in the cluster summary
    summary = json.loads(urllib.request.urlopen(base + 'summary', timeout=10).read())
    url_pid = None
    for data in summary['topologies']:
        if data['name'] == topology_name:
            url_pid = data['id']
            break
    if url_pid is None:
        print('no topology')
        return
    content = json.loads(urllib.request.urlopen(base + url_pid, timeout=10).read())
    stats = content['topologyStats'][0]  # first reported time window
    failed = stats['failed'] or 0  # the API may return null instead of 0
    acked = stats['acked'] or 0
    # Restart via monit on the second check where acks < 10x failures
    if acked < failed * 10:
        count += 1
        if count == 2:
            os.system('monit restart ' + monitor_name)
            logger.info('restart at ' + time.strftime('%Y-%m-%d %H:%M:%S'))
            count = 0
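The script above only looks at `topologyStats[0]`. Since the question asks for the 10m, 3h and 1d windows, here is a minimal sketch that collects the failed count for every entry in `topologyStats`. It assumes each entry carries `window`/`windowPretty` and `failed` keys, as described in the Storm UI REST API documentation for 0.9.2+; `failed_by_window` is a hypothetical helper name and the sample values are invented for illustration.

```python
from typing import Dict, List


def failed_by_window(topology_stats: List[dict]) -> Dict[str, int]:
    """Map each window label (e.g. "10m 0s") to its failed-tuple count.

    A null "failed" value is treated as 0.
    """
    result = {}
    for entry in topology_stats:
        # Prefer the human-readable label, fall back to the raw window in seconds
        label = entry.get("windowPretty") or entry.get("window")
        result[label] = entry.get("failed") or 0
    return result


# Hypothetical topologyStats list (values invented for illustration)
stats = [
    {"window": "600", "windowPretty": "10m 0s", "acked": 1200, "failed": 3},
    {"window": "10800", "windowPretty": "3h 0m 0s", "acked": 52000, "failed": None},
    {"window": "86400", "windowPretty": "1d 0h 0m 0s", "acked": 410000, "failed": 57},
]
print(failed_by_window(stats))  # → {'10m 0s': 3, '3h 0m 0s': 0, '1d 0h 0m 0s': 57}
```

Keying by label rather than list position avoids relying on the order in which the API returns the windows.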
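If you want to know more, …

This is a great answer, but it looks like it's only available on Storm 0.9.2+; unfortunately we're still on 0.8.2.
Yes :/
Unfortunately, these things are out of my control, and unfortunately they can't be obtained from master.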
sqlInjection@foo:~$ curl http://$STORM_UI_HOST_AND_PORT/api/v1/topology/topology-1-1436004781
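The per-topology document returned by a request like the curl call above also exposes the last error of each component. A minimal sketch that lists each spout's and bolt's failed count together with the first line of its `lastError`; it assumes the response has a `spouts` list keyed by `spoutId` and a `bolts` list keyed by `boltId`, as in the Storm UI REST API. `component_errors` and the sample data (including the bolt) are hypothetical.

```python
import json
from typing import List, Tuple


def component_errors(detail_json: str) -> List[Tuple[str, int, str]]:
    """Return (component id, failed count, first line of lastError) per spout and bolt."""
    detail = json.loads(detail_json)
    rows = []
    # Spouts and bolts are reported in separate lists with different id keys
    for list_key, id_key in (("spouts", "spoutId"), ("bolts", "boltId")):
        for component in detail.get(list_key, []):
            last_error = component.get("lastError") or ""
            # The first line holds the exception; the rest is the stack trace
            first_line = last_error.splitlines()[0] if last_error else ""
            rows.append((component[id_key], component.get("failed") or 0, first_line))
    return rows


# Hypothetical response, abridged from the JSON shown earlier
sample = json.dumps({
    "spouts": [{
        "spoutId": "KafkaSpout",
        "failed": 0,
        "lastError": "java.lang.RuntimeException: java.lang.NullPointerException\n"
                     "\tat backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)",
    }],
    "bolts": [{"boltId": "hypothetical-bolt", "failed": 12, "lastError": ""}],
})
for component_id, failed, error in component_errors(sample):
    print(component_id, failed, error)
```

Reporting only the first line keeps the monitoring output short; the full trace is still available through `errorWorkerLogLink`.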