Python 2.7 Spark Streaming program not writing to a text file
I wrote a Spark Streaming application that processes data from a Kafka topic and then writes the processed data to a text file, but the program does nothing. Code:
from __future__ import (absolute_import, division, print_function,
                        unicode_literals)
from future.builtins import *  # NOQA
import dash
from dash.dependencies import Output, Event
import dash_core_components as dcc
import dash_html_components as html
import time
import plotly
import plotly.graph_objs as go
from collections import deque
import sys
from operator import add
import numpy as np
from itertools import chain
import warnings
from obspy import UTCDateTime
from obspy.signal.cross_correlation import templates_max_similarity
from obspy.signal.headers import clibsignal, head_stalta_t
from obspy import read
from pyspark.sql import SparkSession
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

def classic_sta_lta_py(a):
    """
    Computes the standard STA/LTA from a given input array a. The length of
    the STA is given by nsta in samples, respectively is the length of the
    LTA given by nlta in samples. Written in Python.

    .. note::
        There exists a faster version of this trigger wrapped in C
        called :func:`~obspy.signal.trigger.classic_sta_lta` in this module!

    :type a: NumPy :class:`~numpy.ndarray`
    :param a: Seismic Trace
    :type nsta: int
    :param nsta: Length of short time average window in samples
    :type nlta: int
    :param nlta: Length of long time average window in samples
    :rtype: NumPy :class:`~numpy.ndarray`
    :return: Characteristic function of classic STA/LTA
    """
    # The cumulative sum can be exploited to calculate a moving average (the
    # cumsum function is quite efficient)
    nsta = 2
    nlta = 20
    sta = np.cumsum(a ** 2)
    # Convert to float
    sta = np.require(sta, dtype=np.float)
    # Copy for LTA
    lta = sta.copy()
    # Compute the STA and the LTA
    sta[nsta:] = sta[nsta:] - sta[:-nsta]
    sta /= nsta
    lta[nlta:] = lta[nlta:] - lta[:-nlta]
    lta /= nlta
    # Pad zeros
    sta[:nlta - 1] = 0
    # Avoid division by zero by setting zero values to tiny float
    dtiny = np.finfo(0.0).tiny
    idx = lta < dtiny
    lta[idx] = dtiny
    return sta / lta


def saveRec(rdd):
    rdd.foreach(lambda rec: open("/Users/zeinab/kafka_2.11-1.1.0/outputFile7.txt", "a").write(rec+"\n"))

app = dash.Dash(__name__)
# Read data
max_length = 50
X = deque(maxlen=max_length)
X.append(0)
Y = deque(maxlen=max_length)
text_file = open("/Users/zeinab/kafka_2.11-1.1.0/outputFile7.txt", "r")
lines = text_file.readlines()
a = []
for l in lines:
    a.append(float(l))

app.layout = html.Div(
    [
        dcc.Graph(id='live-graph', animate=True),
        dcc.Interval(
            id='graph-update',
            interval=1*1000
        )
    ]
)


@app.callback(Output('live-graph', 'figure'),
              events=[Event('graph-update', 'interval')])
def update_graph_scatter():
    #times.append(time.time())
    X.append(X[-1]+1)
    Y.append(a[0])
    del a[0]
    data = plotly.graph_objs.Scatter(
        x=list(X),
        y=list(Y),
        name='Scatter',
        mode='lines+markers'
    )
    return {'data': [data],
            'layout': go.Layout(xaxis=dict(range=[min(X), max(X)]),
                                yaxis=dict(range=[min(Y), max(Y)]))}


if __name__ == "__main__":
    print("hello")
    sc = SparkContext(appName="STALTA")
    ssc = StreamingContext(sc, 5)
    broker, topic = sys.argv[1:]
    # Connect to Kafka
    kvs = KafkaUtils.createStream(ssc, broker, "raw-event-streaming-consumer", {topic: 1})
    lines = kvs.map(lambda x: x[1])
    ds = lines.flatMap(lambda line: line.strip().split("\n")).map(lambda strelem: float(strelem))
    mapped = ds.mapPartitions(lambda i: classic_sta_lta_py(np.array(list(i))))
    lines2 = mapped.map(lambda y: y)
    mapped2 = lines2.map(lambda w: str(w))
    mapped2.foreachRDD(saveRec)
    ssc.start()
    ssc.awaitTermination()
    app.run_server(debug=True)
The code runs fine without the Dash app, but since I also need to visualize the processed data, I added the Dash app, and now it does not work.
Any ideas?
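Thanks.
How many cores are you using? Spark Streaming needs at least two cores: one for the receiver and one for processing.
@Ahmed: I changed the SparkContext to:
sc = SparkContext("local[2]", appName="STALTA")
But the program still just keeps running.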
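
One thing that may be worth checking, independent of the core count: in the listing, `ssc.awaitTermination()` is called before `app.run_server(...)`, and `awaitTermination()` blocks until the streaming context stops, so the Dash server line is never reached. The output file is also read once at import time, before Spark has written anything to it. The following is only a sketch of a different ordering, not a tested fix for this exact setup. The helper names `read_values` and `run_streaming_with_dashboard` are hypothetical, and `debug=False` is an assumption, chosen because the debug reloader would likely re-execute the script and try to create a second SparkContext.

```python
import os
import tempfile


def read_values(path):
    """Return all floats currently in the file, or [] if it does not exist yet.

    Hypothetical helper: meant to be called inside the Dash callback on every
    interval, instead of reading the file once when the module is imported.
    """
    if not os.path.exists(path):
        return []
    with open(path) as handle:
        # Skip blank lines so a trailing newline does not break float().
        return [float(line) for line in handle if line.strip()]


def run_streaming_with_dashboard(ssc, app):
    """Start the streaming job, then serve the dashboard until it exits.

    Hypothetical helper: `ssc` is expected to behave like a pyspark
    StreamingContext and `app` like a dash.Dash instance.
    """
    ssc.start()  # returns immediately; batches are processed in the background
    try:
        # Blocks while the dashboard is being served.
        # debug=False is an assumption (see the note above the code).
        app.run_server(debug=False)
    finally:
        ssc.stop()  # stop the streaming job once the dashboard exits
```

In the script above, this would amount to replacing the last three lines of the `__main__` block with `run_streaming_with_dashboard(ssc, app)`, and having `update_graph_scatter` call `read_values(...)` on the output path rather than consuming the module-level list `a`.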