Testing Twitter with the Spark Streaming API in Scala

scala apache-spark twitter

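I'm new to the Spark Streaming framework and am trying to process a Twitter stream. I'm writing test cases for it and learned that I can use StreamingSuiteBase (from spark-testing-base), which lets me test a streaming operation by supplying its input as a sequence of batches. However, the function I wrote takes a DStream[Status] as input and, after processing, returns a DStream[String]. The API I'm using from StreamingSuiteBase is testOperation: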

test("Filter only words Starting with #") {
  val inputTweet = List(List("this is #firstHash"), List("this is #secondHash"), List("this is #thirdHash"))
  val expected = List(List("#firstHash"), List("#secondHash"), List("#thirdHash"))

  // Each inner List is one batch of input (and of expected output)
  testOperation(inputTweet, TransformTweets.getText _, expected, ordered = false)
}

And this is the function that receives the input:

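import org.apache.spark.streaming.dstream.DStream
import twitter4j.Status

object TransformTweets {
  def getText(englishTweets: DStream[Status]): DStream[String] = {
    println(englishTweets.toString)

    // Split each tweet's text on spaces and keep only the hashtags
    val hashTags = englishTweets.flatMap(x => x.getText.split(" ").filter(_.startsWith("#")))

    hashTags
  }
}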

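But I get a "type mismatch" error involving DStream[Status] and DStream[String]: my input batches are plain Strings, while getText expects a DStream[Status]. How can I mock a DStream[Status]?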
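To see where the mismatch comes from, here is a simplified, Spark-free model of how testOperation lines up its arguments. In spark-testing-base, testOperation roughly takes the input batches as Seq[Seq[U]], an operation DStream[U] => DStream[V], and the expected batches as Seq[Seq[V]], so the element type of the input batches has to be the operation's input type. This also explains the list of lists: each inner list is one batch. In the sketch, modelTestOperation, FakeStatus and getTextModel are hypothetical names, Seq stands in for DStream, and FakeStatus is only a type-level stand-in for twitter4j.Status, not a mock meant for a real Spark test:

```scala
// Simplified, Spark-free model of how StreamingSuiteBase.testOperation lines
// up its arguments. This is NOT the library's implementation: Seq stands in
// for DStream, and every name below is hypothetical.

// Type-level stand-in for twitter4j.Status (not a mock for real Spark tests).
final case class FakeStatus(text: String)

def modelTestOperation[U, V](
    input: Seq[Seq[U]],          // one inner Seq per batch
    operation: Seq[U] => Seq[V], // plays the role of DStream[U] => DStream[V]
    expected: Seq[Seq[V]],       // one inner Seq of expected output per batch
    ordered: Boolean
): Boolean = {
  val actual = input.map(operation) // apply the operation batch by batch
  actual.length == expected.length &&
  actual.zip(expected).forall { case (a, e) =>
    if (ordered) a == e
    else a.length == e.length && a.diff(e).isEmpty // same elements, any order
  }
}

// Same hashtag logic as getText, but over the stand-in type.
val getTextModel: Seq[FakeStatus] => Seq[String] =
  tweets => tweets.flatMap(_.text.split(" ").filter(_.startsWith("#")))

val stringBatches = List(List("this is #firstHash"), List("this is #secondHash"))
val expected = List(List("#firstHash"), List("#secondHash"))

// modelTestOperation(stringBatches, getTextModel, expected, ordered = false)
// does not compile: the batches hold String, but getTextModel needs FakeStatus.

// Converting each input element to the operation's input type lines them up.
val statusBatches = stringBatches.map(_.map(FakeStatus(_)))
val passed = modelTestOperation(statusBatches, getTextModel, expected, ordered = false)
```

In a real StreamingSuiteBase test the same rule applies: the input batches would have to contain actual twitter4j Status objects, which is what the answer below builds.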

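So I solved this by getting a Twitter Status from the createStatus API of TwitterObjectFactory. There is no need to mock a Twitter Status; even if you manage to mock it, you run into serialization problems in Spark. So this is the best solution: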

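import scala.io.Source
import twitter4j.TwitterObjectFactory
val rawJson = Source.fromURL(getClass.getResource("/tweetStatus.json")).getLines().mkString
val tweetStatus = TwitterObjectFactory.createStatus(rawJson) // twitter4j Status parsed from the raw tweet JSON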
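The snippet above reads the fixture through Source.fromURL but never closes the Source. A slightly more careful sketch, assuming Scala 2.13+ for scala.util.Using; readText is a hypothetical helper name, and the createStatus call is only shown in a comment because it needs twitter4j on the classpath:

```scala
import java.net.URL
import scala.io.Source
import scala.util.{Try, Using}

// Hypothetical helper: read a whole fixture file as one String and always
// close the underlying Source. Using (Scala 2.13+) wraps the result in a Try,
// so a missing resource (getResource returns null) becomes a Failure.
def readText(url: URL): Try[String] =
  Using(Source.fromURL(url, "UTF-8"))(source => source.mkString)

// Possible use in a test (needs twitter4j on the classpath, not run here):
//   val rawJson = readText(getClass.getResource("/tweetStatus.json")).get
//   val tweetStatus = TwitterObjectFactory.createStatus(rawJson)
```

Unlike getLines().mkString, mkString keeps the line breaks; for a JSON fixture that makes no difference, since they are just whitespace.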

Hope this helps someone.

Why are your tweets a list of lists? You should unit test the flatMap operation on a list of Status objects; there's no need to mock the DStream.
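A sketch of the comment's suggestion, taken one step further: if the hashtag logic lives in a plain function over the tweet text, it can be tested on ordinary collections with no DStream, StreamingContext or Status involved. HashTags and extractHashTags are hypothetical names, and the DStream wrapper is shown only as a comment because it needs Spark and twitter4j:

```scala
// Hypothetical refactoring: keep the hashtag logic in a plain function over
// the tweet text so it can be unit tested on ordinary collections.
object HashTags {
  // Split on single spaces (as getText does) and keep tokens starting with '#'.
  def extractHashTags(text: String): Seq[String] =
    text.split(" ").filter(_.startsWith("#")).toSeq
}

// The streaming code would then reduce to a thin wrapper, e.g.
//   englishTweets.flatMap(status => HashTags.extractHashTags(status.getText))
// (not compiled here, since it needs Spark and twitter4j on the classpath).

val texts = List("this is #firstHash", "this is #secondHash", "no tags here")
val tags = texts.flatMap(HashTags.extractHashTags)
// tags == List("#firstHash", "#secondHash")
```

With this split, the StreamingSuiteBase test only has to check the wiring, while the parsing rules get fast, Spark-free unit tests.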