Scala: Error when adding a new column to an existing DataFrame?

Tags: scala, apache-spark, dataframe, emr, apache-zeppelin

I have a DataFrame: df

          |---itemId----|----Country------------|
          |     11      |     US                |
          |     13      |     France            |
          |     101     |     France            |
How can I add a new column with values to the same DataFrame:

          |---itemId----|----Country------------|----Type-----|
          |     11      |     US                |    NA       |  
          |     13      |     France            |    EU       |  
          |     101     |     France            |    EU       |
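
For the toy data above, the target column could even be hard-coded with when/otherwise. This is a minimal sketch only to make the intended output concrete; my real mapping has many more countries and comes from a lookup map:

import org.apache.spark.sql.functions.{col, when}

// Sketch: hard-coded country -> Type mapping, just to show the intended result.
val expectedDF = df.withColumn("Type",
  when(col("Country") === "US", "NA")
    .when(col("Country") === "France", "EU")
    .otherwise("Unknown Type"))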
Here is what I tried:

df: org.apache.spark.sql.DataFrame = [itemId: string, Country: string]

testMap: scala.collection.Map[String,com.model.PeopleInfo]

import org.apache.spark.sql.functions.{col, udf}
import org.apache.commons.lang3.StringUtils  // Apache Commons Lang

val peopleMap = sc.broadcast(testMap)

// Look the country up in the broadcast map and fall back to "Unknown Type"
// when the key is missing or its type is blank.
val getTypeFunc: String => String = (country: String) => {
  if (peopleMap.value.contains(country) && StringUtils.isNotBlank(peopleMap.value(country).getType)) {
    peopleMap.value(country).getType
  } else {
    "Unknown Type"
  }
}

val typefunc = udf(getTypeFunc)

val newDF = df.withColumn("Type", typefunc(col("Country")))
But I keep getting this error:

org.apache.thrift.transport.TTransportException
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
	at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:362)
	at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:284)
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:191)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
	at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:220)
	at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:205)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:211)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
	at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:207)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
	at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:304)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at
I am using Spark 1.6 on EMR release emr-4.3.0 with Zeppelin-Sandbox 0.5.5:

Cluster size = 30, instance type = r3.8xlarge

spark.executor.instances         170
spark.executor.cores             5
spark.driver.memory              219695M
spark.yarn.driver.memoryOverhead 21969
spark.executor.memory            38G
spark.yarn.executor.memoryOverhead 21969
spark.default.parallelism        1856
spark.kryoserializer.buffer.max  512m
spark.sql.hive.convertMetastoreParquet false
spark.hadoop.mapreduce.input.fileinputformat.split.maxsize 33554432
Am I doing something wrong?
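
In case it's relevant, I also considered expressing the same lookup as a plain join instead of a UDF. This is an untested sketch; it assumes the keys of testMap are country names and that sqlContext.implicits._ is in scope for toDF:

import sqlContext.implicits._

// Sketch: materialize the lookup map as a small DataFrame and left-join it,
// filling missing matches with the same "Unknown Type" fallback.
val lookupDF = sc.parallelize(testMap.toSeq)
  .map { case (country, info) => (country, info.getType) }
  .toDF("Country", "Type")

val joinedDF = df.join(lookupDF, Seq("Country"), "left_outer")
  .na.fill("Unknown Type", Seq("Type"))

Would that be a better way to do this?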