PySpark flatMapValues (whole list, element of list)


I have an RDD whose keys are integers. For each key I have a list of strings. Example:
[(0,['transworld','systems','inc','trying','collect','debt','mine','owed','inc'])]

What I want is to get a new RDD that looks like this:

[(0, ['transworld', 'systems', 'inc', 'trying', 'collect', 'debt', 'mine', 'owed', 'inaccurate'],'transworld')]
[(0, ['transworld', 'systems', 'inc', 'trying', 'collect', 'debt', 'mine', 'owed', 'inaccurate'],'systems')]
[(0, ['transworld', 'systems', 'inc', 'trying', 'collect', 'debt', 'mine', 'owed', 'inaccurate'],'inc')] etc

I think I need flatMapValues, but I can't figure out how to use it here. Can anyone help?

Perhaps this is useful -

  • Not sure about use case 2; written in Scala.
  • val rdd = spark.sparkContext.parallelize(Seq(
      (0, Seq("transworld", "systems", "inc", "trying", "collect",
              "debt", "mine", "owed", "inaccurate"))))

    // Make seq.length copies of (i, seq), zip them with the elements of seq,
    // and flatten each ((i, seq), word) pair into an (i, seq, word) triple.
    rdd.flatMap { case (i, seq) =>
        Seq.fill(seq.length)((i, seq)).zip(seq).map(x => (x._1._1, x._1._2, x._2))
      }
      .foreach(println)

    /**
     * (0,List(transworld, systems, inc, trying, collect, debt, mine, owed, inaccurate),transworld)
     * (0,List(transworld, systems, inc, trying, collect, debt, mine, owed, inaccurate),systems)
     * (0,List(transworld, systems, inc, trying, collect, debt, mine, owed, inaccurate),inc)
     * (0,List(transworld, systems, inc, trying, collect, debt, mine, owed, inaccurate),trying)
     * (0,List(transworld, systems, inc, trying, collect, debt, mine, owed, inaccurate),collect)
     * (0,List(transworld, systems, inc, trying, collect, debt, mine, owed, inaccurate),debt)
     * (0,List(transworld, systems, inc, trying, collect, debt, mine, owed, inaccurate),mine)
     * (0,List(transworld, systems, inc, trying, collect, debt, mine, owed, inaccurate),owed)
     * (0,List(transworld, systems, inc, trying, collect, debt, mine, owed, inaccurate),inaccurate)
     */
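
Since the question asks specifically about flatMapValues, the same shape can also be produced with that operator. Below is a minimal PySpark sketch of the idea (my illustration, not part of the original answers; it assumes a SparkContext named sc):

    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()  # assumed setup; reuse your existing context
    rdd = sc.parallelize([(0, ['transworld', 'systems', 'inc', 'trying', 'collect',
                               'debt', 'mine', 'owed', 'inaccurate'])])

    # flatMapValues applies the function to the value only and emits one
    # (key, item) pair per element of the returned iterable. Pairing each word
    # with the whole list yields (key, (list, word)); the final map flattens
    # that nested tuple into (key, list, word).
    triples = (rdd
               .flatMapValues(lambda words: [(words, w) for w in words])
               .map(lambda kv: (kv[0], kv[1][0], kv[1][1])))
    print(triples.collect())
    # [(0, ['transworld', ...], 'transworld'), (0, ['transworld', ...], 'systems'), ...]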
    
flatMap is one way to do it:

rdd.flatMap(lambda x: [x + (e,) for e in x[1]]).collect()
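
Note that x here is the entire (key, list) tuple, so x + (e,) extends that tuple by one element and yields (key, list, element) directly; the key travels along inside x, which is why plain flatMap suffices. Reusing the rdd from the sketch above, the result would look roughly like this:

    rdd.flatMap(lambda x: [x + (e,) for e in x[1]]).collect()
    # [(0, ['transworld', ..., 'inaccurate'], 'transworld'),
    #  (0, ['transworld', ..., 'inaccurate'], 'systems'),
    #  ...one tuple per word in the list]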