Python: How to get row attribute values in foreachPartition

Tags: python, python-2.7, apache-spark, pyspark

I am trying the following:

def customFunction(rows):
    for row in rows:
        # Observed: a boolean instead of the actual value (same with row["key"])
        key = row.key
        # Observed: a boolean instead of the actual value (same with row["value"])
        val = row.value
        # do something with key and val

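from os.path import abspath

from pyspark.sql import SparkSession

# Assumption: the question does not show how warehouse_location is defined;
# this follows the usual Spark Hive example layout.
warehouse_location = abspath('spark-warehouse')

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL Hive integration example") \
    .config("spark.sql.warehouse.dir", warehouse_location) \
    .enableHiveSupport() \
    .getOrCreate()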


# spark is an existing SparkSession
spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive")
spark.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")

# Queries are expressed in HiveQL
df = spark.sql("SELECT key, value FROM src")

# Assume df contains billions of rows
df.rdd.foreachPartition(customFunction)

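Inside customFunction, I get boolean values in the key and val variables. How can I get the actual values of the row attributes?

This runs on AWS EMR 5.29 with Python 2.7; the Python code is executed via spark-submit.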

If it helps: inside customFunction, I am trying to use the value of key for DynamoDB, which stores a boolean value when key is NULL.
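The comment above suggests the boolean may come from NULL values rather than from row access itself. The following is a minimal sketch of that idea, not the asker's actual code: `to_attribute_value` and `custom_function` are hypothetical helpers, a `namedtuple` stands in for `pyspark.sql.Row` (which also supports attribute access such as `row.key`), and the encoding mirrors DynamoDB's low-level attribute format, where a missing value is written as `{"NULL": True}`. Skipping rows with a NULL key is an assumption about the desired behavior.

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row so the sketch runs without a Spark cluster
Row = namedtuple("Row", ["key", "value"])


def to_attribute_value(v):
    """Encode a Python value in DynamoDB's low-level attribute format."""
    if v is None:
        return {"NULL": True}  # a SQL NULL shows up as a boolean flag here
    if isinstance(v, bool):  # check bool before int: bool is a subclass of int
        return {"BOOL": v}
    if isinstance(v, int):
        return {"N": str(v)}  # DynamoDB transmits numbers as strings
    return {"S": str(v)}


def custom_function(rows):
    """Hypothetical partition handler: encode rows, skipping NULL keys."""
    items = []
    for row in rows:
        if row.key is None:
            continue  # assumption: rows without a key are not written
        items.append({
            "key": to_attribute_value(row.key),
            "value": to_attribute_value(row.value),
        })
    return items


# Simulated partition: one complete row, one NULL key, one NULL value
partition = [Row(238, "val_238"), Row(None, "val_x"), Row(86, None)]
items = custom_function(partition)
```

When used with `df.rdd.foreachPartition`, the return value is discarded, so the write to DynamoDB would have to happen inside the function; the list is returned here only to make the result easy to inspect.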

Yes, it deserves a downvote. What I am trying is the PySpark equivalent of