PySpark: using the IN operator with the HiveContext sql() method
I have a Hive table named employee with the columns Name, Department, and City. I want to retrieve rows by employee name using the IN operator inside HiveContext.sql(), but it raises a pyspark AnalysisException. Please see the example below.
Employee table:
Name Department City
Ram FDE Mumbai
Ramesh CTZ Pune
Suraj FDE Chennai
Varun CTZ Delhi
Query:
SELECT * from employee WHERE Name in ('Ramesh' , 'Varun')
Code snippet from the Spark program:
namesList= ['Ramesh' , 'Varun']
data = HiveContext.sql('SELECT * from employee WHERE Name in ({namesList})'.format(namesList = namesList))
I tried changing it to pass a string instead of a list, but the error is still the same.
Error:pyspark.AnalysisException : structType field
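Please help me with this, and let me know if I am doing something wrong here.

When building the query, you should strip the square brackets from the Python list: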
str(namesList)[1:-1]
data = HiveContext.sql('SELECT * from employee WHERE Name in ({namesList})'.format(namesList = str(namesList)[1:-1]))
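To see why the brackets matter, here is a minimal sketch in plain Python (no Spark session needed) that prints the query text produced by each version. The variable names `template`, `broken_query`, and `fixed_query` are illustrative and not from the original post.

```python
names_list = ['Ramesh', 'Varun']
template = "SELECT * from employee WHERE Name in ({namesList})"

# Formatting the list directly embeds its Python repr, brackets included.
broken_query = template.format(namesList=names_list)

# Slicing off the first and last character of the repr drops the brackets.
fixed_query = template.format(namesList=str(names_list)[1:-1])

print(broken_query)  # SELECT * from employee WHERE Name in (['Ramesh', 'Varun'])
print(fixed_query)   # SELECT * from employee WHERE Name in ('Ramesh', 'Varun')
```

The first string is not the query the question intended, because the IN list now contains a bracketed expression instead of two string literals. Note that the slicing trick relies on Python's list repr, so it is only safe for simple values such as these names.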

Replace the original HiveContext.sql() call with this:
data = HiveContext.sql("SELECT * from employee WHERE Name in ({namesList})".format(namesList = "'"+"','".join(namesList)+"'"))
You need to pass a string, not a list.
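The join-based approach can be wrapped in a small helper. The sketch below is only an illustration: `build_in_list` is a hypothetical name, not part of PySpark, and the checks for empty input and for quote or backslash characters are assumptions added here to avoid producing a malformed query, not something the original answers cover.

```python
def build_in_list(values):
    """Hypothetical helper: render strings as a quoted, comma-separated SQL list."""
    if not values:
        # An empty IN () list would not be a valid query.
        raise ValueError("at least one value is required")
    for value in values:
        # Assumption: reject values that would need escaping instead of guessing the rules.
        if "'" in value or "\\" in value:
            raise ValueError("unsupported character in value: {!r}".format(value))
    return "'" + "','".join(values) + "'"


names_list = ['Ramesh', 'Varun']
query = "SELECT * from employee WHERE Name in ({namesList})".format(
    namesList=build_in_list(names_list)
)
print(query)  # SELECT * from employee WHERE Name in ('Ramesh','Varun')
```

The resulting string can then be passed to the sql() method in place of the inline expression. If the names come from user input, filtering with the DataFrame API instead of string formatting is probably the safer route.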