Unicode equality comparison fails when testing Spark with pytest in Python
I'm testing a Spark function that creates an RDD, using pytest. The function looks like this:
def joinDMS(dms, df):
    return dms.join(df, (df.mac == dms.mac) & (df.ch == dms.ch), "right_outer")
My test function looks like this:
def test_joinDMS(spark_context, hive_context):
    input_rdd_data = [
        ["0004", 46]
    ]
    input_rdd = spark_context.parallelize(input_rdd_data)
    df_input = hive_context.createDataFrame(input_rdd, ['mac', 'ch'])
    input_dms_data = [
        ["0004", "gotv", 46]
    ]
    input_dms = spark_context.parallelize(input_dms_data)
    df_dms = hive_context.createDataFrame(input_dms, ['mac', 'tech', 'ch'])
    expected_results = [
        ["0004", "gotv", 46, "0004", 46]
    ]
    input_exp_results = spark_context.parallelize(expected_results)
    df_exp_results = hive_context.createDataFrame(input_exp_results, ['mac', 'tech', 'ch', 'mac', 'ch'])
    results = tv_functions.joinDMS(df_dms, df_input)
    assert results == df_exp_results
The data frames results and df_exp_results contain the same data, but the test status is "failed". This is the error:
---------------------- Captured stderr call -----------------------
/usr/hdp/2.5.0.0-1245/spark/python/lib/pyspark.zip/pyspark/worker.py:48: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
/usr/hdp/2.5.0.0-1245/spark/python/lib/pyspark.zip/pyspark/worker.py:48: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
/usr/hdp/2.5.0.0-1245/spark/python/lib/pyspark.zip/pyspark/worker.py:48: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
Does anyone know what causes this? I suspect it's a string issue, but I'm not sure.
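For reference, here is a minimal sketch (plain Python, no Spark required) of the two things worth checking here. The `collect()`-based assertion in the final comment is a suggested pattern, not code from the question:

```python
# Sketch of two likely culprits behind the failing test.

# 1. Mixed string types: in Python 2, comparing a byte string to a
#    unicode string can emit exactly this UnicodeWarning and evaluate
#    as unequal. Python 3 makes the mismatch explicit: bytes and text
#    never compare equal until one side is decoded/encoded.
assert b"0004" != u"0004"                   # bytes vs. text: never equal
assert b"0004".decode("utf-8") == u"0004"   # equal once normalized

# 2. DataFrame equality: pyspark.sql.DataFrame does not implement a
#    value-based __eq__, so `results == df_exp_results` compares object
#    identity and is False even when both hold identical rows. Compare
#    the collected data instead, e.g. (hypothetical, with real Spark):
#    assert sorted(results.collect()) == sorted(df_exp_results.collect())
```

If the join keys mix str and unicode values (common when literals come from test data but Spark returns unicode), normalizing both sides to unicode before joining should silence the warning.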