Unicode equality comparison fails when testing Spark with pytest in Python


I am testing a Spark function with pytest, creating RDDs for the input data. The function looks like this:

def joinDMS(dms,df):
    return dms.join(df,(df.mac == dms.mac) & (df.ch == dms.ch),"right_outer")
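For reference, the `right_outer` join above keeps every row of the right-hand DataFrame and attaches the left-hand rows whose `mac` and `ch` keys match. A toy model in plain Python (dicts instead of Spark rows; the `_r` suffix is only for illustration, Spark would instead produce duplicate column names and fill unmatched left columns with nulls):

```python
def right_outer_join(dms, df, keys=("mac", "ch")):
    """Toy model of a right_outer join on the given keys.

    dms and df are lists of dicts. Every row of df (the right side)
    is kept; matching dms rows are merged in, with the right-side
    columns suffixed "_r" to keep the keys distinct.
    """
    out = []
    for right in df:
        matched = False
        for left in dms:
            if all(left[k] == right[k] for k in keys):
                out.append({**left, **{k + "_r": v for k, v in right.items()}})
                matched = True
        if not matched:
            # Spark would emit nulls for the left columns here.
            out.append({k + "_r": v for k, v in right.items()})
    return out

dms = [{"mac": "0004", "tech": "gotv", "ch": 46}]
df = [{"mac": "0004", "ch": 46}]
print(right_outer_join(dms, df))
# one merged row: mac, tech, ch from dms plus mac_r, ch_r from df
```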
My test function looks like this:

def test_joinDMS(spark_context, hive_context):
    input_rdd_data = [
        ["0004", 46]
    ]
    input_rdd = spark_context.parallelize(input_rdd_data)
    df_input = hive_context.createDataFrame(input_rdd, ['mac', 'ch'])

    input_dms_data = [
        ["0004", "gotv", 46]
    ]
    input_dms = spark_context.parallelize(input_dms_data)
    df_dms = hive_context.createDataFrame(input_dms, ['mac', 'tech', 'ch'])

    expected_results = [
        ["0004", "gotv", 46, "0004", 46]
    ]
    input_exp_results = spark_context.parallelize(expected_results)
    df_exp_results = hive_context.createDataFrame(input_exp_results, ['mac', 'tech', 'ch', 'mac', 'ch'])

    results = tv_functions.joinDMS(df_dms, df_input)
    assert results == df_exp_results
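One detail worth flagging in the assertion above: PySpark `DataFrame` objects do not define content-based equality, so `results == df_exp_results` generally falls back to plain object comparison and is false for two separately built frames even when their rows match. A common workaround is to collect both sides and compare the rows, sketched here in plain Python (the helper name and the tuple data are illustrative):

```python
def rows_equal(rows_a, rows_b):
    """Compare two collected row sets order-insensitively.

    rows_a / rows_b are lists of tuples, e.g. what you would get
    from [tuple(r) for r in some_df.collect()] on each DataFrame.
    """
    return sorted(rows_a) == sorted(rows_b)

# Toy rows shaped like the expected join output:
got = [("0004", "gotv", 46, "0004", 46)]
want = [("0004", "gotv", 46, "0004", 46)]
print(rows_equal(got, want))  # True
```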
The DataFrames `results` and `df_exp_results` are identical, but the test status is "failed". This is the error:

---------------------- Captured stderr call -----------------------
/usr/hdp/2.5.0.0-1245/spark/python/lib/pyspark.zip/pyspark/worker.py:48: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
/usr/hdp/2.5.0.0-1245/spark/python/lib/pyspark.zip/pyspark/worker.py:48: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
/usr/hdp/2.5.0.0-1245/spark/python/lib/pyspark.zip/pyspark/worker.py:48: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
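For context, this `UnicodeWarning` is documented Python 2 behavior: when a byte `str` is compared with a `unicode` string and the bytes cannot be converted, the interpreter emits the warning and treats the two values as unequal rather than raising. The Python 3 analogue (`bytes` vs. `str`) is silent but likewise compares unequal:

```python
# Python 3 analogue of Python 2's str/unicode mismatch:
# comparing bytes with text never raises, it is simply unequal.
raw = b"0004"       # byte string (Python 2's plain str)
text = "0004"       # text string (Python 2's unicode)
print(raw == text)  # False, even though the characters look the same
```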
Does anyone know what causes this problem? I suspect it is a string issue, but I am not sure.