Python GCP: unicode strings in a PySpark kernel

python google-cloud-platform pyspark

I have a DataFrame with a column that contains strings. When I call show():

df = spark.read.csv(path, header=True)
df.show()  # show() returns None, so keep the DataFrame in df and call show() separately

I get the correct "view". However, when I print the collected rows:

print("dataframe as a RDD object (list of Row objects):\n\t", df.collect())

the result is strings with a unicode marker, such as

u'mystring'

How can I fix this?

In Python 2.x, you have two string types, str and unicode. An object displayed as u'mytext' is a unicode object.
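The u prefix only appears because printing a list shows the repr() of each element; the text itself is unchanged. A minimal sketch of printing the values directly instead, assuming plain tuples as a stand-in for the Row objects returned by df.collect() (Row is a tuple subclass) and a hypothetical helper named format_rows:

```python
# Stand-in for df.collect(): plain tuples instead of real Row objects (assumption).
rows = [(u'mystring', u'a'), (u'mytext', u'b')]


def format_rows(rows, sep=u", "):
    """Join the values of each row into one line of text (hypothetical helper)."""
    return [sep.join(row) for row in rows]


# Printing a string shows its text; printing a list shows the repr() of each item,
# which is where the u'...' notation comes from in Python 2.
for line in format_rows(rows):
    print(line)
```

This only works as written when every column holds text; other column types would need to be converted to text first.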

To convert unicode to str:

mystr = unistr.encode('utf-8')

To convert str to unicode:

unistr = mystr.decode('utf-8')
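As a rough illustration of the two conversions together, the sketch below round-trips a value with one non-ASCII character. The sample value is an assumption chosen for illustration; the same code runs on Python 2 and Python 3, where the encoded result is a str and a bytes object respectively.

```python
unistr = u'caf\xe9'                  # text: unicode in Python 2, str in Python 3
mystr = unistr.encode('utf-8')       # bytes: str in Python 2, bytes in Python 3
roundtrip = mystr.decode('utf-8')    # back to text

# The encoded form is longer because the accented character takes two bytes in UTF-8.
print(len(unistr), len(mystr))
print(roundtrip == unistr)
```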

In Python 2.x, I usually keep strings as Unicode until I need to write them to a file, and so on. In Python 3.x, all strings are Unicode.
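One way to follow that habit is to keep values as Unicode in memory and let the file object do the encoding at write time. This is a minimal sketch, assuming a temporary output path and sample values that are purely illustrative; io.open accepts an encoding argument on both Python 2 and Python 3.

```python
import io
import os
import tempfile

lines = [u'mystring', u'caf\xe9']                    # kept as Unicode in memory
path = os.path.join(tempfile.mkdtemp(), 'out.txt')   # hypothetical output location

# Encoding happens only at the file boundary.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'\n'.join(lines))

# Decoding happens on the way back in, so the program sees Unicode again.
with io.open(path, 'r', encoding='utf-8') as f:
    read_back = f.read().split(u'\n')
```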
