Converting an integer column to an IP string in PySpark

python apache-spark pyspark

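I have a PySpark DataFrame in which IPv4 addresses are stored as integers, and I want to convert them to their dotted string form. Preferably without a UDF, since that could hurt performance significantly.

Example input: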

+---------------+
|         IP_int|
+---------------+
|       67633643|
|      839977746|
|      812147536|
+---------------+

Expected output:

+---------------+
|         IP_str|
+---------------+
|      4.8.1.235|
|    50.17.11.18|
|   48.104.99.80|
+---------------+

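This code converts the IP from an integer to a string:

import pyspark.sql.functions as f

# Divide by 256**3, 256**2 and 256**1 to bring each octet into the low byte,
# then take it modulo 256 and join the four octets with dots.
# Note: cast("int") is a signed 32-bit cast, so if IP_int is a long column
# holding addresses at or above 128.0.0.0, cast to "long" instead.
ip_str_col = f.concat_ws(
    ".",
    ((f.col("IP_int") / 16777216).cast("int") % 256).cast("string"),
    ((f.col("IP_int") / 65536).cast("int") % 256).cast("string"),
    ((f.col("IP_int") / 256).cast("int") % 256).cast("string"),
    (f.col("IP_int").cast("int") % 256).cast("string"),
)
df = df.withColumn("IP_str", ip_str_col)
df.show()

Output: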

+---------+------------+
|   IP_int|      IP_str|
+---------+------------+
| 67633643|   4.8.1.235|
|839977746| 50.17.11.18|
|812147536|48.104.99.80|
+---------+------------+
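
The arithmetic in that answer can be checked outside Spark. The following is a minimal plain-Python sketch, not part of the original answer: the helper name `int_to_ip` is hypothetical, and the standard library's `ipaddress` module is used only as an independent cross-check on the sample values from the question.

```python
import ipaddress


def int_to_ip(ip_int: int) -> str:
    """Split a 32-bit integer into four octets with integer division and modulo."""
    octets = [
        (ip_int // 16777216) % 256,  # 256**3: most significant octet
        (ip_int // 65536) % 256,     # 256**2
        (ip_int // 256) % 256,       # 256**1
        ip_int % 256,                # least significant octet
    ]
    return ".".join(str(octet) for octet in octets)


samples = [67633643, 839977746, 812147536]
for value in samples:
    # ipaddress performs the same conversion, so the two results must agree.
    assert int_to_ip(value) == str(ipaddress.IPv4Address(value))
    print(value, "->", int_to_ip(value))
```

Plain Python integers do not overflow, so this sketch says nothing about how a 32-bit cast in Spark behaves for large values.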

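Use conv to convert the integer to hexadecimal, pad it to 8 digits, use substring to split it into 4 two-character segments, use conv to convert each segment back to decimal, and join them with concat_ws:

from pyspark.sql import functions as F

# Left-pad the hex string to 8 characters so each octet occupies exactly 2 of them.
df.withColumn("hex", F.lpad(F.conv("IP_int", 10, 16), 8, "0")).select(
    "IP_int",
    F.concat_ws(
        ".",
        F.conv(F.substring("hex", 1, 2), 16, 10),
        F.conv(F.substring("hex", 3, 2), 16, 10),
        F.conv(F.substring("hex", 5, 2), 16, 10),
        F.conv(F.substring("hex", 7, 2), 16, 10),
    ).alias("IP_str"),
).show()
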
+---------+------------+
|   IP_int|      IP_str|
+---------+------------+
| 67633643|   4.8.1.235|
|839977746| 50.17.11.18|
|812147536|48.104.99.80|
+---------+------------+
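
To see why the padding step matters, the hex route can be traced in plain Python. This is a sketch under assumptions rather than the answer's code: `int_to_ip_via_hex` is a hypothetical helper that mirrors the conv, lpad and substring steps with ordinary string operations.

```python
def int_to_ip_via_hex(ip_int: int) -> str:
    """Mirror the conv / lpad / substring steps with plain string operations."""
    raw_hex = format(ip_int, "x")        # hexadecimal digits without leading zeros
    padded = raw_hex.rjust(8, "0")       # pad to 8 digits: 2 hex digits per octet
    pairs = [padded[i:i + 2] for i in range(0, 8, 2)]
    return ".".join(str(int(pair, 16)) for pair in pairs)


# 67633643 has only 7 hex digits, so slicing it unpadded would misalign every octet.
assert len(format(67633643, "x")) == 7
print(int_to_ip_via_hex(67633643))
```

Without the left padding, the first octet (4, a single hex digit) would shift every later two-character slice by one position.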


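Edit: using bit-shift operators

# Shift each octet down into the low byte, then keep only that byte with % 256.
# Newer PySpark releases also offer this function as F.shiftright.
df = df.withColumn(
    "IP_str",
    F.concat_ws(
        ".",
        (F.shiftRight("IP_int", 8 * 3) % 256).cast("string"),
        (F.shiftRight("IP_int", 8 * 2) % 256).cast("string"),
        (F.shiftRight("IP_int", 8) % 256).cast("string"),
        (F.col("IP_int") % 256).cast("string"),
    ),
)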
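
The shift version relies on the fact that, for non-negative integers, shifting right by 8*k bits gives the same result as floor division by 256**k. A small plain-Python sketch of that equivalence follows; the helper `int_to_ip_shift` is hypothetical, it uses the mask `& 0xFF` in place of `% 256` (the two agree for non-negative values), and 3232235777 is an extra illustrative value that does not appear in the question.

```python
def int_to_ip_shift(ip_int: int) -> str:
    """Extract the four octets with right shifts and a one-byte mask."""
    octets = [(ip_int >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(map(str, octets))


for value in (67633643, 839977746, 812147536, 3232235777):
    for k in range(4):
        # A right shift by 8*k bits equals floor division by 256**k here.
        assert value >> (8 * k) == value // 256 ** k
    print(value, "->", int_to_ip_shift(value))
```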

Comments:

Can you show your code or tell us where the error is?

Much more elegant than my solution. This is the correct answer.

@DavidTaub Yours is elegant too, and probably faster. Just replace your concat with concat_ws.

@DavidTaub Added a new version using bitwise shifts. Probably the best solution.