Warning: file_get_contents(/data/phpspider/zhask/data//catemap/7/python-2.7/5.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181

Warning: file_get_contents(/data/phpspider/zhask/data//catemap/3/apache-spark/5.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
Python 2.7 Python spark从数据帧中提取字符_Python 2.7_Apache Spark_Pyspark - Fatal编程技术网

Python 2.7 Python spark从数据帧中提取字符

Python 2.7 Python spark从数据帧中提取字符,python-2.7,apache-spark,pyspark,Python 2.7,Apache Spark,Pyspark,我在spark中有一个数据帧,类似这样: ID | Column ------ | ---- 1 | STRINGOFLETTERS 2 | SOMEOTHERCHARACTERS 3 | ANOTHERSTRING 4 | EXAMPLEEXAMPLE ID | New Column ------ | ------ 1 | STRIN_F 2 | SOMEO_E 3 | ANOTH_S 4 | E

我在spark中有一个数据帧,类似这样:

ID     | Column
------ | ----
1      | STRINGOFLETTERS
2      | SOMEOTHERCHARACTERS
3      | ANOTHERSTRING
4      | EXAMPLEEXAMPLE
ID     | New Column
------ | ------
1      | STRIN_F
2      | SOMEO_E
3      | ANOTH_S
4      | EXAMP_E
df.withColumn('new_column', concat(df.Column.substr(1, 5),
                                   lit('_'),
                                   df.Column.substr(8, 1)))
我想做的是从列中提取前5个字符加上第8个字符,然后创建一个新列,如下所示:

ID     | Column
------ | ----
1      | STRINGOFLETTERS
2      | SOMEOTHERCHARACTERS
3      | ANOTHERSTRING
4      | EXAMPLEEXAMPLE
ID     | New Column
------ | ------
1      | STRIN_F
2      | SOMEO_E
3      | ANOTH_S
4      | EXAMP_E
df.withColumn('new_column', concat(df.Column.substr(1, 5),
                                   lit('_'),
                                   df.Column.substr(8, 1)))
我不能使用以下代码,因为列中的值不同,我不想在特定字符上拆分,而是在第6个字符上拆分:

import pyspark
split_col = pyspark.sql.functions.split(DF['column'], ' ')
newDF = DF.withColumn('new_column', split_col.getItem(0))

谢谢大家

使用类似以下内容:

ID     | Column
------ | ----
1      | STRINGOFLETTERS
2      | SOMEOTHERCHARACTERS
3      | ANOTHERSTRING
4      | EXAMPLEEXAMPLE
ID     | New Column
------ | ------
1      | STRIN_F
2      | SOMEO_E
3      | ANOTH_S
4      | EXAMP_E
df.withColumn('new_column', concat(df.Column.substr(1, 5),
                                   lit('_'),
                                   df.Column.substr(8, 1)))
这将使用函数和

这些功能将解决您的问题