Python: How to split a DataFrame column in PySpark


I have a .csv dataset containing Amazon book reviews. It looks like this:

B000F83SZQ|[0, 0]|5.0|I enjoy vintage books and movies so I enjoyed reading this book.  The plot was unusual.  Don't think killing someone in self-defense but leaving the scene and the body without notifying the police or hitting someone in the jaw to knock them out would wash today.Still it was a good read for me.|05 5, 2014|A1F6404F1VG29J|Avidreader|Nice vintage story|1399248000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
B000F83SZQ|[2, 2]|4.0|This book is a reissue of an old one   the author was born in 1910. It's of the era of, say, Nero Wolfe. The introduction was quite interesting, explaining who the author was and why he's been forgotten     I'd never heard of him.The language is a little dated at times, like calling a gun a &#34  heater.&#34   I also made good use of my Fire's dictionary to look up words like &#34   deshabille&#34   and &#34   Canarsie.&#34    Still, it was well worth a look-see.|01 6, 2014|AN0N05A9LIJEQ|critters|Different...|1388966400                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
B000F83SZQ|[2, 2]|4.0|This was a fairly interesting read.  It had old- style terminology.I was glad to get  to read a story that doesn't have coarse, crasslanguage.  I read for fun and relaxation......I like the free ebooksbecause I can check out a writer and decide if they are intriguing,innovative, and have enough of the command of Englishthat they can convey the story without crude language.|04 4, 2014|A795DMNCJILA6|dot|Oldie|1396569600                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
B000F83SZQ|[1, 1]|5.0|I'd never read any of the Amy Brewster mysteries until this one..  So I am really hooked on them now.|02 19, 2014|A1FV0SX13TWVXQ|Elaine H. Turley "Montana Songbird"|I really liked it.|1392768000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
B000F83SZQ|[0, 1]|4.0|If you like period pieces - clothing, lingo, you will enjoy this mystery.  Author had me guessing at least 2/3 of the way through.|03 19, 2014|A3SPTOKDG7WBLN|Father Dowling Fan|Period Mystery|1395187200                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
B000F83SZQ|[0, 0]|4.0|A beautiful in-depth character description makes it like a fast pacing movie. It is a pity Mr Merwin did not write 30 instead only 3 of the Amy Brewster mysteries.|05 26, 2014|A1RK2OCZDSGC6R|ubavka seirovska|Review|1401062400                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
B000F83SZQ|[0, 0]|4.0|I enjoyed this one tho I'm not sure why it's called An Amy Brewster Mystery as she's not in it very much. It was clean, well written and the characters well drawn.|06 10, 2014|A2HSAKHC3IBRE6|Wolfmist|Nice old fashioned story|1402358400                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
B000F83SZQ|[1, 1]|4.0|Never heard of Amy Brewster. But I don't need to like Amy Brewster to like this book. Actually, Amy Brewster is a side kick in this story, who added mystery to the story not the one resolved it. The story brings back the old times, simple life, simple people and straight relationships.|03 22, 2014|A3DE6XGZ2EPADS|WPY|Enjoyable reading and reminding the old times|1395446400                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
B000FA64PA|[0, 0]|5.0|Darth Maul working under cloak of darkness committing sabotage now that is a story worth reading many times over.  Great story.|10 11, 2013|A1UG4Q4D3OAH3A|dsa|Darth Maul|1381449600                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
B000FA64PA|[0, 0]|4.0|This is a short story focused on Darth Maul's role in helping the Trade Federation gain a mining colony. It's not bad, but it's also nothing exceptional. It's fairly short so we don't really get to see any characters develop. The few events that do happen seem to go by quickly, including what should have been major battles. The story is included in the novelShadow Hunter (Star Wars: Darth Maul), which is worth reading, so don't bother to buy this one separately.|02 13, 2011|AQZH7YTWQPOBE|Enjolras|Not bad, not exceptional|1297555200                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
B000FA64PA|[0, 0]|5.0|I think I have this one in both book and audio. It is a good story either way. good ol' Maul.|01 27, 2014|A1ZT7WV0ZUA0OJ|Mike|Audio and book|1390780800                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
B000FA64PA|[0, 0]|4.0|Title has nothing to do with the story.  I did enjoy it though.  Good short story about Darth Maul setting up two corporations against each other.  All in the end to help Darth Sidious' rise to power.  Won't take you long to read & it's cheap.  Go for it.|09 17, 2011|A2ZFR72PT054YS|monkeyluis|Darth Maul...the brother I never had.|1316217600                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
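The first number in the second column is the number of people who found the review helpful. The second number in that column is the total number of people who voted on the review. I now want to generate two new columns, each containing one of these numbers. I have tried several approaches, but none of them seems to work. Here is my code: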

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
import json
import pandas as pd
from functools import reduce
import pyspark.sql.functions as F
#from pyspark.sql.functions import lit
from pyspark.sql.functions import col, split


conf = SparkConf().setAppName("Open json").setMaster("local[*]")
sc = SparkContext(conf = conf)

sqlContext = SQLContext(sc)

df = sqlContext.read.csv('Kindle.csv', sep='|', header=None)

oldColumns = df.schema.names
newColumns = ["asin", "helpful", "overall", "Reviewtext", 
              "reviewTime", "reviewerID", "reviewerName", "summary", 
              "unixReviewTime"]

df = reduce(lambda df, idx: df.withColumnRenamed(oldColumns[idx], newColumns[idx]), range(len(oldColumns)), df)


df.select(df.columns[1]).show(n=10)
df_help = df.select(df.columns[1])
print(df_help)
df.show(n=10)
test = df_help.withColumn("helpful", split(col("helpful"), ",").cast("array<int>"))
test.show(n=10)

test2 = df.select("helpful", F.regexp_replace(F.col("helpful"), "[\$#]", "").alias("replaced"))

test2.show()
test2.select(col("helpful"), split(col("helpful"), ",\s*").alias("help")).show()
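Then I get this:


For some reason I cannot split the data directly. I also cannot remove the brackets from the second column. When I try, I just get more brackets.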
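
One plausible reason for the extra brackets, sketched here in plain Python rather than Spark: splitting "[0, 0]" on the comma alone leaves the bracket characters attached to the pieces, so the pieces are not clean integers. The helper name to_int_or_none is hypothetical and only serves this illustration. How Spark reports a failed conversion depends on its version and settings, so this is an analogy, not a reproduction of the Spark output.

```python
# Plain-Python illustration (not Spark): split without removing the brackets first.
raw = "[0, 0]"  # sample value from the "helpful" column

tokens = raw.split(",")
print(tokens)  # ['[0', ' 0]']


def to_int_or_none(token):
    """Convert a token to int, returning None when it is not a clean integer."""
    try:
        return int(token)
    except ValueError:
        return None


# Both tokens still carry a bracket, so neither converts to an integer.
converted = [to_int_or_none(t) for t in tokens]
print(converted)  # [None, None]
```

The brackets therefore have to be stripped before the split, or the split pattern has to account for them.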

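You are almost there. All you need is the right regular expression, followed by the cast you already used, applied after the built-in regexp_replace and split functions. A correct, working solution is as follows: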

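from functools import reduce
from pyspark.sql import SparkSession, functions as F

# Assumes no session exists yet; reuse your own `spark` if you already have one.
spark = SparkSession.builder.getOrCreate()

df = spark.read.csv('Kindle.csv', sep='|', header=None)

oldColumns = df.schema.names
newColumns = ["asin", "helpful", "overall", "Reviewtext",
              "reviewTime", "reviewerID", "reviewerName", "summary",
              "unixReviewTime"]

# Rename the default column names one by one.
df = reduce(lambda df, idx: df.withColumnRenamed(oldColumns[idx], newColumns[idx]), range(len(oldColumns)), df)

# Strip "[", "]" and spaces, split on the comma, cast to an integer array,
# then move the two array elements into their own columns.
df = df.withColumn("helpful", F.split(F.regexp_replace(F.col("helpful"), r"[\[\] ]", ""), ",").cast('array<int>'))\
    .withColumn('useful_review', F.col('helpful')[0])\
    .withColumn('voted_review', F.col('helpful')[1])\
    .drop('helpful')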
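
To see what the character class [\[\] ] does to a single value, here is a minimal plain-Python sketch of the same three steps (strip, split, convert). It uses Python's re module rather than Spark's regex engine, which should behave the same way for this simple pattern; the function name parse_helpful is hypothetical and not part of the answer.

```python
import re


def parse_helpful(raw):
    """Strip brackets and spaces, split on the comma, and convert to integers."""
    cleaned = re.sub(r"[\[\] ]", "", raw)  # "[0, 1]" becomes "0,1"
    return [int(part) for part in cleaned.split(",")]


# The first element is the helpful-vote count, the second the total vote count.
useful_review, voted_review = parse_helpful("[0, 1]")
print(useful_review, voted_review)  # 0 1
```

Removing the brackets and the space before splitting is what makes each piece a clean integer string, which is why the cast to an integer array succeeds in the Spark version.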

I hope the answer is helpful.

Maybe this could be a solution.

@ThenInking, glad to hear that. If you accept and upvote the answer, it would help me. Thanks.
+----------+-------+--------------------+-----------+--------------+--------------------+--------------------+--------------------+-------------+------------+
|      asin|overall|          Reviewtext| reviewTime|    reviewerID|        reviewerName|             summary|      unixReviewTime|useful_review|voted_review|
+----------+-------+--------------------+-----------+--------------+--------------------+--------------------+--------------------+-------------+------------+
|B000F83SZQ|    5.0|I enjoy vintage b...| 05 5, 2014|A1F6404F1VG29J|          Avidreader|  Nice vintage story|1399248000       ...|            0|           0|
|B000F83SZQ|    4.0|This book is a re...| 01 6, 2014| AN0N05A9LIJEQ|            critters|        Different...|1388966400       ...|            2|           2|
|B000F83SZQ|    4.0|This was a fairly...| 04 4, 2014| A795DMNCJILA6|                 dot|               Oldie|1396569600       ...|            2|           2|
|B000F83SZQ|    5.0|I'd never read an...|02 19, 2014|A1FV0SX13TWVXQ|Elaine H. Turley ...|  I really liked it.|1392768000       ...|            1|           1|
|B000F83SZQ|    4.0|If you like perio...|03 19, 2014|A3SPTOKDG7WBLN|  Father Dowling Fan|      Period Mystery|1395187200       ...|            0|           1|
|B000F83SZQ|    4.0|A beautiful in-de...|05 26, 2014|A1RK2OCZDSGC6R|    ubavka seirovska|              Review|1401062400       ...|            0|           0|
|B000F83SZQ|    4.0|I enjoyed this on...|06 10, 2014|A2HSAKHC3IBRE6|            Wolfmist|Nice old fashione...|1402358400       ...|            0|           0|
|B000F83SZQ|    4.0|Never heard of Am...|03 22, 2014|A3DE6XGZ2EPADS|                 WPY|Enjoyable reading...|1395446400       ...|            1|           1|
|B000FA64PA|    5.0|Darth Maul workin...|10 11, 2013|A1UG4Q4D3OAH3A|                 dsa|          Darth Maul|1381449600       ...|            0|           0|
|B000FA64PA|    4.0|This is a short s...|02 13, 2011| AQZH7YTWQPOBE|            Enjolras|Not bad, not exce...|1297555200       ...|            0|           0|
|B000FA64PA|    5.0|I think I have th...|01 27, 2014|A1ZT7WV0ZUA0OJ|                Mike|      Audio and book|1390780800       ...|            0|           0|
|B000FA64PA|    4.0|Title has nothing...|09 17, 2011|A2ZFR72PT054YS|          monkeyluis|Darth Maul...the ...|          1316217600|            0|           0|
+----------+-------+--------------------+-----------+--------------+--------------------+--------------------+--------------------+-------------+------------+