Problem with randomSplit on an Apache Spark DataFrame

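I am trying to split a DataFrame into training and test sets. I used the randomSplit function, but the format of the features column changed afterwards. The original DataFrame: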

+-----------------------------------------+-----+
|features                                 |label|
+-----------------------------------------+-----+
|[2.0,92.0,1.0,2002.0,9.0,37.0,2.0,2.0]   |99.4 |
|[8.0,92.0,1.0,2002.0,9.0,37.0,2.0,2.0]   |83.62|
|[9.0,92.0,1.0,2002.0,9.0,37.0,2.0,2.0]   |98.21|
|[14.0,92.0,1.0,2002.0,9.0,37.0,2.0,2.0]  |16.7 |
|[15.0,92.0,1.0,2002.0,9.0,37.0,2.0,2.0]  |97.91|
|[16.0,92.0,1.0,2002.0,9.0,37.0,2.0,2.0]  |92.71|
|[17.0,92.0,1.0,2002.0,9.0,37.0,2.0,2.0]  |28.91|
|[18.0,92.0,1.0,2002.0,9.0,37.0,2.0,2.0]  |97.19|
|[20.0,39.0,16.0,2002.0,9.0,37.0,34.0,2.0]|99.25|
|[21.0,39.0,16.0,2002.0,9.0,37.0,34.0,2.0]|96.09|
|[22.0,39.0,16.0,2002.0,9.0,37.0,34.0,2.0]|98.0 |
|[23.0,36.0,4.0,2002.0,9.0,37.0,25.0,2.0] |96.74|
|[24.0,36.0,4.0,2002.0,9.0,37.0,25.0,2.0] |90.21|
|[25.0,36.0,4.0,2002.0,9.0,37.0,25.0,2.0] |100.0|
|[26.0,36.0,4.0,2002.0,9.0,37.0,25.0,2.0] |55.93|
|[27.0,36.0,4.0,2002.0,9.0,37.0,25.0,2.0] |82.15|
|[28.0,36.0,4.0,2002.0,9.0,37.0,25.0,2.0] |94.2 |
|[29.0,36.0,4.0,2002.0,9.0,37.0,25.0,2.0] |17.3 |
|[30.0,36.0,4.0,2002.0,9.0,37.0,25.0,2.0] |77.19|
|[31.0,81.0,7.0,2002.0,9.0,37.0,25.0,5.0] |87.06|
+-----------------------------------------+-----+
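After the split, the features column is shown in a different format. Why does this happen, and how can I get the original format back? I searched online and everyone splits the data the same way, but nobody reports this problem.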

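# Split into roughly 70% training and 30% test rows
splits = df.randomSplit([0.7, 0.3])
training = splits[0]
test = splits[1]
# Show the first 10 rows of each split without truncating columns
training.show(10, False)
test.show(10, False)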
+------------------------------------+-----+
|features                            |label|
+------------------------------------+-----+
|(8,[0,3,4,5],[2.0,2004.0,3.0,11.0]) |0.0  |
|(8,[0,3,4,5],[8.0,2004.0,3.0,11.0]) |90.32|
|(8,[0,3,4,5],[14.0,2004.0,3.0,11.0])|74.81|
|(8,[0,3,4,5],[15.0,2004.0,3.0,11.0])|94.13|
|(8,[0,3,4,5],[16.0,2004.0,3.0,11.0])|87.5 |
|(8,[0,3,4,5],[17.0,2004.0,3.0,11.0])|86.2 |
|(8,[0,3,4,5],[18.0,2004.0,3.0,11.0])|59.4 |
|(8,[0,3,4,5],[19.0,2004.0,3.0,11.0])|94.07|
|(8,[0,3,4,5],[20.0,2004.0,3.0,11.0])|94.06|
|(8,[0,3,4,5],[21.0,2004.0,3.0,11.0])|79.27|
+------------------------------------+-----+
+------------------------------------+-----+
|features                            |label|
+------------------------------------+-----+
|(8,[0,3,4,5],[9.0,2004.0,3.0,11.0]) |80.11|
|(8,[0,3,4,5],[22.0,2004.0,3.0,11.0])|95.59|
|(8,[0,3,4,5],[28.0,2004.0,3.0,11.0])|72.76|
|(8,[0,3,4,5],[30.0,2004.0,3.0,11.0])|92.19|
|(8,[0,3,4,5],[33.0,2004.0,3.0,11.0])|94.3 |
|(8,[0,3,4,5],[36.0,2004.0,3.0,11.0])|77.44|
|(8,[0,3,4,5],[45.0,2004.0,3.0,11.0])|95.13|
|(8,[0,3,4,5],[54.0,2004.0,3.0,11.0])|98.76|
|(8,[0,3,4,5],[57.0,2004.0,3.0,11.0])|95.98|
|(8,[0,3,4,5],[65.0,2004.0,3.0,11.0])|90.22|
+------------------------------------+-----+
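
A likely explanation, assuming the features column was built with something like VectorAssembler (the question does not say): randomSplit probably did not alter the data. The rows in the two outputs are different rows from the ones shown at the top (year 2004 rather than 2002), and four of their eight values are zero. Spark prints such vectors in sparse notation, (size, [indices], [values]), while vectors with few zeros are printed as plain dense lists. The small sketch below uses a hypothetical helper, sparse_to_dense, which is not part of Spark, to expand the first training row by hand and show that both notations describe the same kind of vector.

```python
def sparse_to_dense(size, indices, values):
    """Expand sparse notation (size, [indices], [values]) into a dense list.

    Hypothetical helper for illustration only; it is not a Spark API.
    """
    dense = [0.0] * size          # every position not listed is zero
    for i, v in zip(indices, values):
        dense[i] = v              # place each stored value at its index
    return dense


# First row of the training output in the question:
# (8,[0,3,4,5],[2.0,2004.0,3.0,11.0])
row = sparse_to_dense(8, [0, 3, 4, 5], [2.0, 2004.0, 3.0, 11.0])
print(row)  # [2.0, 0.0, 0.0, 2004.0, 3.0, 11.0, 0.0, 0.0]
```

If a dense display is still preferred in PySpark itself, two options that should work are pyspark.ml.functions.vector_to_array (Spark 3.0 and later) or a UDF that returns Vectors.dense(v.toArray()). Spark ML estimators generally accept dense and sparse vectors alike, so converting is usually only a matter of presentation.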