Scala: how to extract a specific element from a column in each row?
I have the following dataframe in Spark 2.2.0 and Scala 2.11.8:
+----------+-------------------------------+
|item | other_items |
+----------+-------------------------------+
| 111 |[[444,1.0],[333,0.5],[666,0.4]]|
| 222 |[[444,1.0],[333,0.5]] |
| 333 |[] |
| 444      |[[111,2.0],[555,0.5],[777,0.2]]|
+----------+-------------------------------+
I want to get the following dataframe:
+----------+-------------+
|item | other_items |
+----------+-------------+
| 111 | 444 |
| 222 | 444 |
| 444      | 111         |
+----------+-------------+
So basically, I need to take the first item from other_items in each row, and ignore the rows that have an empty array [] in other_items.
How do I do that?
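The intended transformation can be sketched on plain Scala collections (no Spark), assuming each row is an (item, other_items) pair as in the table above:

```scala
// Sample rows mirroring the dataframe above (hypothetical in-memory data).
val rows = Seq(
  ("111", Seq(("444", 1.0), ("333", 0.5), ("666", 0.4))),
  ("222", Seq(("444", 1.0), ("333", 0.5))),
  ("333", Seq.empty[(String, Double)]),
  ("444", Seq(("111", 2.0), ("555", 0.5), ("777", 0.2)))
)

// Keep only rows whose array is non-empty and take the first element's
// first field; the pattern `first +: _` fails on empty Seqs, so collect
// silently drops them.
val result = rows.collect { case (item, first +: _) => (item, first._1) }
// result == Seq(("111", "444"), ("222", "444"), ("444", "111"))
```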
I have tried this approach, but it does not give me the expected result:
val result = df.withColumn("other_items", $"other_items"(0))
result.printSchema gives the following output:
|-- item: string (nullable = true)
|-- other_items: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- _1: string (nullable = true)
| | |-- _2: double (nullable = true)
Like this:
val df = Seq(
("111", Seq(("111", 1.0), ("333", 0.5), ("666", 0.4))), ("333", Seq())
).toDF("item", "other_items")
df.select($"item", $"other_items"(0)("_1").alias("other_items"))
.na.drop(Seq("other_items")).show
The first apply ($"other_items"(0)) selects the first element of the array, the second apply (("_1")) selects the _1 field, and na.drop removes the nulls introduced by the empty arrays:
+----+-----------+
|item|other_items|
+----+-----------+
| 111| 111|
+----+-----------+
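The apply-then-drop pipeline has a direct analogue on plain collections, which may clarify why na.drop is needed (hypothetical in-memory data, same shape as the answer's example):

```scala
// Rows shaped like the answer's sample dataframe.
val rows = Seq(
  ("111", Seq(("111", 1.0), ("333", 0.5), ("666", 0.4))),
  ("333", Seq.empty[(String, Double)])
)

// headOption.map(_._1) mirrors $"other_items"(0)("_1"):
// an out-of-bounds access yields None, just as Spark yields null.
val withFirst = rows.map { case (item, others) =>
  (item, others.headOption.map(_._1))
}

// Collecting only the Some values mirrors na.drop removing null rows.
val cleaned = withFirst.collect { case (item, Some(first)) => (item, first) }
// cleaned == Seq(("111", "111"))
```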