Apache Spark: How do I add a column to an exploded struct in Spark? (Apache Spark, DataFrame, PySpark) - Fatal编程技术网


apache-spark dataframe pyspark


Suppose I have the following data:

{"id":1, "payload":[{"foo":1, "lol":2},{"foo":2, "lol":2}]}

I want to explode the payload and add a column to it, like this:

df = df.select('id', F.explode('payload').alias('data'))
df = df.withColumn('data.bar', F.col('data.foo') * 2)

However, this results in a DataFrame with three columns:

```
id
data
data.bar
```

I want data.bar to be part of the data struct.

How can I add a column to the exploded struct instead of adding a top-level column?

from pyspark.sql import functions as f

# Rebuild the struct: keep foo and add bar alongside it
df = df.withColumn('data', f.struct(
    df['data']['foo'].alias('foo'),
    (df['data']['foo'] * 2).alias('bar')
))

This results in:

root
 |-- id: long (nullable = true)
 |-- data: struct (nullable = false)
 |    |-- foo: long (nullable = true)
 |    |-- bar: long (nullable = true)

Update:

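from pyspark.sql import Row, functions as f
from pyspark.sql.types import LongType, StructField, StructType

def func(x):
    # Copy the struct into a dict, change foo, and rebuild a Row in the same field order
    tmp = x.asDict()
    tmp['foo'] = tmp.get('foo', 0) * 100
    names, values = zip(*tmp.items())
    return Row(*names)(*values)

# The UDF's return schema must list every field of data; foo and lol are longs here
df = df.withColumn('data', f.udf(func, StructType(
    [StructField('foo', LongType()), StructField('lol', LongType())]))(df['data']))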

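P.S.

Spark has almost no support for in-place operations.

So whenever you want to change something in place, you actually have to replace it. You must rebuild the schema, use select, or use a UDF to modify the data. Almost all of these options are covered in a related question, which was flagged as a possible duplicate.

Comment: Definitely the right direction! Is there a way to do this without knowing the contents of data (other than data.foo, of course)? I edited my question to add an extra data.lol column to make this clearer.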
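For the follow-up question about not knowing the contents of data in advance, one possible approach is to read the existing field names from the DataFrame schema and rebuild the struct from them. The sketch below is only an illustration under assumptions: the helper names rebuilt_struct_sql and add_fields are hypothetical (they are not part of PySpark), field names are assumed to be plain identifiers that need no backtick quoting, and the new fields are given as Spark SQL expression strings.

```python
def rebuilt_struct_sql(struct_col, field_names, extra):
    """Build a named_struct(...) SQL expression that copies the existing
    fields of struct_col and appends the new fields listed in extra."""
    # Existing fields: pair each name with a reference to the current value.
    parts = [f"'{name}', {struct_col}.{name}" for name in field_names]
    # New fields: pair each name with its SQL expression.
    parts += [f"'{name}', {expr}" for name, expr in extra.items()]
    return "named_struct(" + ", ".join(parts) + ")"


def add_fields(df, struct_col, extra):
    """Replace struct_col with a copy that also carries the fields in extra."""
    # Imported lazily so the string helper above can be used without Spark.
    from pyspark.sql import functions as F

    # The existing field names come from the schema, so they need not be known up front.
    field_names = df.schema[struct_col].dataType.fieldNames()
    return df.withColumn(
        struct_col, F.expr(rebuilt_struct_sql(struct_col, field_names, extra))
    )
```

With the exploded DataFrame from the question, this would be called as df = add_fields(df, 'data', {'bar': 'data.foo * 2'}). On Spark 3.1 and later, Column.withField should offer a shorter route, for example df.withColumn('data', F.col('data').withField('bar', F.col('data.foo') * 2)).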
