Python: Applying a map function to a DataFrame_Python_Apache Spark_Pyspark - Fatal编程技术网


Python: Applying a map function to a DataFrame

python apache-spark pyspark


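I have just started using Databricks/PySpark, with Python and Spark 2.1. I have uploaded data to a table. The table is a single column of strings. I want to apply a mapping function to each element in the column. I load the table into a DataFrame: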

df = spark.table("mynewtable")

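The only way I can see, going by what others have said, is to convert it to an RDD to apply the mapping function and then convert back to a DataFrame to show the data. But this fails with a "job aborted, stage failure" error: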

df2 = df.select("_c0").rdd.flatMap(lambda x: x.append("anything")).toDF()

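I just want to apply any kind of map function to the data in the table. For example, append something to each string in the column, or split on a character, and then put the result back into a DataFrame so that I can call .show() or display it.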

You cannot:

- Use `flatMap`, because it will flatten the `Row`.
- Use `append`, because:
  - `tuple` and `Row` have no `append` method
  - `append` (where it exists on a collection) is executed for its side effect and returns `None`
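The two points about `append` can be checked in plain Python without Spark. The sketch below is only an illustration: it uses an ordinary tuple as a stand-in for a `Row` (which is tuple-like), and all variable names are made up for the example.

```python
# Plain-Python illustration (no Spark needed) of why x.append("anything") fails.
row = ("some string",)  # ordinary tuple standing in for a Row

# 1. Tuples are immutable and have no append method.
tuple_has_append = hasattr(row, "append")

# 2. list.append mutates the list in place and returns None,
#    so a lambda that returns x.append(...) would return None anyway.
items = ["some string"]
append_result = items.append("anything")

# 3. Concatenation builds and returns a new tuple, which is what a
#    mapping function needs to return.
extended = row + ("anything",)

print(tuple_has_append, append_result, items, extended)
```

In other words, the lambda has to return a new value rather than try to modify its argument.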

I would use `withColumn`:

from pyspark.sql.functions import lit

df.withColumn("foo", lit("anything"))

But `map` should work as well:

df.select("_c0").rdd.map(lambda x: x + ("anything", )).toDF()
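A plain-Python analogy may help show why `map` rather than `flatMap` keeps each row intact. This is not Spark code: rows are modeled as ordinary tuples, a list comprehension plays the role of `map`, and `itertools.chain.from_iterable` plays the role of the flattening step in `flatMap`.

```python
from itertools import chain

# Ordinary tuples standing in for the rows of df.select("_c0").
rows = [("alpha",), ("beta",)]

def extend(row):
    # Return a new, longer tuple; the input is left untouched.
    return row + ("anything",)

# map-like: exactly one output record per input row.
mapped = [extend(r) for r in rows]

# flatMap-like: each result is iterated and its elements are concatenated,
# so the row boundaries are lost and only bare strings remain.
flat_mapped = list(chain.from_iterable(extend(r) for r in rows))

print(mapped)
print(flat_mapped)
```

Only the `map`-style result still consists of tuple-shaped records, which is the shape a conversion back to a DataFrame expects.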

Edit (in response to a comment):
You probably want a udf:

from pyspark.sql.functions import udf

def iplookup(s):
    return ... # Some lookup logic

iplookup_udf = udf(iplookup)

df.withColumn("foo", iplookup_udf("_c0"))

The default return type is `StringType`, so if you need something else you should adjust it.
I have a follow-up question, @Alper t. Turker: which performs better in PySpark, a udf or RDD processing?

