Apache Spark: retrieving the x[i]-th element of every entry in an RDD_Apache Spark_Pyspark - Fatal编程技术网



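entries = sc.textFile(...).map(lambda line: line.split("\t")).map(lambda row: (int(row[0]), row[1]))
some_set = set()
for entry in entries.collect():
    some_set.add(entry[1])

Is there a better way to achieve the above? I just want to get the i-th element of each entry.

So basically what you describe is:

set(entries.keys().distinct().collect())

or, generalized:

import operator
set(entries.map(operator.itemgetter(i)).distinct().collect())

Why do you want to avoid collect when you are collecting the values anyway? Run

entries.map(lambda x: x[0]).collect()

You are right, I framed the question badly. I was wondering whether all the x[i] can be stored in a set/list without loading the whole RDD into the driver. It does not have to be the first element. So is the second solution better than entries.collect()? Is that because it collects only the x[i]-th elements rather than the entire RDD?

It is better because it collects only the distinct elements. If you want a local set, you won't do better. Otherwise, just stop at distinct.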
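To make the difference between the variants concrete, here is a minimal sketch that wraps the itemgetter one-liner in two small helpers. The names distinct_field, distinct_field_as_set and demo are made up for illustration and are not part of PySpark; the toy data in demo is also an assumption standing in for the tab-separated file. Note that the loop in the question adds entry[1], which corresponds to i = 1 (the same as entries.values()), whereas entries.keys() corresponds to i = 0.

```python
import operator


def distinct_field(rdd, i):
    """Return an RDD holding the distinct values found at position i of each record.

    Nothing is sent to the driver here: map() and distinct() are lazy
    transformations that run on the executors once an action is called.
    """
    return rdd.map(operator.itemgetter(i)).distinct()


def distinct_field_as_set(rdd, i):
    """Collect the distinct i-th values into a local Python set on the driver."""
    return set(distinct_field(rdd, i).collect())


def demo():
    """Hypothetical local run; requires a working pyspark installation."""
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "distinct-field-demo")
    try:
        # Toy data standing in for the tab-separated file from the question.
        lines = sc.parallelize(["1\ta", "2\tb", "3\ta"])
        entries = lines.map(lambda line: line.split("\t")).map(
            lambda row: (int(row[0]), row[1])
        )
        # Only the distinct second fields travel to the driver.
        print(distinct_field_as_set(entries, 1))
        # Stays distributed; only a single number comes back.
        print(distinct_field(entries, 1).count())
    finally:
        sc.stop()
```

Calling demo() on a machine with PySpark installed runs both variants against a local master. If a local set is not actually needed, keeping the result of distinct_field as an RDD and continuing with further transformations avoids pulling the values into the driver at all.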



