Apache Spark: reading data from the subfolders of a non-partitioned Hive table into a Spark DataFrame (Apache Spark, Amazon S3, Hive) - Fatal编程技术网


apache-spark amazon-s3 hive


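I have an external table in Hive that points to a non-partitioned S3 location. The table points to a single folder in S3, but the data actually lives in several subfolders inside that folder.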

Even though the table is not partitioned, Hive can still query it once a few properties are set:

set hive.input.dir.recursive=true;
set hive.mapred.supports.subdirectories=true;
set hive.supports.subdirectories=true;
set mapred.input.dir.recursive=true;
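However, when the same table is used from Spark and the data is loaded into a DataFrame with a SQL statement such as df = sqlContext.sql("select * from table_name"), the operation fails with an error saying that a subfolder in the external S3 location is not a file.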
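The failure is easier to picture with a toy model. The Python sketch below is only an analogy, not Hadoop's or Spark's actual code: `list_input_files` is a hypothetical helper that expands a table location either non-recursively, rejecting a subfolder with a "not a file" error like the one described above, or recursively, collecting the files inside every subfolder. A local temporary directory stands in for the S3 folder.

```python
import os
import tempfile


def list_input_files(root, recursive):
    """Toy model of expanding a table location into input files.

    Hypothetical helper for illustration only; it is not how Hadoop or
    Spark is implemented. With recursive=False only the direct children
    of `root` are considered, and a child that is a directory is rejected,
    which mirrors the "subfolder is not a file" failure. With
    recursive=True every file in every subfolder is collected.
    """
    if not recursive:
        files = []
        for name in sorted(os.listdir(root)):
            path = os.path.join(root, name)
            if os.path.isdir(path):
                raise IOError("not a file: " + path)
            files.append(path)
        return files

    files = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so os.walk visits subfolders in a stable order
        for name in sorted(filenames):
            files.append(os.path.join(dirpath, name))
    return files


if __name__ == "__main__":
    # A temporary local directory stands in for the S3 table location,
    # with the data split across two subfolders.
    root = tempfile.mkdtemp()
    for sub, name in [("sub1", "part-0000"), ("sub2", "part-0000")]:
        os.makedirs(os.path.join(root, sub))
        with open(os.path.join(root, sub, name), "w") as fh:
            fh.write("row\n")
    try:
        list_input_files(root, recursive=False)
    except IOError as err:
        print("non-recursive listing failed:", err)
    print("recursive listing:", list_input_files(root, recursive=True))
```

The recursive properties in the question play the role of the `recursive` flag here: they tell the input layer to descend into subfolders instead of treating them as errors.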

I tried setting the Hive properties above from Spark with sc.hadoopConfiguration.set("mapred.input.dir.recursive", "true"), but it did not help. It seems to affect only sc.textFile-style loads.

The fix is to set the following property in Spark:
sqlContext.setConf("mapreduce.input.fileinputformat.input.dir.recursive", "true")

Note that the property is set on sqlContext, not on sparkContext.
I tested this on Spark 1.6.2.
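A minimal PySpark sketch of the fix, assuming a Spark 1.6 `SQLContext` or `HiveContext` named `sqlContext` as in the question. The helper name `load_recursive_table` is hypothetical; it only wraps the two calls from the answer (`setConf` with the recursive-input property, then `sql`) so that the property is always set before the table is queried. It takes the context as a parameter and does not import PySpark itself.

```python
# Property reported in the answer as working on Spark 1.6.2.
RECURSIVE_KEY = "mapreduce.input.fileinputformat.input.dir.recursive"


def load_recursive_table(sql_context, table_name):
    """Hypothetical helper: enable recursive input listing, then query the table.

    `sql_context` is assumed to be a PySpark 1.6 SQLContext or HiveContext.
    The property is set through sql_context.setConf rather than through
    SparkContext.hadoopConfiguration, following the answer above.
    """
    sql_context.setConf(RECURSIVE_KEY, "true")
    # table_name is interpolated directly, so it must come from trusted code.
    return sql_context.sql("select * from {}".format(table_name))
```

With a real context, usage would look like `df = load_recursive_table(sqlContext, "table_name")`. Passing the context in, instead of creating one inside the helper, keeps the sketch independent of how the application builds its `HiveContext`.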




Copyright © 2024. All Rights Reserved by - Fatal编程技术网