How to match null values using MongoDB's Spark connector? (MongoDB, Apache Spark, PySpark, Aggregation Framework) - Fatal编程技术网


mongodb apache-spark pyspark


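I am trying to query a MongoDB collection using an aggregation pipeline with the pyspark MongoDB connector, but I cannot get a match against null to work.

I have already tried the following in the pipeline:

{'$match' : {'deleted_at': null}}
{'$match' : {'deleted_at': 'null'}}
{'$match' : {'deleted_at': None}}
{'$match' : {'deleted_at': False}}
{'$match' : {'deleted_at': 0}}

But nothing seems to work. Any ideas?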

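I found a possible solution that avoids changing all the queries. The solution is to match on the type:

{'$match' : {'deleted_at': {'$type': 10}}}

This works because 10 is the BSON type number for null.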
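A minimal sketch of how the `$type` match could be written in Python and turned into a JSON string. The variable names are illustrative, and passing the string through a `pipeline` read option is an assumption about the setup, so that part is shown only as a comment and is not executed here.

```python
import json

# Aggregation pipeline as plain Python data; 10 is the BSON type number for null.
pipeline = [{"$match": {"deleted_at": {"$type": 10}}}]

# Serialize to a JSON string. json.dumps writes Python None as JSON null,
# whereas str() on a dict would emit the non-JSON token None.
pipeline_json = json.dumps(pipeline)
print(pipeline_json)

# Assumed usage with an existing SparkSession named `spark` (not run here):
# df = (spark.read.format("com.mongodb.spark.sql.DefaultSource")
#       .option("pipeline", pipeline_json)
#       .load())
```

Building the pipeline as data and serializing it keeps the braces balanced, which is easy to get wrong when typing the JSON by hand.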

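You can take advantage of filter pushdown in Spark, which is enabled by default. When you use filters with DataFrames or the Python API, the underlying Mongo connector code constructs an aggregation pipeline that filters the data in MongoDB before sending it to Spark.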

Python code:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.coll") \
    .getOrCreate()

df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
# Use isNull() for the null check; comparing with the string 'null' only matches that literal string
filtrDf = df.filter(df['deleted_at'].isNull())
filtrDf.explain()  # check the physical plan of this output for the pushed-down filter

Comment: Have you tried df.filter($"deleted_at" == null) with Spark SQL? You could also check on the MongoDB side, so you can confirm that Spark is building the MongoDB aggregation pipeline with the filter.
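One caveat when choosing between a `$type: 10` match and a plain null comparison: in MongoDB, an equality match against null also matches documents where the field is missing, while `$type: 10` matches only fields explicitly stored as null. The sketch below mimics both predicates on plain Python dicts. It is not connector code, and the helper names and sample documents are hypothetical.

```python
# Hypothetical in-memory stand-ins for MongoDB documents.
docs = [
    {"_id": 1, "deleted_at": None},          # field explicitly null
    {"_id": 2},                              # field missing
    {"_id": 3, "deleted_at": "2020-01-01"},  # field set to a value
]

def matches_type_null(doc, field):
    """Mimics {field: {'$type': 10}}: the field exists and holds null."""
    return field in doc and doc[field] is None

def matches_eq_null(doc, field):
    """Mimics {field: null}: the field is null or missing."""
    return doc.get(field) is None

type_null_ids = [d["_id"] for d in docs if matches_type_null(d, "deleted_at")]
eq_null_ids = [d["_id"] for d in docs if matches_eq_null(d, "deleted_at")]

print(type_null_ids)  # [1]
print(eq_null_ids)    # [1, 2]
```

If documents without a `deleted_at` field should count as "not deleted", the equality-style check is likely the one you want; if only explicit nulls matter, the type match is stricter.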
