How to use PySpark to identify whether a specific string/pattern exists in a column

string apache-spark pyspark


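Below is my sample DataFrame of household items.

Here W stands for Wooden, G for Glass, and P for Plastic, and the different items are classified under those categories. I want to identify which items fall under the W, G, and P categories. As a first step, I tried to classify the rows for Chair.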

    M = sqlContext.createDataFrame(
        [('W-Chair-Shelf;G-Vase;P-Cup', ''),
         ('W-Chair', ''),
         ('W-Shelf;G-Cup;P-Chair', ''),
         ('G-Cup;P-ShowerCap;W-Board', '')],
        ['Household_chores_arrangements', 'Chair'])
    M.createOrReplaceTempView('M')

    +-----------------------------+-----+
    |Household_chores_arrangements|Chair|
    +-----------------------------+-----+
    |   W-Chair-Shelf;G-Vase;P-Cup|     |
    |                      W-Chair|     |
    |        W-Shelf;G-Cup;P-Chair|     |
    |    G-Cup;P-ShowerCap;W-Board|     |
    +-----------------------------+-----+

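I tried to write a condition that would let me tag the rows as W, but I did not get the expected result; my condition is probably wrong.

    df = sqlContext.sql("select * from M where Household_chores_arrangements like '%W%Chair%'")
    display(df)

Is there a better way to do this in PySpark?

Expected output: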

    +-----------------------------+-----+
    |Household_chores_arrangements|Chair|
    +-----------------------------+-----+
    |   W-Chair-Shelf;G-Vase;P-Cup|    W|
    |                      W-Chair|    W|
    |        W-Shelf;G-Cup;P-Chair|    P|
    |    G-Cup;P-ShowerCap;W-Board| NULL|
    +-----------------------------+-----+

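One way to see why the LIKE condition in the question cannot tag the rows correctly is to replay it outside Spark. The sketch below is plain Python, not PySpark; `like_to_regex` is a hypothetical helper that handles only the `%` and `_` wildcards and assumes case-sensitive matching. It suggests that '%W%Chair%' also accepts the row where Chair is listed under P, because `%` can span everything between W and Chair.

```python
import re

def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern (only % and _) into an anchored regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return "^" + "".join(parts) + "$"

rows = [
    "W-Chair-Shelf;G-Vase;P-Cup",
    "W-Chair",
    "W-Shelf;G-Cup;P-Chair",      # Chair belongs to P here, not W
    "G-Cup;P-ShowerCap;W-Board",
]

rx = re.compile(like_to_regex("%W%Chair%"), re.DOTALL)
matched = [r for r in rows if rx.match(r)]
print(matched)
```

The third row passes the filter even though its Chair is plastic, so a filter of this shape can say that W and Chair both occur, but not that they belong together.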
Thanks to @mck for the solution.

Update: beyond this, I also tried to explore the regexp_extract option further.

    M = sqlContext.createDataFrame(
        [('Wooden|Chair', ''),
         ('Wooden|Cup;Glass|Chair', ''),
         ('Wooden|Cup;Glass|Showercap;Plastic|Chair', '')],
        ['Household_chores_arrangements', 'Chair'])
    M.createOrReplaceTempView('M')

    df = spark.sql("""
        select Household_chores_arrangements,
               nullif(regexp_extract(Household_chores_arrangements, '(Wooden|Glass|Plastic)(|Chair)', 1), '') as Chair
        from M
    """)
    display(df)

Result:

    +----------------------------------------+------+
    |Household_chores_arrangements           |Chair |
    +----------------------------------------+------+
    |Wooden|Chair                            |Wooden|
    |Wooden|Cup;Glass|Chair                  |Wooden|
    |Wooden|Cup;Glass|Showercap;Plastic|Chair|Wooden|
    +----------------------------------------+------+

I changed the delimiter to | instead of - and made the same change in the query. The expected result is shown below, but the result I got is wrong.

    +----------------------------------------+-------+
    |Household_chores_arrangements           |Chair  |
    +----------------------------------------+-------+
    |Wooden|Chair                            |Wooden |
    |Wooden|Cup;Glass|Chair                  |Glass  |
    |Wooden|Cup;Glass|Showercap;Plastic|Chair|Plastic|
    +----------------------------------------+-------+

If only the delimiter has changed, does anything else need to change?
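A likely reason every row comes back as Wooden is that an unescaped | inside the second group is read as regex alternation, so `(|Chair)` can match the empty string. The sketch below uses Python's `re` module as a stand-in for Spark's regex engine (an assumption; the two agree on this syntax), and `first_group` is a hypothetical helper.

```python
import re

rows = [
    "Wooden|Chair",
    "Wooden|Cup;Glass|Chair",
    "Wooden|Cup;Glass|Showercap;Plastic|Chair",
]

# The pattern from the update: "(|Chair)" means "empty string OR Chair".
unescaped = re.compile(r"(Wooden|Glass|Plastic)(|Chair)")
# Same idea with the pipe escaped so that it matches a literal "|".
escaped = re.compile(r"(Wooden|Glass|Plastic)\|Chair")

def first_group(rx, text):
    """Return capture group 1 of the first match, or None when nothing matches."""
    m = rx.search(text)
    return m.group(1) if m else None

unescaped_result = [first_group(unescaped, r) for r in rows]
escaped_result = [first_group(escaped, r) for r in rows]
print(unescaped_result)
print(escaped_result)
```

With the unescaped pipe, the first category in each string wins no matter what follows it; with the escaped pipe, only a category that is directly followed by |Chair is captured.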
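Update 2: I have found the solution for the update above. For the pipe delimiter, we have to use 4 backslashes (\\\\) to escape it.

You can use regexp_extract to extract the category and, if no match is found, use nullif to replace the empty string with null.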

    df = spark.sql("""
        select Household_chores_arrangements,
               nullif(regexp_extract(Household_chores_arrangements, '([A-Z])-Chair', 1), '') as Chair
        from M
    """)
    df.show(truncate=False)

    +-----------------------------+-----+
    |Household_chores_arrangements|Chair|
    +-----------------------------+-----+
    |W-Chair-Shelf;G-Vase;P-Cup   |W    |
    |W-Chair                      |W    |
    |W-Shelf;G-Cup;P-Chair        |P    |
    |G-Cup;P-ShowerCap;W-Board    |null |
    +-----------------------------+-----+
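To make the two steps in the query above easier to follow, here is a rough plain-Python emulation, not PySpark itself. `regexp_extract_py` and `nullif_py` are hypothetical stand-ins written for this sketch; they assume that regexp_extract yields an empty string when nothing matches and that nullif returns null when its two arguments are equal.

```python
import re

def regexp_extract_py(text, pattern, idx):
    """Stand-in for regexp_extract: the requested group, or '' if there is no match."""
    m = re.search(pattern, text)
    if m is None:
        return ""
    return m.group(idx) or ""

def nullif_py(value, other):
    """Stand-in for nullif: None when both arguments are equal, else the first one."""
    return None if value == other else value

rows = [
    "W-Chair-Shelf;G-Vase;P-Cup",
    "W-Chair",
    "W-Shelf;G-Cup;P-Chair",
    "G-Cup;P-ShowerCap;W-Board",
]

# Capture the single capital letter that sits directly before "-Chair".
chair = [nullif_py(regexp_extract_py(r, "([A-Z])-Chair", 1), "") for r in rows]
print(chair)
```

The same two functions are also available in the DataFrame API as `pyspark.sql.functions.regexp_extract` combined with a `when(...)` expression, if SQL strings are not preferred.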

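OK. But if Household_chores_arrangements contains Wooden instead of W, do we need to change the index as well?

Then you would need to change the regex pattern to, for example, '(Wooden|Glass|Plastic)-Chair'.

So in regexp_extract, the 1 is just the number of the group whose value should be picked up, and nothing else needs attention there. And -Chair is what we want to match, where - is the delimiter in the sample data. I hope I have that right.

This example also works well. While looking into regexp_extract further, I tried replacing - with | in both the sample data and the query, but it did not give me the expected result. Let me see if I can post an update in the same question.

I got the solution for the | delimiter: we have to use 4 backslashes to escape it, as in '(Wooden|Glass|Plastic)(\\\\|Chair)'.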
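The four backslashes make more sense when the layers are counted: the Python string literal halves them, the SQL string literal halves them again, and the regex engine finally sees one backslash in front of the pipe. The sketch below only simulates the SQL step with a simple `replace`, under the assumption that Spark's default string-literal escaping is in effect; it is not a call into Spark.

```python
import re

# What Python hands to spark.sql when the source code contains '\\\\|'
# inside a normal (non-raw) string: two backslashes followed by a pipe.
sql_text = "\\\\|"

# Crude stand-in for the SQL string-literal step (assumption: each pair of
# backslashes collapses to a single backslash, as with default settings).
regex_text = sql_text.replace("\\\\", "\\")

# The regex engine now sees an escaped, literal pipe in front of Chair.
pattern = "(Wooden|Glass|Plastic)(" + regex_text + "Chair)"
match = re.search(pattern, "Wooden|Cup;Glass|Chair")
category = match.group(1)
print(len(sql_text), regex_text, category)
```

Under the same assumption, writing the query as a raw Python string (r"""...""") would need only two backslashes, since the Python layer no longer consumes any.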
