Python pandas: filter/combine similar string values

python pandas

I have a DataFrame containing names, and I'm trying to combine similar names. For example:

| name      | foo_val |
| --------- | ------- |
| Andrew    | 2       |
| Braden    | 1       |
| Cheryl    | 4       |
| Cheryl :D | 1       |
| Christian | 1       |
| Derrick   | 2       |
| Derrick L | 2       |

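And so on... If the contents are similar enough (like Cheryl and Derrick in the example above), I'd like to merge the rows (and add up their foo_val values), so that the result looks like this: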

| name      | foo_val |
| --------- | ------- |
| Andrew    | 2       |
| Braden    | 1       |
| Cheryl    | 5       |
| Christian | 1       |
| Derrick   | 4       |

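I don't know pandas very well yet, but I've looked at duplicated (e.g. df.duplicated('name')), groupby, and merge, and I'm fairly sure these aren't what I want (though I may well be wrong...). I've searched quite a bit at this point, but I assume this has been asked before, so if I missed it, please point me to the other question/answer.

I can imagine a way to do this in pure Python by iterating, but I'm curious whether it can be done in pandas...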

In the example you posted, you need to group by the first part of the string and sum the results. This can be done with:

df.groupby(df.name.str.split().str[0]).foo_val.sum().reset_index()


    name        foo_val
0   Andrew      2
1   Braden      1
2   Cheryl      5
3   Christian   1
4   Derrick     4
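
For reference, here is a self-contained version of the one-liner above, built on the sample data from the question. The variable names `first_token` and `result` are only illustrative. The approach assumes that the first whitespace-separated token of each name identifies the person, which holds for this sample but not necessarily for real data.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["Andrew", "Braden", "Cheryl", "Cheryl :D",
             "Christian", "Derrick", "Derrick L"],
    "foo_val": [2, 1, 4, 1, 1, 2, 2],
})

# Group key: the first whitespace-separated token of each name.
first_token = df["name"].str.split().str[0]

# Sum foo_val within each group, then turn the group key back into a column.
result = df.groupby(first_token)["foo_val"].sum().reset_index()
print(result)
```

Because the group key is a Series named `name`, `reset_index()` restores it as a `name` column, giving the same shape as the desired output in the question.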

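The real trick with this problem is defining what counts as a similar name.

Yes, I see! That's great! I had a sneaking suspicion that groupby might do the trick; I guess I need to dig into it some more! Thanks a lot.

Just a note: I suspected there might be a lot of names with spaces in them, so I searched through nltk's names corpus... It turns out there aren't many. Still, I completely agree that solving this problem depends entirely on the quality of the data. That is why I started with "In the example you posted"....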
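
If the first-token rule is too strict or too loose for your data, one possible way to define "similar" is a string-similarity threshold. The sketch below is not from the answer above: it swaps in `difflib.get_close_matches` from the standard library, and both the helper name `canonical_names` and the cutoff of 0.75 are assumptions chosen for illustration. A different cutoff may merge too many or too few names on real data.

```python
import difflib
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["Andrew", "Braden", "Cheryl", "Cheryl :D",
             "Christian", "Derrick", "Derrick L"],
    "foo_val": [2, 1, 4, 1, 1, 2, 2],
})

def canonical_names(names, cutoff=0.75):
    """Hypothetical helper: map each name to a similar, shorter name if one exists.

    Names are visited shortest first, so the shortest variant of a group
    becomes its canonical spelling. The cutoff is an arbitrary assumption.
    """
    mapping = {}
    canon = []  # canonical names accepted so far
    for name in sorted(set(names), key=lambda s: (len(s), s)):
        match = difflib.get_close_matches(name, canon, n=1, cutoff=cutoff)
        if match:
            mapping[name] = match[0]   # fold into the existing canonical name
        else:
            canon.append(name)         # start a new group
            mapping[name] = name
    return mapping

mapping = canonical_names(df["name"])

# Group on the canonical spelling and sum foo_val within each group.
merged = df.groupby(df["name"].map(mapping))["foo_val"].sum().reset_index()
print(merged)
```

The grouping step is the same `groupby` idea as before; only the key changes. This loop compares every name against every canonical name, so it is meant for small to moderate numbers of distinct names.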



