Python: drop duplicates but keep references to the dropped rows


python pandas


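I have a DataFrame with many duplicate rows. The dataset has hundreds of rows and hundreds of columns.

Each row has a unique identifier. I want to create a DataFrame that contains only the unique rows, and then create a mapping from the identifiers in the unique-row DataFrame to the identifiers in the original DataFrame.

For example: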

import pandas as pd

# Dummy data
df = pd.DataFrame({'col_1': [1, 2, 2, 1, 2, 3],
                   'col_2': [2, 4, 4, 2, 4, 2],
                   'col_3': [3, 2, 2, 3, 2, 7]},
                  index=['A', 'B', 'C', 'D', 'E', 'F'])

print(df)
#    col_1  col_2  col_3
# A      1      2      3
# B      2      4      2
# C      2      4      2
# D      1      2      3
# E      2      4      2
# F      3      2      7

# Unique row dataframe
df_unique = df.drop_duplicates()
print(df_unique)
#    col_1  col_2  col_3
# A      1      2      3
# B      2      4      2
# F      3      2      7

# Mapping from df_unique to df
# Creating this mapping is the problem
mapping = {'A': ('A', 'D'),
           'B': ('B', 'C', 'E'),
           'F': ('F',)}  # ('F',) is a one-element tuple; ('F') would just be the string 'F'

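In this case, rows 'A' and 'D' are identical, so 'A' in the deduplicated frame maps to both 'A' and 'D' from the frame before drop_duplicates() was applied.
How can I create this mapping?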
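By default `drop_duplicates()` keeps the first occurrence of each repeated row (`keep='first'`), which is why 'A', 'B' and 'F' survive. The minimal sketch below reuses the question's dummy data to show the boolean masks behind this: `duplicated()` flags the rows that would be removed, while `duplicated(keep=False)` flags every row that has at least one identical partner.

```python
import pandas as pd

# Same dummy data as in the question
df = pd.DataFrame({'col_1': [1, 2, 2, 1, 2, 3],
                   'col_2': [2, 4, 4, 2, 4, 2],
                   'col_3': [3, 2, 2, 3, 2, 7]},
                  index=['A', 'B', 'C', 'D', 'E', 'F'])

# True for every row that repeats an earlier row (the rows drop_duplicates removes)
removed = df.duplicated()
# True for every row that has at least one identical row anywhere in the frame
in_duplicate_group = df.duplicated(keep=False)

# drop_duplicates() keeps exactly the rows not flagged by duplicated()
df_unique = df[~removed]
assert df_unique.equals(df.drop_duplicates())

print(removed[removed].index.tolist())                        # ['C', 'D', 'E']
print(in_duplicate_group[in_duplicate_group].index.tolist())  # ['A', 'B', 'C', 'D', 'E']
```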

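Here I used drop_duplicates() to create the unique-row DataFrame, but that is not a requirement. The mapping does not have to be a dictionary either, if anyone has a better idea.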
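Since the mapping does not have to be a dictionary, one possible alternative (a sketch under stated assumptions, not the asker's or the answerer's method) is a Series that maps every original label to the label of its representative row. The column name `label` is a hypothetical choice made here so that `reset_index()` produces a predictable column; it assumes no data column already uses that name.

```python
import pandas as pd

# Same dummy data as in the question
df = pd.DataFrame({'col_1': [1, 2, 2, 1, 2, 3],
                   'col_2': [2, 4, 4, 2, 4, 2],
                   'col_3': [3, 2, 2, 3, 2, 7]},
                  index=['A', 'B', 'C', 'D', 'E', 'F'])

cols = df.columns.tolist()
# Name the index ('label' is a hypothetical name) so reset_index() creates a 'label' column
flat = df.rename_axis('label').reset_index()

# For every original row, the label of the first row with identical values
representative = flat.groupby(cols)['label'].transform('first')
representative.index = df.index
print(representative.to_dict())
# {'A': 'A', 'B': 'B', 'C': 'B', 'D': 'A', 'E': 'B', 'F': 'F'}

# The unique-row frame is the set of representatives, in order of first appearance
df_unique = df.loc[representative.unique()]

# Inverting the Series recovers the dictionary of tuples from the question
mapping = {rep: tuple(labels)
           for rep, labels in representative.groupby(representative).groups.items()}
```

A Series in this direction (original label to representative) is convenient for looking up a single row, whereas the dictionary in the question goes the other way.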
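First call reset_index() so the row labels become a regular column named index, then group by all of the DataFrame's columns and aggregate that index column with first and tuple. Finally, use the first label of each group as the key and build a dictionary of tuples: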
mapping = (df.reset_index()
             .groupby(df.columns.tolist())['index']
             .agg(['first',tuple])
             .set_index('first')['tuple']
             .to_dict())
print (mapping)
{'A': ('A', 'D'), 'B': ('B', 'C', 'E'), 'F': ('F',)}
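A possible variant of the same idea, sketched here as an alternative rather than as part of the original answer, reads the groups directly from `GroupBy.groups`. That attribute maps each distinct row (a tuple of column values) to an Index of the original labels holding that row, so no `reset_index()` is needed and a named index does not change anything.

```python
import pandas as pd

# Same dummy data as in the question
df = pd.DataFrame({'col_1': [1, 2, 2, 1, 2, 3],
                   'col_2': [2, 4, 4, 2, 4, 2],
                   'col_3': [3, 2, 2, 3, 2, 7]},
                  index=['A', 'B', 'C', 'D', 'E', 'F'])

# Each value is an Index of the original labels that share one distinct row,
# listed in their original order
groups = df.groupby(df.columns.tolist()).groups

# The first label of each group is the one drop_duplicates() keeps
mapping = {labels[0]: tuple(labels) for labels in groups.values()}
print(mapping)
```

One caveat that applies to both groupby-based approaches: `groupby` excludes rows whose key contains NaN by default (`dropna=True`), whereas `drop_duplicates()` treats NaN values as equal. This only matters if the data has missing values, in which case it is worth checking the `dropna` parameter of `groupby`.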



