Python: how to match and merge two dataframes whose values are completely different except for the numbers in one dataframe column?
I have a dataframe ABC with these values
   id     | price                             | type
0  easdca | Rs.1,599.00 was trasn by you      | unknown
1  vbbngy | txn of INR 191.00 using           | unknown
2  awerfa | Rs.190.78 credits was used by you | unknown
3  zxcmo5 | DLR.2000 credits was used by you  | unknown
and another dataframe XYZ with these values
   price    | type
0  190.78   | food
1  191.00   | movie
2  2,000    | football
3  1,599.00 | basketball
How can I map XYZ onto ABC so that the type in ABC is updated with the type from XYZ, matching on the numeric value contained in the price column?
The output I need:
   id     | price                              | type
0  easdca | Rs.1,599.00 was trasn by you       | basketball
1  vbbngy | txn of INR 191.00 using            | movie
2  awerfa | Rs.190.78 credits was used by you  | food
3  zxcmo5 | DLR.2,000 credits was used by you  | football
I am using this:
d = dict(zip(XYZ['PRICE'],XYZ['TYPE']))
pat = (r'({})'.format('|'.join(d.keys())))
ABC['TYPE']=ABC['PRICE'].str.extract(pat,expand=False).map(d)
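But values such as 190.78 and 191.00 are getting mismatched.
For example, on a large dataset 190.78 should match food, yet values such as 190.77, which have a different type assigned to them, get matched with food as well, and 198.78 gets matched with some other type when it should match food.
You can do the following: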
import pandas as pd

'''
First we make an artificial key column to be able to merge.
We extract the floating-point number from the string
and convert it to type float.
'''
df1['price_key'] = df1['price'].str.replace(',', '').str.extract(r'(\d+\.\d+)', expand=False).astype(float)
# The price column of df2 has to be float as well, otherwise the merge raises a ValueError
df2['price'] = df2['price'].astype(str).str.replace(',', '').astype(float)
# After that we merge on price_key and price and drop the columns we don't need
df_final = pd.merge(df1, df2, left_on='price_key', right_on='price', suffixes=('', '_2'))
df_final = df_final.drop(['type', 'price_key', 'price_2'], axis='columns')
Output
   id      price                                 type_2
0  easdca  Rs.1,599.00 was trasn by you          basketball
1  vbbngy  txn of INR 191.00 using               movie
2  awerfa  Rs.190.78 credits was used by you     food
3  zxcmo5  DLR.2000.78 credits was used by you   football
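To try the float-key idea end to end, here is a minimal sketch built on the sample data from the question. It is a variation, not the answerer's exact code: the pattern makes the decimal part optional so that a price written as 2000 is also picked up, and the names abc, xyz and price_key are only illustrative. It also assumes the first number in each string is the price.

```python
import pandas as pd

abc = pd.DataFrame({
    "id": ["easdca", "vbbngy", "awerfa", "zxcmo5"],
    "price": [
        "Rs.1,599.00 was trasn by you",
        "txn of INR 191.00 using",
        "Rs.190.78 credits was used by you",
        "DLR.2000 credits was used by you",
    ],
    "type": ["unknown"] * 4,
})
xyz = pd.DataFrame({
    "price": ["190.78", "191.00", "2,000", "1,599.00"],
    "type": ["food", "movie", "football", "basketball"],
})

# Assumption: the first number in the text is the price; decimals are optional.
pattern = r"(\d+(?:\.\d+)?)"
abc["price_key"] = (
    abc["price"].str.replace(",", "", regex=False)
    .str.extract(pattern, expand=False)
    .astype(float)
)
xyz["price_key"] = xyz["price"].str.replace(",", "", regex=False).astype(float)

# A left merge keeps every row of abc, in its original order.
merged = (
    abc.drop(columns="type")
    .merge(xyz[["price_key", "type"]], on="price_key", how="left")
    .drop(columns="price_key")
)
print(merged)
```

Using how="left" rather than the default inner merge means rows without a matching price stay in the result with a missing type instead of disappearing.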
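Note that this answer assumes there is a typo in the xyz table: the third price should be 2000.78 rather than 2000, because the pattern (\d+\.\d+) only extracts numbers that have a decimal part.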
df
   id      price                              type
0  easdca  Rs.1,599.00 was trasn by you       unknown
1  vbbngy  txn of INR 191.00 using            unknown
2  awerfa  Rs.190.78 credits was used by you  unknown
3  zxcmo5  DLR.2000 credits was used by you   unknown
df2 (the XYZ table from the question)
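Using re
import re
# Pull the numeric part out of each price string (commas removed) into a key column
df['price_'] = df['price'].apply(lambda x: re.findall(r'(?<=[\.\s])[\d\.]+', x.replace(',', ''))[0])
df2.columns = ['price_', 'type']
df2['price_'] = df2['price_'].str.replace(',', '')
Using pd.merge
df = df.merge(df2, on='price_')
# drop returns a new dataframe, so assign the result back
df = df.drop('type_x', axis=1)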
Output
   id      price                              price_   type_y
0  easdca  Rs.1,599.00 was trasn by you       1599.00  basketball
1  vbbngy  txn of INR 191.00 using            191.00   movie
2  awerfa  Rs.190.78 credits was used by you  190.78   food
3  zxcmo5  DLR.2000 credits was used by you   2000     football
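The merge result carries an extra price_ column and a renamed type_y column. If the goal is the layout asked for in the question (id, price, type), one option is a dictionary lookup with Series.map instead of a merge. The sketch below is only an illustration under assumptions: normalize is a hypothetical helper, it treats the first number in the text as the price, and it formats every price with two decimals so that 2000 and 2000.00 produce the same key.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "id": ["easdca", "vbbngy", "awerfa", "zxcmo5"],
    "price": [
        "Rs.1,599.00 was trasn by you",
        "txn of INR 191.00 using",
        "Rs.190.78 credits was used by you",
        "DLR.2000 credits was used by you",
    ],
    "type": ["unknown"] * 4,
})
df2 = pd.DataFrame({
    "price": ["190.78", "191.00", "2,000", "1,599.00"],
    "type": ["food", "movie", "football", "basketball"],
})


def normalize(text):
    """Return the first number in text as a two-decimal string, or None."""
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return f"{float(match.group()):.2f}" if match else None


# Build price -> type once, then look each ABC row up in it.
lookup = dict(zip(df2["price"].map(normalize), df2["type"]))
# Rows without a match keep their previous type.
df["type"] = df["price"].map(normalize).map(lookup).fillna(df["type"])
print(df)
```

Because nothing is merged, the row order and the column names of df stay exactly as they were.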
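Super. Can you add the solution using this data, and will it raise an error then? 190.78 should match food, while on a huge dataset values like 190.77 get mismatched with food although they have other values assigned to them.
ValueError: You are trying to merge on float64 and object columns. If you wish to proceed you should use pd.concat
Read the error, it says literally what is wrong. You have to convert the columns of both dataframes to float: df['Column'] = df.Column.astype(float). The column price_key is already float, so you have to change the price column of the xyz table to float as well.
ValueError: could not convert string to float:
Replace the commas in the string: df['column'] = df['column'].str.replace(',', '').astype(float)
Great move making price_key a float and using it for the merge, +1. However, for values with many decimal places I would not rely on float precision for a merge key.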
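The two points from these comments (both key columns need the same dtype, and float equality can be fragile as a merge key) can be sketched together. The example below is a sketch under assumptions rather than anything from the thread: it converts both sides to an integer number of hundredths with the standard decimal module, and to_cents and the cents column are hypothetical names introduced here.

```python
from decimal import Decimal

import pandas as pd

left = pd.DataFrame({"price_key": [1599.00, 191.00, 190.78]})
right = pd.DataFrame({
    "price": ["1,599.00", "191.00", "190.78"],
    "type": ["basketball", "movie", "food"],
})

# Merging left["price_key"] (float) with right["price"] (text) directly is
# what triggers the dtype ValueError, so inspect the dtypes first.
print(left["price_key"].dtype, right["price"].dtype)


def to_cents(value):
    """Convert a price given as str or float to an integer count of hundredths."""
    cleaned = str(value).replace(",", "")
    return int((Decimal(cleaned) * 100).to_integral_value())


left["cents"] = left["price_key"].map(to_cents)
right["cents"] = right["price"].map(to_cents)

# Integer keys compare exactly, so the merge does not depend on float rounding.
merged = left.merge(right[["cents", "type"]], on="cents", how="left")
print(merged)
```

This only makes sense when all prices have at most two decimal places; with more precision a different scale factor would be needed.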
It is working, but I only get two rows of output instead of all of them.
I can't think of why, it works for me. I suggest you check your price column.
The problem is this: in the price column of df we have the price 1599.00, but in df2 the value to match is 1599, so the inner merge does not work because of the .00. It works fine for values like 2120.23, but not for 1599.
Why not convert the prices to float and then do the merge?
When I created the df, the price columns were of type str; I have now made them floats. See if this helps.
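The situation described in this exchange can be reproduced on a small scale. The sketch below uses made-up rows (the second id and both types are illustrative, not from the thread) to show how a left merge with indicator=True reveals which rows fail to match on string keys, and how converting both keys to float fixes the 1599.00 versus 1599 case.

```python
import pandas as pd

df = pd.DataFrame({"id": ["easdca", "zxcmo5"], "price_": ["1599.00", "2120.23"]})
df2 = pd.DataFrame({"price_": ["1599", "2120.23"], "type": ["basketball", "football"]})

# As strings, "1599.00" and "1599" differ, so only the 2120.23 row matches.
check = df.merge(df2, on="price_", how="left", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "id"].tolist()
print(unmatched)

# As floats, 1599.00 and 1599 are the same number, so both rows match.
df["price_"] = df["price_"].astype(float)
df2["price_"] = df2["price_"].astype(float)
fixed = df.merge(df2, on="price_", how="left")
print(fixed)
```

The _merge column added by indicator=True is a quick way to find dropped rows before switching back to an inner merge.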