Python: How do I match and merge two DataFrames whose values are completely different except for the numbers in one column?


python python-3.x pandas dataframe


I have a DataFrame ABC with these values:

      id         |     price                          |   type
0     easdca     | Rs.1,599.00 was trasn by you       | unknown
1     vbbngy     | txn of INR 191.00 using            | unknown
2     awerfa     | Rs.190.78 credits was used by you  | unknown
3     zxcmo5     | DLR.2000 credits was used by you   | unknown

and another DataFrame XYZ with these values:

         price          |   type
0      190.78           | food
1      191.00           | movie
2      2,000            | football
3      1,599.00         | basketball

How can I map XYZ onto ABC so that the type in ABC is updated with the type from XYZ, matching rows on the numeric value in XYZ's price column?

The output I need:

       id         |     price                          |   type
0     easdca     | Rs.1,599.00 was trasn by you        | basketball
1     vbbngy     | txn of INR 191.00 using             | movie
2     awerfa     | Rs.190.78 credits was used by you   | food
3     zxcmo5     | DLR.2,000 credits was used by you   | football

I am using this:

d = dict(zip(XYZ['PRICE'],XYZ['TYPE']))

pat = (r'({})'.format('|'.join(d.keys())))

ABC['TYPE']=ABC['PRICE'].str.extract(pat,expand=False).map(d)

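But values like 190.78 and 191.00 are being mismatched.

For example, 190.78 should match food, but with a huge dataset values like 190.77 are wrongly matched to food even though another type is assigned to them. 198.78 is also matched to some other type when it should match food.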
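One possible cause of such mismatches (an assumption, since the full data is not shown) is that the pattern built from `d.keys()` is an unanchored alternation, so a key can match inside a longer number. The sketch below uses hypothetical sample rows to show the effect, and a stricter pattern that escapes each key with `re.escape` and rejects matches touching another digit.

```python
import re
import pandas as pd

# Hypothetical sample data; only the column names follow the tables in the question.
abc = pd.DataFrame({"price": ["Rs.2190.78 was used", "Rs.190.78 credits was used"]})
xyz = pd.DataFrame({"price": ["190.78", "191.00"], "type": ["food", "movie"]})

d = dict(zip(xyz["price"], xyz["type"]))

# Unanchored alternation: "190.78" is also found inside "2190.78".
loose = r"({})".format("|".join(d.keys()))
loose_types = abc["price"].str.extract(loose, expand=False).map(d)

# Escape each key and require that no digit touches the match on either side.
strict = r"(?<!\d)({})(?!\d)".format("|".join(re.escape(k) for k in d))
strict_types = abc["price"].str.extract(strict, expand=False).map(d)

print(loose_types.tolist())
print(strict_types.tolist())
```

With the loose pattern both rows are labelled food; with the strict pattern the 2190.78 row stays unmatched. Keys that contain a thousands separator, such as 2,000, would still not match text written as DLR.2000, so the separators need to be removed on both sides first.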

You can do the following:

import pandas as pd

'''
First we create an artificial key column to be able to merge.
We extract the decimal number from the string
and convert it to type float.
'''

df1['price_key'] = df1['price'].str.replace(',', '').str.extract(r'(\d+\.\d+)', expand=False).astype(float)

# The price column of df2 must be float as well (assuming it holds strings such as '1,599.00'),
# otherwise the merge fails on mismatched float64 and object columns
df2['price'] = df2['price'].str.replace(',', '').astype(float)

# After that we merge on price_key and price and drop the columns we don't need
df_final = pd.merge(df1, df2, left_on='price_key', right_on='price', suffixes=('', '_2'))
df_final = df_final.drop(['type', 'price_key', 'price_2'], axis='columns')

Output

    id      price                               type_2
0   easdca  Rs.1,599.00 was trasn by you        basketball
1   vbbngy  txn of INR 191.00 using             movie
2   awerfa  Rs.190.78 credits was used by you   food
3   zxcmo5  DLR.2000.78 credits was used by you football
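For reference, here is a self-contained sketch of the same key-column idea run on the data exactly as posted in the question. It differs from the code above in two ways that are my own choices, not part of the answer: the regular expression makes the decimal part optional so that a price such as DLR.2000 is also extracted, and the merge uses `how="left"` so that rows without a match are kept instead of dropped.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": ["easdca", "vbbngy", "awerfa", "zxcmo5"],
    "price": [
        "Rs.1,599.00 was trasn by you",
        "txn of INR 191.00 using",
        "Rs.190.78 credits was used by you",
        "DLR.2000 credits was used by you",
    ],
    "type": ["unknown"] * 4,
})
df2 = pd.DataFrame({
    "price": ["190.78", "191.00", "2,000", "1,599.00"],
    "type": ["food", "movie", "football", "basketball"],
})

# Optional decimal part, so "2000" is extracted as well as "1599.00".
df1["price_key"] = (
    df1["price"].str.replace(",", "", regex=False)
    .str.extract(r"(\d+(?:\.\d+)?)", expand=False)
    .astype(float)
)
df2["price_key"] = df2["price"].str.replace(",", "", regex=False).astype(float)

# Left merge keeps every row of df1, even when no price matches.
merged = df1.drop(columns="type").merge(
    df2[["price_key", "type"]], on="price_key", how="left"
)
result = merged.drop(columns="price_key")
print(result)
```

The regular expression takes the first number in each string, so text with several numbers per row would need a more specific pattern.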


I am assuming you made a typo in the xyz table and the third price should be 2000.78.

Another approach. Given df

        id                price                                type
0       easdca        Rs.1,599.00 was trasn by you          unknown
1       vbbngy        txn of INR 191.00 using               unknown
2       awerfa        Rs.190.78 credits was used by you     unknown
3       zxcmo5        DLR.2000 credits was used by you      unknown

and df2 (the XYZ table from the question), using re:

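import re

# Remove thousands separators, then take the first number that follows a dot or whitespace
df['price_'] = df['price'].apply(lambda x: re.findall(r'(?<=[.\s])[\d.]+', x.replace(',', ''))[0])
df2.columns = ['price_', 'type']
df2['price_'] = df2['price_'].str.replace(',', '')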

Using pd.merge:

df = df.merge(df2, on='price_')
# drop() returns a new DataFrame, so assign the result back
df = df.drop('type_x', axis=1)

Output


                id                                 price   price_       type_y
0      easdca        Rs.1,599.00 was trasn by you         1599.00   basketball
1      vbbngy        txn of INR 191.00 using               191.00        movie
2      awerfa        Rs.190.78 credits was used by you     190.78         food
3      zxcmo5        DLR.2000 credits was used by you        2000     football
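Because the keys in this approach are strings, an inner merge silently drops any row whose key is not an exact textual match, for example 1599.00 against 1599. A minimal sketch of how to see which rows failed, using the `indicator` option of `merge` on hypothetical, already extracted keys:

```python
import pandas as pd

# Hypothetical keys after extraction; "1599.00" and "1599" differ as strings.
df = pd.DataFrame({
    "id": ["easdca", "vbbngy", "awerfa", "zxcmo5"],
    "price_": ["1599.00", "191.00", "190.78", "2000"],
})
df2 = pd.DataFrame({
    "price_": ["190.78", "191.00", "2000", "1599"],
    "type": ["food", "movie", "football", "basketball"],
})

# how="left" keeps all rows of df; the _merge column records where each row came from.
checked = df.merge(df2, on="price_", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "price_"].tolist()
print(unmatched)
```

Any value listed in `unmatched` points to a formatting difference between the two key columns that has to be normalised before merging.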


Great. Can you also add a solution that raises an error for this data? Will it raise an error? 190.78 should match food, but with a huge dataset values like 190.77 are wrongly matched to food even though another type is assigned to them.

ValueError: You are trying to merge on float64 and object columns. If you wish to proceed you should use pd.concat

Read the error; it tells you literally what the problem is. You have to convert the columns of both dataframes to float: df['Column'] = df.Column.astype(float). The column price_key is already float, so you have to change the price column of the xyz table to float as well.

ValueError: could not convert string to float:

Replace the comma in the string: df['column'] = df['column'].str.replace(',', '').astype(float)

Great move making price_key a float and using it for the merge, +1. Still, for real floating-point values I would not rely on float precision as the merge key.

It works, but I only get two rows of output instead of all of them.

I can't figure out why; it works for me. I suggest you check the price column.

The problem is that in df's price column we have the price 1599.00, while in df2 the same value to match is 1599, so the inner merge fails because of the .00. It works fine for values like 2120.23 but not for 1599.

Why not convert the prices to float and then merge?

When I created df, the price columns were of type str; now I am converting them to float. See if this helps.
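The comments raise two concerns: float precision as a merge key, and string keys such as 1599.00 and 1599 that differ only in formatting. One way to address both, shown here as a sketch and not as something either answer proposed, is to convert each price to an integer number of cents with `decimal.Decimal` and merge on that. The helper `to_cents` and the sample frames are hypothetical, and the helper assumes at most two decimal places.

```python
from decimal import Decimal

import pandas as pd


def to_cents(text: str) -> int:
    """Convert a numeric string such as '1,599.00' or '1599' to integer cents."""
    return int(Decimal(text.replace(",", "")) * 100)


# Hypothetical extracted prices on the left, lookup table on the right.
left = pd.DataFrame({"price_": ["1599.00", "191.00", "190.78", "2000"]})
right = pd.DataFrame({
    "price_": ["190.78", "191.00", "2,000", "1,599"],
    "type": ["food", "movie", "football", "basketball"],
})

left["cents"] = left["price_"].map(to_cents)
right["cents"] = right["price_"].map(to_cents)

# Integer keys compare exactly, so 1599.00 and 1,599 both become 159900.
matched = left.merge(right[["cents", "type"]], on="cents", how="left")
print(matched)
```

Integer keys avoid any dependence on how a float is rounded, at the cost of fixing the number of decimal places in advance.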