Python: How to match and merge two DataFrames whose values are completely different except for the numbers in a column?
Tags: python, python-3.x, pandas, dataframe, epoch


I have a DataFrame ABC with the following values:

      id         |     price                          |   type
0     easdca     | Rs.1,599.00 was trasn by you       | unknown
1     vbbngy     | txn of INR 191.00 using            | unknown
2     awerfa     | Rs.190.78 credits was used by you  | unknown
3     zxcmo5     | DLR.2000 credits was used by you   | unknown
and another DataFrame XYZ with these values:

         price          |   type
0      190.78           | food
1      191.00           | movie
2      2,000            | football
3      1,599.00         | basketball
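How can I map XYZ onto ABC so that the type in ABC is updated with the type from XYZ, using the numeric value in the XYZ price column to find the match?

The output I need: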

      id         |     price                          |   type
0     easdca     | Rs.1,599.00 was trasn by you       | basketball
1     vbbngy     | txn of INR 191.00 using            | movie
2     awerfa     | Rs.190.78 credits was used by you  | food
3     zxcmo5     | DLR.2,000 credits was used by you  | football
I am using this:

d = dict(zip(XYZ['PRICE'],XYZ['TYPE']))

pat = (r'({})'.format('|'.join(d.keys())))

ABC['TYPE']=ABC['PRICE'].str.extract(pat,expand=False).map(d)
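But values like 190.78 and 191.00 are getting mismatched.
For example, when working with a huge amount of data, 190.78 should be matched with food, but a value like 190.77, which has a different type assigned to it, is getting matched with food instead. 198.78 is also getting matched with some other type when it should be matched with food.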
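
A note on the attempt above: one possible contributor to such mismatches (not confirmed in the question) is that the keys are joined into the pattern unescaped, so the `.` in `190.78` matches any character, and nothing prevents a key from matching inside a longer number. The sketch below escapes the keys with `re.escape` and adds digit boundaries. The sample frames and the lowercase column names are assumptions taken from the tables in the question, not from the asker's real data.

```python
import re
import pandas as pd

# Small sample modelled on the tables in the question (assumed column names)
ABC = pd.DataFrame({
    "id": ["easdca", "vbbngy", "awerfa"],
    "price": [
        "Rs.1,599.00 was trasn by you",
        "txn of INR 191.00 using",
        "Rs.190.78 credits was used by you",
    ],
    "type": ["unknown"] * 3,
})
XYZ = pd.DataFrame({
    "price": ["190.78", "191.00", "1,599.00"],
    "type": ["food", "movie", "basketball"],
})

d = dict(zip(XYZ["price"], XYZ["type"]))

# Longest keys first, so a longer key is tried before a shorter one at the same position
keys = sorted(d, key=len, reverse=True)

# re.escape makes '.' literal; the lookarounds stop a key from matching
# in the middle of a longer number such as '21,599.00' or '190.789'
pat = r"(?<![\d,])(" + "|".join(re.escape(k) for k in keys) + r")(?!\d)"

ABC["type"] = ABC["price"].str.extract(pat, expand=False).map(d)
print(ABC)
```

This still compares the text of the numbers, so `2000` in one table and `2,000` in the other would not match; converting both sides to numbers, as the answers do, avoids that.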

I am assuming you made a typo in the xyz table and that the third price should be 2000.78 rather than 2000. You can then do the following:

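'''
First we make an artificial key column to be able to merge.
We basically just extract the floating-point number from the string
and convert it to type float.
'''
import pandas as pd

df1['price_key'] = df1['price'].str.replace(',', '', regex=False).str.extract(r'(\d+\.\d+)', expand=False).astype(float)

# The price column of df2 must be float as well; if it holds strings
# such as '1,599.00', strip the commas and convert it first
df2['price'] = df2['price'].astype(str).str.replace(',', '', regex=False).astype(float)

# After that we merge on price_key and price and drop the columns we don't need
df_final = pd.merge(df1, df2, left_on='price_key', right_on='price', suffixes=('', '_2'))
df_final = df_final.drop(['type', 'price_key', 'price_2'], axis='columns')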
Output

    id      price                               type_2
0   easdca  Rs.1,599.00 was trasn by you        basketball
1   vbbngy  txn of INR 191.00 using             movie
2   awerfa  Rs.190.78 credits was used by you   food
3   zxcmo5  DLR.2000.78 credits was used by you football
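
The pattern `\d+\.\d+` only finds numbers that have a decimal part, which is why this answer has to assume `2000.78`. A minimal sketch of a variant that also accepts whole numbers such as `DLR.2000` is shown here. It assumes the data looks like the tables in the question and that the first number in each price string is the amount; `df1` and `df2` are rebuilt from those tables for illustration.

```python
import pandas as pd

# Frames rebuilt from the question's tables (illustrative data)
df1 = pd.DataFrame({
    "id": ["easdca", "vbbngy", "awerfa", "zxcmo5"],
    "price": [
        "Rs.1,599.00 was trasn by you",
        "txn of INR 191.00 using",
        "Rs.190.78 credits was used by you",
        "DLR.2000 credits was used by you",
    ],
    "type": ["unknown"] * 4,
})
df2 = pd.DataFrame({
    "price": ["190.78", "191.00", "2,000", "1,599.00"],
    "type": ["food", "movie", "football", "basketball"],
})

# Strip thousands separators, then take the first number; the decimal part is optional
df1["price_key"] = (
    df1["price"].str.replace(",", "", regex=False)
    .str.extract(r"(\d+(?:\.\d+)?)", expand=False)
    .astype(float)
)
# Give the lookup table a numeric key of the same dtype
df2["price_key"] = df2["price"].str.replace(",", "", regex=False).astype(float)

# A left merge keeps every row of df1, even if no type is found for it
df_final = (
    df1.drop(columns="type")
    .merge(df2[["price_key", "type"]], on="price_key", how="left")
    .drop(columns="price_key")
)
print(df_final)
```

Using `how="left"` instead of the default inner merge means unmatched rows show up with a missing type rather than silently disappearing.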

df

        id                price                                type
0       easdca        Rs.1,599.00 was trasn by you          unknown
1       vbbngy        txn of INR 191.00 using               unknown
2       awerfa        Rs.190.78 credits was used by you     unknown
3       zxcmo5        DLR.2000 credits was used by you      unknown

df2 is the XYZ table from the question (columns price and type).

Using re:

import re

# Take the first run of digits and dots that follows a '.' or whitespace
df['price_'] = df['price'].apply(lambda x: re.findall(r'(?<=[\.\s])[\d\.]+', x.replace(',', ''))[0])
df2.columns = ['price_', 'type']
df2['price_'] = df2['price_'].str.replace(',', '')

Using pd.merge:

df = df.merge(df2, on='price_')
df = df.drop('type_x', axis=1)

Output

                id                                 price   price_       type_y
0      easdca        Rs.1,599.00 was trasn by you         1599.00   basketball
1      vbbngy        txn of INR 191.00 using               191.00        movie
2      awerfa        Rs.190.78 credits was used by you     190.78         food
3      zxcmo5        DLR.2000 credits was used by you        2000     football
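
Because this approach merges on strings, `'1599.00'` and `'1599'` count as different keys, and the default inner merge drops such rows without warning, which is the symptom reported in the comments. A small sketch of how one might detect and fix that is shown here, using hypothetical frames `left` and `right` with made-up rows that reproduce the situation.

```python
import pandas as pd

# Hypothetical string keys: '1599.00' on the left, '1599' on the right
left = pd.DataFrame({"id": ["easdca", "vbbngy"], "price_": ["1599.00", "191.00"]})
right = pd.DataFrame({"price_": ["1599", "191.00"], "type": ["basketball", "movie"]})

# indicator=True adds a '_merge' column saying where each row was found
checked = left.merge(right, on="price_", how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched)  # rows of `left` that found no partner in `right`

# Converting both keys to numbers makes '1599.00' and '1599' equal
left["price_"] = pd.to_numeric(left["price_"])
right["price_"] = pd.to_numeric(right["price_"])
fixed = left.merge(right, on="price_", how="left", indicator=True)
print(fixed)
```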


Great. Could you extend the solution so that it raises an error with this data? 190.78 should be matched with food, but when working with huge data a value like 190.77 gets matched with food even though it has some other type assigned to it.

ValueError: You are trying to merge on float64 and object columns. If you wish to proceed you should use pd.concat

Read the error, it literally says what the problem is. You have to convert the columns of both DataFrames to float: df['Column'] = df.Column.astype(float). price_key is already float, so you have to change the price column of the xyz table to float as well.

ValueError: could not convert string to float

Replace the commas in the string: df['column'] = df['column'].str.replace(',', '').astype(float)

Great move making price_key a float and using it for the merge, +1. However, for values with many decimal places I would not rely on float precision as the merge key.

It is working, but I only get two rows of output instead of all of them.

I can't think of why; it works for me. I suggest you check your price column.

The problem is this: in the price column of df we have 1599.00, but in df2 the same value to match is 1599, so the inner merge does not work because of the .00. For values like 2120.23 it works fine, but not for 1599.

Why not convert the prices to float and then merge? When I created df the price columns were of type str; I have now made them floats. See if that helps.
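
On the float-precision concern raised in the comments: one alternative, not proposed by either answerer, is to merge on an exact integer key such as the amount in hundredths. The sketch below assumes prices have at most two decimal places; `to_cents` is a hypothetical helper, and the frames are sample data modelled on the question.

```python
from decimal import Decimal

import pandas as pd


def to_cents(text):
    """Convert a price string such as '1,599.00' or '2000' to integer hundredths.

    Assumes at most two decimal places; further digits would be truncated.
    """
    return int(Decimal(text.replace(",", "")) * 100)


# Sample data modelled on the question; price_key holds the already extracted numbers
left = pd.DataFrame({
    "id": ["easdca", "vbbngy", "awerfa", "zxcmo5"],
    "price_key": ["1599.00", "191.00", "190.78", "2000"],
})
right = pd.DataFrame({
    "price": ["190.78", "191.00", "2,000", "1,599.00"],
    "type": ["food", "movie", "football", "basketball"],
})

# Integer keys compare exactly, so '1599.00' and '1,599.00' both become 159900
left["cents"] = left["price_key"].map(to_cents)
right["cents"] = right["price"].map(to_cents)

merged = left.merge(right[["cents", "type"]], on="cents", how="left")
print(merged)
```

`Decimal` parses the text exactly, so no binary rounding is involved before the key is turned into an integer.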