Python: How do I merge two data frames on a shared column using a dictionary?
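Tags: python, dataframe, dictionary, join, merge

I have two data frames that I am trying to join on one column. However, the values in this common column differ between the two data frames: one holds full names, the other holds abbreviations. I am trying to build some kind of mapping dictionary that equates the names with their abbreviations, so that the data frames can be joined directly on this common column. I explain further below.

I have this data frame (A): Then I have this data frame (B): Now I want to create the following data frame (C):

To explain further: I have building data showing the CO2 and NOx emissions for 3 styles of each of 3 building types (data frame A). I also have data showing how many homes of each type there are in each listed zip code (data frame B). Ultimately I want a data frame showing the total CO2 and NOx emissions per zip code (data frame C). The idea behind data frame C is to join it to a GIS shapefile by zip code, so that I can map the CO2 and NOx emissions in each zip code; I already have a blank zip codes shapefile to join it to. (I realize that mapping emissions is far more complicated than this, but I am keeping it simple at this stage of my project.)

So what I want to do is join data frame A to data frame B on the "Building" column. The problem is that the "Building" column in data frame A holds full names, while the "Building" column in data frame B holds abbreviations. I think I need some kind of dictionary that matches the full names to the abbreviations, but I am not sure how to set that up here.

Can this be done in Python? Or is it really more complicated than I think? I have spent hours trying to work out how to merge these two data frames, and each time I end up more confused. I am having a lot of trouble conceptualizing this code, even though the goal seems simple. I would greatly appreciate any help and guidance! Sorry the data frames are so long, but I felt it was necessary to capture the structure and complexity of the data.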
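Thanks, everyone!

This is certainly possible. You can first convert the building names to abbreviations with a custom function that splits on the hyphen and takes the first letter of each word. "Studio" has no hyphen and would shorten to "S", so the function special-cases it to "ST". You can then merge the data frames on Building and Style. Finally, multiply the per-building emissions by Number and group by Zip_code to sum the totals: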
import pandas as pd
data1 = [ { "Building": "Multi-Family", "Style": "A", "CO2": 34, "NOx": 55 }, { "Building": "Multi-Family", "Style": "B", "CO2": 43, "NOx": 44 }, { "Building": "Multi-Family", "Style": "C", "CO2": 33, "NOx": 35 }, { "Building": "Single-Family", "Style": "A", "CO2": 34, "NOx": 26 }, { "Building": "Single-Family", "Style": "B", "CO2": 22, "NOx": 26 }, { "Building": "Single-Family", "Style": "C", "CO2": 65, "NOx": 48 }, { "Building": "Single-Family", "Style": "D", "CO2": 55, "NOx": 74 }, { "Building": "Studio", "Style": "A", "CO2": 46, "NOx": 35 }, { "Building": "Studio", "Style": "B", "CO2": 54, "NOx": 67 }, { "Building": "Studio", "Style": "C", "CO2": 57, "NOx": 58 } ]
data2 = [ { "Building": "MF", "Style": "A", "Zip_code": 11111, "Number": 4 }, { "Building": "MF", "Style": "A", "Zip_code": 22222, "Number": 3 }, { "Building": "MF", "Style": "A", "Zip_code": 33333, "Number": 2 }, { "Building": "MF", "Style": "B", "Zip_code": 11111, "Number": 1 }, { "Building": "MF", "Style": "B", "Zip_code": 22222, "Number": 1 }, { "Building": "MF", "Style": "C", "Zip_code": 22222, "Number": 1 }, { "Building": "MF", "Style": "C", "Zip_code": 33333, "Number": 6 }, { "Building": "SF", "Style": "A", "Zip_code": 11111, "Number": 7 }, { "Building": "SF", "Style": "A", "Zip_code": 22222, "Number": 5 }, { "Building": "SF", "Style": "B", "Zip_code": 44444, "Number": 3 }, { "Building": "SF", "Style": "B", "Zip_code": 55555, "Number": 8 }, { "Building": "SF", "Style": "B", "Zip_code": 66666, "Number": 6 }, { "Building": "SF", "Style": "C", "Zip_code": 11111, "Number": 9 }, { "Building": "SF", "Style": "C", "Zip_code": 22222, "Number": 9 }, { "Building": "ST", "Style": "A", "Zip_code": 33333, "Number": 3 }, { "Building": "ST", "Style": "A", "Zip_code": 44444, "Number": 5 }, { "Building": "ST", "Style": "B", "Zip_code": 55555, "Number": 5 }, { "Building": "ST", "Style": "B", "Zip_code": 66666, "Number": 3 }, { "Building": "ST", "Style": "C", "Zip_code": 11111, "Number": 2 }, { "Building": "ST", "Style": "C", "Zip_code": 22222, "Number": 9 }, { "Building": "ST", "Style": "C", "Zip_code": 33333, "Number": 1 } ]
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
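def shorten_buildings(text):
    # Take the first letter of each hyphen-separated word: "Multi-Family" -> "MF"
    text = ''.join([i[0] for i in text.split('-')])
    # "Studio" has no hyphen and would become "S"; data frame B uses "ST"
    return text if text != 'S' else 'ST'

df1['Building'] = df1['Building'].apply(shorten_buildings)
df2 = df2.merge(df1, how='left', on=['Building','Style'])
df2['CO2'] = df2['Number'] * df2['CO2'] #total CO2
df2['NOx'] = df2['Number'] * df2['NOx'] #total NOx
# Sum only the emission columns; Building and Style are strings and should not be summed
df2.groupby('Zip_code')[['CO2', 'NOx']].sum()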
Output:
Zip_code CO2 NOx
-------------------------
11111 1116 994
22222 1446 1328
33333 461 483
44444 296 253
55555 446 543
66666 294 357
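
Since the question asks specifically about a mapping dictionary, here is a minimal sketch of that alternative, using an explicit dict with `Series.map` instead of deriving the abbreviation from the name. The dictionary `BUILDING_ABBREVIATIONS` and the variable names `emissions`, `counts`, `merged` and `totals` are made up for illustration, and the data is only a small subset of the frames above, so the totals differ from the full output.

```python
import pandas as pd

# Hypothetical mapping from the full names in data frame A
# to the abbreviations used in data frame B.
BUILDING_ABBREVIATIONS = {
    "Multi-Family": "MF",
    "Single-Family": "SF",
    "Studio": "ST",
}

# Small subset of data frame A (emissions per building type and style).
emissions = pd.DataFrame({
    "Building": ["Multi-Family", "Single-Family", "Studio"],
    "Style": ["A", "A", "A"],
    "CO2": [34, 34, 46],
    "NOx": [55, 26, 35],
})

# Small subset of data frame B (number of buildings per zip code).
counts = pd.DataFrame({
    "Building": ["MF", "SF", "ST", "ST"],
    "Style": ["A", "A", "A", "A"],
    "Zip_code": [11111, 11111, 33333, 44444],
    "Number": [4, 7, 3, 5],
})

# Translate full names to abbreviations; names missing from the dict become NaN.
emissions["Building"] = emissions["Building"].map(BUILDING_ABBREVIATIONS)
assert emissions["Building"].notna().all(), "unmapped building name"

# Same merge / multiply / group-by steps as in the answer.
merged = counts.merge(emissions, how="left", on=["Building", "Style"])
merged["CO2"] = merged["Number"] * merged["CO2"]
merged["NOx"] = merged["Number"] * merged["NOx"]
totals = merged.groupby("Zip_code")[["CO2", "NOx"]].sum()
print(totals)
```

An explicit dict is more to type than a splitting function, but it needs no special case for "Studio" and fails loudly when a new building type appears that has no abbreviation yet.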
Data frame (B):
Building Style Zip_code Number
---------------------------------------
MF A 11111 4
MF A 22222 3
MF A 33333 2
MF B 11111 1
MF B 22222 1
MF C 22222 1
MF C 33333 6
SF A 11111 7
SF A 22222 5
SF B 44444 3
SF B 55555 8
SF B 66666 6
SF C 11111 9
SF C 22222 9
ST A 33333 3
ST A 44444 5
ST B 55555 5
ST B 66666 3
ST C 11111 2
ST C 22222 9
ST C 33333 1
Data frame (C), the desired result:
Zip_code CO2 NOx
-------------------------
11111 ? ?
22222 ? ?
33333 ? ?
44444 ? ?
55555 ? ?
66666 ? ?
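
Before trusting per-zip-code totals like these, it may be worth checking that every row of data frame B actually found a match in data frame A. With a left merge, unmatched rows get NaN emissions, and a group-by sum skips NaN, so they silently contribute nothing. The following is a sketch of one way to surface such rows with the `indicator=True` option of `DataFrame.merge`; the tiny frames and the deliberately unmatched style "Z" are invented for illustration.

```python
import pandas as pd

# Invented emissions table, already using abbreviations.
emissions = pd.DataFrame({
    "Building": ["MF", "SF"],
    "Style": ["A", "A"],
    "CO2": [34, 34],
    "NOx": [55, 26],
})

# Invented counts table; style "Z" has no counterpart in `emissions`.
counts = pd.DataFrame({
    "Building": ["MF", "SF", "SF"],
    "Style": ["A", "A", "Z"],
    "Zip_code": [11111, 11111, 22222],
    "Number": [4, 7, 2],
})

# indicator=True adds a "_merge" column: "both" for matched rows,
# "left_only" for rows of `counts` with no match in `emissions`.
checked = counts.merge(
    emissions, how="left", on=["Building", "Style"], indicator=True
)
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["Building", "Style", "Zip_code"]])
```

If `unmatched` is not empty, the abbreviation mapping or the style codes probably need fixing before the totals are joined to the shapefile.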