Merging two dataframes on three columns in Python
I have two dataframes that I want to merge on the two columns Latitude and Longitude. The resulting df should include all columns. df1: and df2:
Station_Number Date Latitude Longitude Elevation Value
0 CA002100636 2019-01-01 69.5667 -138.9167 1.0 -18.300000
1 CA002100636 2019-01-09 69.5667 -138.9167 1.0 -26.871429
2 CA002100636 2019-01-17 69.5667 -138.9167 1.0 -19.885714
3 CA002100636 2019-01-25 69.5667 -138.9167 1.0 -17.737500
4 CA002100636 2019-02-02 69.5667 -138.9167 1.0 -13.787500
... ... ... ... ... ... ...
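I tried:
LST_1 = pd.merge(df1, df2, how='inner')
But when I use merge this way, I lose several data points that are present in both dataframes.

I am not sure whether you want to merge on specific columns. If so, you need to pick a column whose identifiers overlap, for example the "Date" column: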
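df_ = pd.merge(df1, df2, on="Date")
print(df_)
Date Latitude_x Longitude_x ... Longitude_y Elevation Value
0 01.01.2019 66.33 17.1 ... -138.9167 1.0 -18.300000
1 09.01.2019 66.33 17.1 ... -138.9167 1.0 -26.871429
2 17.01.2019 66.33 17.1 ... -138.9167 1.0 -19.885714
3 25.01.2019 66.33 17.1 ... -138.9167 1.0 -17.737500
4 02.02.2019 66.33 17.1 ... -138.9167 1.0 -13.787500
[5 rows x 9 columns]
df_.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 5 non-null object
1 Latitude_x 5 non-null float64
2 Longitude_x 5 non-null float64
3 LST 5 non-null object
4 Station_Number 5 non-null object
5 Latitude_y 5 non-null int64
6 Longitude_y 5 non-null int64
7 Elevation 5 non-null float64
8 Value 5 non-null object
dtypes: float64(3), int64(2), object(4)
memory usage: 400.0+ bytes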
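Because both dataframes have columns with the same names, pandas appends the suffixes _x and _y to Latitude and Longitude.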
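To illustrate the suffix behaviour described above, here is a minimal sketch. The frames `left` and `right` and their values are hypothetical stand-ins, not the asker's data; the `suffixes` argument of `pd.merge` is shown as one possible way to get more descriptive names than the defaults.

```python
import pandas as pd

# Hypothetical miniature frames; the values are illustrative only.
left = pd.DataFrame({
    "Date": ["01.01.2019", "09.01.2019"],
    "Latitude": [66.33, 66.33],
    "LST": [1.0, 2.0],
})
right = pd.DataFrame({
    "Date": ["01.01.2019", "09.01.2019"],
    "Latitude": [69.5667, 69.5667],
    "Value": [-18.3, -26.871429],
})

# "Latitude" exists in both frames but is not a merge key,
# so pandas disambiguates it with the default suffixes _x and _y.
default = pd.merge(left, right, on="Date")
print(list(default.columns))

# Custom suffixes make the origin of each column easier to read.
named = pd.merge(left, right, on="Date", suffixes=("_lst", "_station"))
print(list(named.columns))
```

If Latitude and Longitude are instead listed in `on`, they become merge keys and no suffixes are added to them.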
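If you want all columns and all rows kept, independent of what the other dataframe contains, you can use pd.concat. However, because each dataframe lacks some of the other's columns, this creates NaN values: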
df_1 = pd.concat([df1, df2])
print(df_1)
Date Latitude Longitude ... Station_Number Elevation Value
0 01.01.2019 66.33 17.1 ... NaN NaN NaN
1 09.01.2019 66.33 17.1 ... NaN NaN NaN
2 17.01.2019 66.33 17.1 ... NaN NaN NaN
3 25.01.2019 66.33 17.1 ... NaN NaN NaN
4 02.02.2019 66.33 17.1 ... NaN NaN NaN
0 01.01.2019 69.56 -138.9167 ... CA002100636 1.0 -18.300000
1 09.01.2019 69.56 -138.9167 ... CA002100636 1.0 -26.871429
2 17.01.2019 69.56 -138.9167 ... CA002100636 1.0 -19.885714
3 25.01.2019 69.56 -138.9167 ... CA002100636 1.0 -17.737500
4 02.02.2019 69.56 -138.9167 ... CA002100636 1.0 -13.787500
df_1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 4
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 10 non-null object
1 Latitude 10 non-null float64
2 Longitude 10 non-null float64
3 LST 5 non-null object
4 Station_Number 5 non-null object
5 Elevation 5 non-null float64
6 Value 5 non-null object
dtypes: float64(3), object(4)
memory usage: 640.0+ bytes
Try passing the on parameter to the merge() method, e.g. LST_1 = pd.merge(df1, df2, how='inner', on=['Date', 'Longitude', 'Latitude'])
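When rows seem to disappear in a merge on several keys, one way to investigate is an outer merge with `indicator=True`, which adds a `_merge` column saying whether each row was found in the left frame, the right frame, or both. The sketch below uses hypothetical frames `a` and `b` with made-up LST values; it only demonstrates the diagnostic and is not the asker's data.

```python
import pandas as pd

keys = ["Date", "Latitude", "Longitude"]

# Hypothetical frames: only the first date appears in both.
a = pd.DataFrame({
    "Date": ["2019-01-01", "2019-01-09"],
    "Latitude": [69.5667, 69.5667],
    "Longitude": [-138.9167, -138.9167],
    "LST": [250.1, 248.7],
})
b = pd.DataFrame({
    "Date": ["2019-01-01", "2019-01-17"],
    "Latitude": [69.5667, 69.5667],
    "Longitude": [-138.9167, -138.9167],
    "Value": [-18.3, -19.885714],
})

# An outer merge keeps every row; _merge records where each row came from.
checked = pd.merge(a, b, how="outer", on=keys, indicator=True)
print(checked["_merge"].value_counts())

# Rows that an inner merge would have dropped.
unmatched = checked[checked["_merge"] != "both"]
print(unmatched)
```

Inspecting `unmatched` shows which key values fail to line up, for example differing date formats or slightly different coordinates.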
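Yes, I already tried that, but for some reason data still gets lost during merging.
Please provide a minimal reproducible example. Show the output you get and explain why it is incorrect, and include the expected output.
I think merging on floats is a bad idea... Have you tried multiplying Latitude and Longitude by 10000, rounding, and converting to int? (43.577244, 7.055041) => np.round(43.577244 * 10000).astype(int), np.round(7.055041 * 10000).astype(int) => (435772, 70550).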
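The suggestion in the last comment can be sketched as follows. The helper `add_int_keys`, the key column names `lat_key` and `lon_key`, and the sample frames are all hypothetical; the scale of 10000 comes from the comment and corresponds to four decimal places, which may or may not suit a given dataset.

```python
import numpy as np
import pandas as pd

def add_int_keys(df, scale=10_000):
    """Return a copy with integer merge keys derived from the coordinates."""
    out = df.copy()
    out["lat_key"] = np.round(out["Latitude"] * scale).astype(int)
    out["lon_key"] = np.round(out["Longitude"] * scale).astype(int)
    return out

# Hypothetical frames whose coordinates differ only by tiny float noise.
a = pd.DataFrame({"Latitude": [43.577244], "Longitude": [7.055041], "LST": [1.0]})
b = pd.DataFrame({"Latitude": [43.5772440001], "Longitude": [7.0550409999], "Value": [2.0]})

# Exact float comparison finds no match.
exact = pd.merge(a, b, on=["Latitude", "Longitude"])
print(len(exact))

# Integer keys absorb the noise, so the rows pair up.
rounded = pd.merge(add_int_keys(a), add_int_keys(b),
                   on=["lat_key", "lon_key"], suffixes=("_a", "_b"))
print(rounded[["lat_key", "lon_key", "LST", "Value"]])
```

The original float columns are kept with suffixes, so the integer keys can be dropped after the merge if they are no longer needed.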