Merging two dataframes on three columns in Python

I have two dataframes that I want to merge on two columns, Latitude and Longitude. The resulting df should include all columns. df1:

and df2:

     Station_Number       Date  Latitude  Longitude  Elevation      Value
0       CA002100636 2019-01-01   69.5667  -138.9167        1.0 -18.300000
1       CA002100636 2019-01-09   69.5667  -138.9167        1.0 -26.871429
2       CA002100636 2019-01-17   69.5667  -138.9167        1.0 -19.885714
3       CA002100636 2019-01-25   69.5667  -138.9167        1.0 -17.737500
4       CA002100636 2019-02-02   69.5667  -138.9167        1.0 -13.787500
...             ...        ...       ...        ...        ...        ...

I tried:
LST_1=pd.merge(df1,df2,how='inner')
but when I merge this way I lose several data points, even though they are contained in both dataframes.
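When "on" is omitted, pd.merge joins on every column name the two frames have in common, so an inner merge drops any row that differs in even one shared column. A minimal sketch for seeing which rows get dropped is an outer merge with indicator=True. The frames below are hypothetical stand-ins (the real df1 is not shown in the question):

```python
import pandas as pd

# Hypothetical miniature df1/df2; the second df1 row differs in Latitude.
df1 = pd.DataFrame({
    "Date": ["2019-01-01", "2019-01-09"],
    "Latitude": [69.5667, 69.5668],
    "Longitude": [-138.9167, -138.9167],
    "LST": [-20.1, -25.0],
})
df2 = pd.DataFrame({
    "Date": ["2019-01-01", "2019-01-09"],
    "Latitude": [69.5667, 69.5667],
    "Longitude": [-138.9167, -138.9167],
    "Value": [-18.3, -26.871429],
})

keys = ["Date", "Latitude", "Longitude"]
# No "on": pandas joins on the shared columns Date, Latitude and Longitude.
# indicator=True adds a "_merge" column: "both", "left_only" or "right_only".
check = pd.merge(df1, df2, how="outer", indicator=True)
print(check["_merge"].value_counts())

# These are the rows an inner merge would silently drop.
unmatched = check[check["_merge"] != "both"]
print(unmatched[keys + ["_merge"]])
```

Comparing the key values of the "left_only" and "right_only" rows usually shows which column is preventing the match.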

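I'm not sure whether you want to merge on a specific column. If so, you need to pick a column whose identifiers overlap, such as the "Date" column: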

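df_ = pd.merge(df1, df2, on="Date")
print(df_)
     Date  Latitude_x  Longitude_x  ... Longitude_y Elevation        Value
0  01.01.2019       66.33         17.1  ...    -138.9167       1.0  -18.300000
1  09.01.2019       66.33         17.1  ...    -138.9167       1.0  -26.871429
2  17.01.2019       66.33         17.1  ...    -138.9167       1.0  -19.885714
3  25.01.2019       66.33         17.1  ...    -138.9167       1.0  -17.737500
4  02.02.2019       66.33         17.1  ...    -138.9167       1.0  -13.787500

[5 rows x 9 columns]

df_.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Date            5 non-null      object 
 1   Latitude_x      5 non-null      float64
 2   Longitude_x     5 non-null      float64
 3   LST             5 non-null      object 
 4   Station_Number  5 non-null      object 
 5   Latitude_y      5 non-null      int64  
 6   Longitude_y     5 non-null      int64  
 7   Elevation       5 non-null      float64
 8   Value           5 non-null      object 
dtypes: float64(3), int64(2), object(4)
memory usage: 400.0+ bytes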
Because the column names are the same in both frames, pandas creates _x and _y versions of Latitude and Longitude.
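If the default _x/_y names are unclear, the suffixes parameter of pd.merge lets you choose them. A small sketch with hypothetical one-row frames; the suffix strings "_df1" and "_df2" are arbitrary choices:

```python
import pandas as pd

# Hypothetical frames that both have Latitude/Longitude columns.
df1 = pd.DataFrame({"Date": ["01.01.2019"], "Latitude": [66.33],
                    "Longitude": [17.1], "LST": ["x"]})
df2 = pd.DataFrame({"Date": ["01.01.2019"], "Latitude": [69.5667],
                    "Longitude": [-138.9167], "Value": [-18.3]})

# Merging on Date only: the overlapping non-key columns get suffixes.
# suffixes= replaces the default ("_x", "_y") with more descriptive names.
merged = pd.merge(df1, df2, on="Date", suffixes=("_df1", "_df2"))
print(merged.columns.tolist())
```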

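If you want all the columns and data, with each row independent of the others, you can use pd.concat. However, this creates some NaN values because of the missing data: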

df_1 = pd.concat([df1, df2])
print(df_1)
         Date  Latitude  Longitude  ... Station_Number Elevation        Value
0  01.01.2019     66.33       17.1  ...            NaN       NaN          NaN
1  09.01.2019     66.33       17.1  ...            NaN       NaN          NaN
2  17.01.2019     66.33       17.1  ...            NaN       NaN          NaN
3  25.01.2019     66.33       17.1  ...            NaN       NaN          NaN
4  02.02.2019     66.33       17.1  ...            NaN       NaN          NaN
0  01.01.2019     69.56  -138.9167  ...    CA002100636       1.0   -18.300000
1  09.01.2019     69.56  -138.9167  ...    CA002100636       1.0   -26.871429
2  17.01.2019     69.56  -138.9167  ...    CA002100636       1.0   -19.885714
3  25.01.2019     69.56  -138.9167  ...    CA002100636       1.0   -17.737500
4  02.02.2019     69.56  -138.9167  ...    CA002100636       1.0   -13.787500

df_1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 4
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Date            10 non-null     object 
 1   Latitude        10 non-null     float64
 2   Longitude       10 non-null     float64
 3   LST             5 non-null      object 
 4   Station_Number  5 non-null      object 
 5   Elevation       5 non-null      float64
 6   Value           5 non-null      object 
dtypes: float64(3), object(4)
memory usage: 640.0+ bytes
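The info() output above shows "10 entries, 0 to 4" because concat keeps each frame's original index labels. A sketch with hypothetical small frames: passing ignore_index=True renumbers the stacked rows, and the NaN counts show which columns came from only one frame:

```python
import pandas as pd

# Hypothetical frames with partly different columns.
df1 = pd.DataFrame({"Date": ["01.01.2019", "09.01.2019"],
                    "Latitude": [66.33, 66.33], "Longitude": [17.1, 17.1],
                    "LST": ["a", "b"]})
df2 = pd.DataFrame({"Date": ["01.01.2019", "09.01.2019"],
                    "Latitude": [69.56, 69.56],
                    "Longitude": [-138.9167, -138.9167],
                    "Value": [-18.3, -26.871429]})

# concat stacks rows; a column missing from one frame is filled with NaN.
# ignore_index=True gives a fresh 0..n-1 index instead of repeated labels.
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)
print(stacked.isna().sum())
```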

Try passing the "on" parameter to the merge() method, e.g.
LST_1=pd.merge(df1,df2,how='inner',on=['Date','Longitude','Latitude'])
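A minimal sketch of that call, using hypothetical data in which only the first row agrees on all three keys. With how="inner" and an explicit on list, a row survives only if Date, Latitude and Longitude all match exactly, so a different date or a small difference in one coordinate is enough to drop it:

```python
import pandas as pd

# Hypothetical data: row 2 differs in Date, row 3 differs in Latitude.
df1 = pd.DataFrame({
    "Date": ["2019-01-01", "2019-01-09", "2019-01-17"],
    "Latitude": [69.5667, 69.5667, 69.5667],
    "Longitude": [-138.9167, -138.9167, -138.9167],
    "LST": [-20.1, -25.0, -19.0],
})
df2 = pd.DataFrame({
    "Date": ["2019-01-01", "2019-01-10", "2019-01-17"],
    "Latitude": [69.5667, 69.5667, 69.567],
    "Longitude": [-138.9167, -138.9167, -138.9167],
    "Value": [-18.3, -26.871429, -19.885714],
})

# how accepts values such as "inner", "outer", "left" and "right".
# An inner join keeps only rows that match on every key in "on".
LST_1 = pd.merge(df1, df2, how="inner", on=["Date", "Longitude", "Latitude"])
print(LST_1)
```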
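Yes, I already tried that, but for some reason data is still lost during the merge.
Please provide a minimal reproducible example. Show the output you get, explain why it is not correct, and include the expected output.
I think merging on floats is a bad idea... Have you tried multiplying Latitude and Longitude by 10000, rounding, and converting to int?
(43.577244, 7.055041) => np.round(43.577244*10000).astype(int), np.round(7.055041*10000).astype(int) => (435772, 70550)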
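The rounding idea from the last comment can be sketched as follows. The coordinates are hypothetical and differ only in the seventh decimal, and add_int_keys, lat_key and lon_key are illustrative names, not part of pandas. The sketch uses the pandas Series.round() method, which does the same rounding as np.round in the comment:

```python
import pandas as pd

# Hypothetical coordinates; the second df1 row differs only by float noise.
df1 = pd.DataFrame({"Date": ["2019-01-01", "2019-01-09"],
                    "Latitude": [43.577244, 43.5772441],
                    "Longitude": [7.055041, 7.0550409],
                    "LST": [20.5, 21.0]})
df2 = pd.DataFrame({"Date": ["2019-01-01", "2019-01-09"],
                    "Latitude": [43.577244, 43.577244],
                    "Longitude": [7.055041, 7.055041],
                    "Value": [-18.3, -26.9]})

SCALE = 10000  # keep 4 decimal places, as suggested in the comment


def add_int_keys(df):
    """Hypothetical helper: return a copy with integer lat/lon key columns."""
    out = df.copy()
    out["lat_key"] = (out["Latitude"] * SCALE).round().astype(int)
    out["lon_key"] = (out["Longitude"] * SCALE).round().astype(int)
    return out


keys = ["Date", "lat_key", "lon_key"]
# Exact float keys: the second row does not match.
exact = pd.merge(df1, df2, on=["Date", "Latitude", "Longitude"], how="inner")
# Rounded integer keys: both rows match.
merged = pd.merge(add_int_keys(df1), add_int_keys(df2), on=keys, how="inner")
print(len(exact), len(merged))
print(merged[keys + ["LST", "Value"]])
```

Rounding to 4 decimals treats coordinates within roughly 0.0001 degrees as the same point, but two values that fall on opposite sides of a rounding boundary can still end up with different keys, so the scale factor needs to suit the precision of the data.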