Python/Bash: date manipulation and join

I have two CSV files that I want to merge using Date (CSV 1) and pickup_datetime (CSV 2).

CSV 1: Weather.csv (45 KB, ~365 rows)

CSV 2: Final_Data_1.csv (250 MB, ~1.5 million rows)


How can I manipulate the date columns in the two CSV files and merge them into a single file in which the Final_Data_1.csv columns come before the Weather.csv columns?

You definitely don't want to use Bash for this. A good way to do it in Python is with pandas, something like this:

import pandas as pd
df1 = pd.read_csv('weather.csv')
df2 = pd.read_csv('final.csv')
# format the date columns so they match up
df3 = pd.merge(df2, df1, on='date_formatted')
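The "format the date columns so they match up" step is left open in the answer. One possible way to fill it in is sketched below, under stated assumptions: the weather file is assumed to have a column named `Date` holding `YYYY-MM-DD` values, `temp_avg` is a made-up weather column, and small in-memory samples stand in for the real files. Only `pickup_datetime` and its format come from the question.

```python
import io

import pandas as pd

# Hypothetical stand-ins for weather.csv and final_data_1.csv.
# "Date" and "temp_avg" are assumed names, not taken from the question.
weather_csv = io.StringIO(
    "Date,temp_avg\n"
    "2013-01-01,3.5\n"
    "2013-01-02,1.0\n"
)
trips_csv = io.StringIO(
    "medallion,pickup_datetime,fare_amount\n"
    "A,2013-01-01 23:54:15,5.0\n"
    "B,2013-01-01 07:35:47,10.0\n"
    "C,2013-01-02 12:00:00,7.5\n"
)

weather = pd.read_csv(weather_csv, parse_dates=["Date"])
trips = pd.read_csv(trips_csv, parse_dates=["pickup_datetime"])

# Build a shared key that holds only the calendar day (time set to midnight).
weather["date_formatted"] = weather["Date"].dt.normalize()
trips["date_formatted"] = trips["pickup_datetime"].dt.normalize()

# A left merge keeps every trip row. The trip columns come first because
# trips is the left frame; validate raises if a day appears twice in weather.
merged = trips.merge(
    weather.drop(columns="Date"),
    on="date_formatted",
    how="left",
    validate="many_to_one",
)
print(merged)
```

With the real files, the two `io.StringIO` objects would be replaced by the file paths, and `merged.to_csv(..., index=False)` would write the combined file.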

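How do you determine which row of the final data gets merged with which row of the weather data? In other words, which id links them together?

Since you're using the Python tag, have you tried Python's csv module?

Well, the final data has about 50,000 rows for 2013-01-01, so the corresponding values from weather.csv need to be copied into each of those rows of the final data.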
head -3 final_data_1.csv 
medallion,hack_license,vendor_id_x,rate_code,store_and_fwd_flag,pickup_datetime,dropoff_datetime,passenger_count,trip_time_in_secs,trip_distance,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude,vendor_id_y,payment_type,fare_amount,surcharge,mta_tax,tip_amount,tolls_amount,total_amount
DFD2202EE08F7A8DC9A57B02ACB81FE2,51EE87E3205C985EF8431D850C786310,CMT,1,N,2013-01-01 23:54:15,2013-01-01 23:58:20,2,244,0.7,-73.974602,40.759945,-73.984734,40.759388,CMT,CSH,5.0,0.5,0.5,0.0,0.0,6.0
237F49C3ECC11F5024B254268F054384,93C363DDF8ED9385D65FAD07CE3F5F07,CMT,1,N,2013-01-01 07:35:47,2013-01-01 07:46:00,1,612,2.3,-73.98850999999999,40.774307,-73.981094,40.755325,CMT,CSH,10.0,0.0,0.5,0.0,0.0,10.5
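Given the sample rows above, the csv-module suggestion from the comments could look roughly like the sketch below. It loads the small weather file into a dictionary keyed by day and streams the large trip file row by row, so the 250 MB file is never held in memory at once. The function name `join_weather`, the weather column names `Date` and `temp_avg`, and the assumption that weather dates are written as `YYYY-MM-DD` are all hypothetical.

```python
import csv
import io


def join_weather(trips_file, weather_file, out_file, weather_key="Date"):
    """Append each day's weather columns to every trip row of that day."""
    weather_reader = csv.DictReader(weather_file)
    extra_cols = [c for c in weather_reader.fieldnames if c != weather_key]
    # The weather file is small (~365 rows), so a dict lookup is cheap.
    weather_by_day = {row[weather_key]: row for row in weather_reader}

    trips_reader = csv.DictReader(trips_file)
    writer = csv.DictWriter(
        out_file,
        fieldnames=trips_reader.fieldnames + extra_cols,
        lineterminator="\n",
    )
    writer.writeheader()
    for trip in trips_reader:
        # "2013-01-01 23:54:15" -> "2013-01-01"
        day = trip["pickup_datetime"][:10]
        weather = weather_by_day.get(day, {})
        for col in extra_cols:
            # Days missing from the weather file get empty values.
            trip[col] = weather.get(col, "")
        writer.writerow(trip)


# Tiny in-memory samples standing in for the real files.
weather_csv = io.StringIO("Date,temp_avg\n2013-01-01,3.5\n")
trips_csv = io.StringIO(
    "medallion,pickup_datetime,fare_amount\n"
    "A,2013-01-01 23:54:15,5.0\n"
    "B,2013-01-01 07:35:47,10.0\n"
)
out = io.StringIO()
join_weather(trips_csv, weather_csv, out)
print(out.getvalue())
```

For the real data, the three arguments would be file objects opened with `newline=""`, as the csv module documentation recommends.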