Python: How do I resolve "ValueError: left keys must be sorted" when merging two dataframes?
I am trying to merge two Pandas dataframes. One, called SF1, contains quarterly data; the other, called DAILY, contains daily data.
DAILY dataframe:
ticker,date,lastupdated,ev,evebit,evebitda,marketcap,pb,pe,ps
A,2020-09-14,2020-09-14,31617.1,36.3,26.8,30652.1,6.2,44.4,5.9
SF1 dataframe:
ticker,dimension,calendardate,datekey,reportperiod,lastupdated,accoci,assets,assetsavg,assetsc,assetsnc,assetturnover,bvps,capex,cashneq,cashnequsd,cor,consolinc,currentratio,de,debt,debtc,debtnc,debtusd,deferredrev,depamor,deposits,divyield,dps,ebit,ebitda,ebitdamargin,ebitdausd,ebitusd,ebt,eps,epsdil,epsusd,equity,equityavg,equityusd,ev,evebit,evebitda,fcf,fcfps,fxusd,gp,grossmargin,intangibles,intexp,invcap,invcapavg,inventory,investments,investmentsc,investmentsnc,liabilities,liabilitiesc,liabilitiesnc,marketcap,ncf,ncfbus,ncfcommon,ncfdebt,ncfdiv,ncff,ncfi,ncfinv,ncfo,ncfx,netinc,netinccmn,netinccmnusd,netincdis,netincnci,netmargin,opex,opinc,payables,payoutratio,pb,pe,pe1,ppnenet,prefdivis,price,ps,ps1,receivables,retearn,revenue,revenueusd,rnd,roa,roe,roic,ros,sbcomp,sgna,sharefactor,sharesbas,shareswa,shareswadil,sps,tangibles,taxassets,taxexp,taxliabilities,tbvps,workingcapital
A,ARQ,2020-09-14,2020-09-14,2020-09-14,2020-09-14,53000000,7107000000,,4982000000,2125000000,,10.219,-30000000,1368000000,1368000000,1160000000,131000000,2.41,0.584,665000000,111000000,554000000,665000000,281000000,96000000,0,0.0,0.0,202000000,298000000,0.133,298000000,202000000,202000000,0.3,0.3,0.3,4486000000,,4486000000,50960600000,,,354000000,0.806,1.0,1086000000,0.484,0,0,4337000000,,1567000000,42000000,42000000,0,2621000000,2067000000,554000000,51663600000,1368000000,-160000000,2068000000,111000000,0,1192000000,-208000000,-42000000,384000000,0,131000000,131000000,131000000,0,0,0.058,915000000,171000000,635000000,0.0,11.517,,,1408000000,0,114.3,,,1445000000,131000000,2246000000,2246000000,290000000,,,,,0,625000000,1.0,452000000,439000000,440000000,5.116,7107000000,0,71000000,113000000,16.189,2915000000
I have already sorted both dataframes by ticker and date.
Sorting code:
daily['date'] = pd.to_datetime(daily['date'])
sf1['calendardate'] = pd.to_datetime(sf1['calendardate'])
daily = daily.sort_values(['ticker', 'date'])
sf1 = sf1.sort_values(['ticker', 'calendardate'])
DAILY after sorting:
,ticker,date,lastupdated,ev,evebit,evebitda,marketcap,pb,pe,ps
180766,AAPL,2007-05-30,2020-08-31,95640.1,24.1,22.6,102735.1,8.4,36.8,4.8
180716,AAPL,2007-05-31,2020-08-31,97722.9,24.7,23.1,104817.9,8.5,37.6,4.9
SF1 after sorting:
ticker,calendardate,accoci,assets,assetsavg,assetsc,assetsnc,assetturnover,bvps,capex,cashneq,cashnequsd,cor,consolinc,currentratio,de,debt,debtc,debtnc,debtusd,deferredrev,depamor,deposits,divyield,dps,ebit,ebitda,ebitdamargin,ebitdausd,ebitusd,ebt,eps,epsdil,epsusd,equity,equityavg,equityusd,fcf,fcfps,fxusd,gp,grossmargin,intangibles,intexp,invcap,invcapavg,inventory,investments,investmentsc,investmentsnc,liabilities,liabilitiesc,liabilitiesnc,ncf,ncfbus,ncfcommon,ncfdebt,ncfdiv,ncff,ncfi,ncfinv,ncfo,ncfx,netinc,netinccmn,netinccmnusd,netincdis,netincnci,netmargin,opex,opinc,payables,payoutratio,pe1,ppnenet,prefdivis,price,ps1,receivables,retearn,revenue,revenueusd,rnd,roa,roe,roic,ros,sbcomp,sgna,sharefactor,sharesbas,shareswa,shareswadil,sps,tangibles,taxassets,taxexp,taxliabilities,tbvps,workingcapital
0,AAPL,2007-06-30,56000000.0,21647000000.0,,18745000000.0,2902000000.0,,0.552,-283000000.0,7118000000.0,7118000000.0,3415000000.0,818000000.0,2.681,0.615,0.0,0.0,0.0,0.0,0.0,81000000.0,0.0,0.0,0.0,1196000000.0,1277000000.0,0.236,1277000000.0,1196000000.0,1196000000.0,0.034,0.033,0.034,13404000000.0,,13404000000.0,944000000.0,0.039,1.0,1995000000.0,0.369,275000000.0,0.0,7262000000.0,,251000000.0,6649000000.0,6649000000.0,0.0,8243000000.0,6992000000.0,1251000000.0,23000000.0,-6000000.0,118000000.0,0.0,0.0,229000000.0,-1433000000.0,-1170000000.0,1227000000.0,0.0,818000000.0,818000000.0,818000000.0,0.0,0.0,0.151,954000000.0,1041000000.0,3660000000.0,0.0,36.815,1626000000.0,0.0,4.7860000000000005,5.134,1410000000.0,8199000000.0,5410000000.0,5410000000.0,208000000.0,,,,,65000000.0,746000000.0,1.0,24349946740.0,24270568000.0,24938788000.0,0.223,21372000000.0,687000000.0,378000000.0,0.0,0.8809999999999999,11753000000.0
48,AAPL,2007-06-30,56000000.0,21647000000.0,19256000000.0,18745000000.0,2902000000.0,1.175,0.552,-675000000.0,7118000000.0,7118000000.0,15150000000.0,3134000000.0,2.681,0.615,0.0,0.0,0.0,0.0,0.0,290000000.0,0.0,0.0,0.0,4499000000.0,4789000000.0,0.212,4789000000.0,4499000000.0,4499000000.0,0.13,0.127,0.13,13404000000.0,11719250000.0,13404000000.0,4154000000.0,0.171,1.0,7476000000.0,0.33,275000000.0,0.0,7262000000.0,5515250000.0,251000000.0,6649000000.0,6649000000.0,0.0,8243000000.0,6992000000.0,1251000000.0,-895000000.0,-222000000.0,325000000.0,0.0,0.0,650000000.0,-6374000000.0,-5492000000.0,4829000000.0,0.0,3134000000.0,3134000000.0,3134000000.0,0.0,0.0,0.139,3519000000.0,3957000000.0,3660000000.0,0.0,36.815,1626000000.0,0.0,4.7860000000000005,5.134,1410000000.0,8199000000.0,22626000000.0,22626000000.0,754000000.0,0.163,0.267,0.816,0.199,214000000.0,2765000000.0,1.0,24349946740.0,24270568000.0,24938788000.0,0.932,21372000000.0,687000000.0,1365000000.0,0.0,0.8809999999999999,11753000000.0
Merge code:
df = pd.merge_asof(daily, sf1, by='ticker', left_on='date', right_on='calendardate')
Error message:
ValueError: left keys must be sorted
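I don't know why I am getting this error. It may be because the two dataframes start on different dates.
Answer:
There is no need to sort by ticker, because that column is only used for the exact match (the by argument). Moreover, putting it first in the sort_values call prevents the columns used for the backward search, date and calendardate, from being sorted properly: merge_asof requires each of those key columns to be in ascending order across the whole dataframe, not just within each ticker.
Try: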
daily = daily.sort_values(['date'])
sf1 = sf1.sort_values(['calendardate'])
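To make the difference concrete, here is a minimal, self-contained sketch. The tickers, dates, and values below are made up for illustration (they are not the asker's data), and the variable names by_ticker, daily_sorted, and sf1_sorted are hypothetical. It first reproduces the failure with a ticker-first sort, then applies the date-only sort suggested above.

```python
import pandas as pd

# Hypothetical sample data, deliberately small; not taken from the question.
daily = pd.DataFrame({
    "ticker": ["MSFT", "AAPL", "MSFT", "AAPL"],
    "date": pd.to_datetime(["2007-07-02", "2007-07-03", "2007-10-01", "2007-10-02"]),
    "pe": [20.0, 36.8, 21.0, 37.6],
})
sf1 = pd.DataFrame({
    "ticker": ["AAPL", "MSFT", "AAPL", "MSFT"],
    "calendardate": pd.to_datetime(["2007-06-30", "2007-06-30", "2007-09-30", "2007-09-30"]),
    "revenue": [5.41e9, 1.0e10, 6.2e9, 1.1e10],
})

# Sorting by ticker first orders dates only within each ticker,
# so the date column as a whole is no longer ascending.
by_ticker = daily.sort_values(["ticker", "date"])
print(by_ticker["date"].is_monotonic_increasing)

# merge_asof rejects the ticker-first ordering with a ValueError.
try:
    pd.merge_asof(by_ticker, sf1.sort_values(["ticker", "calendardate"]),
                  by="ticker", left_on="date", right_on="calendardate")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Sorting only by the asof keys satisfies the requirement;
# the by="ticker" argument still restricts matches to the same ticker.
daily_sorted = daily.sort_values("date")
sf1_sorted = sf1.sort_values("calendardate")
merged = pd.merge_asof(daily_sorted, sf1_sorted,
                       by="ticker", left_on="date", right_on="calendardate")
print(merged)
```

If the result is needed grouped by ticker, it can be re-sorted after the merge, for example with merged.sort_values(['ticker', 'date']).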