Python: filling a pandas DataFrame with new data
Tags: python, pandas

This is my first post on StackOverflow. I apologize if I did something wrong or broke the site's rules.
My question: I read a csv file in Python with pandas. The DataFrame has five columns, named [yday, wday, time, stop, N]:
yday is the day of the year, from 1 to 365
wday is the day of the week, from 1 to 7
time is a number from 1 to 144 (I split the day into 10-minute intervals: 1440 minutes per day / 10 minutes = 144)
stop is the number of the bus stop (1-4)
N is the number of passengers boarding the bus
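The 10-minute slot numbering described above can be illustrated with a small sketch. The function name `time_slot` is hypothetical, and the mapping assumes that slot 1 starts at 00:00, which the question does not state explicitly.

```python
def time_slot(hour, minute):
    """Return the 1-based 10-minute slot of the day (1..144).

    Assumption: slot 1 covers 00:00-00:09 and slot 144 covers 23:50-23:59.
    """
    minutes_since_midnight = hour * 60 + minute
    return minutes_since_midnight // 10 + 1


# 1440 minutes per day / 10 minutes per slot = 144 slots
slots_per_day = 1440 // 10
print(slots_per_day, time_slot(13, 20))
```

Under that assumption, the slot 81 that appears in the sample data would start at 13:20.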
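Well, I want one entry for every interval, which gives 144 rows per day, but some intervals are missing, as you can see:
My goal is to add new rows to fill all the time intervals, for example adding (based on the given image):
320,6,81,1,1

Apologies for the lengthy code, but it serves your purpose: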
    import pandas as pd

    # Specify the input csv file path here. I am assuming your csv has only
    # the columns and column names you mentioned in your question. If not, you have to
    # modify the code below to reflect your columns.
    df = pd.read_csv("Path_to_your_input_csv_file_here")
    # Sort by yday, wday and time first, then reset the indices
    df = df.sort_values(['yday', 'wday', 'time'], ascending=True)
    df = df.reset_index(drop=True)
    missing_rows = []  # Rows for the time values that are missing from 'df'
    # Compare each row with the next one and collect the time values missing between them
    for index, row in df.iterrows():
        if index == len(df) - 1:
            break
        next_row = df.loc[index + 1]
        if row["yday"] == next_row["yday"] and row["wday"] == next_row["wday"] and row["time"] < next_row["time"]:
            for item in range(row["time"] + 1, next_row["time"]):
                # 'NA' is a string placeholder for the unknown passenger count
                missing_rows.append([row["yday"], row["wday"], item, row["stop"], "NA"])
    # DataFrame.append was removed in pandas 2.0, so the new rows are added with pd.concat
    df2 = pd.concat([df, pd.DataFrame(missing_rows, columns=df.columns)])
    # Now sort 'df2' by yday, wday and time and reset the indices
    df2 = df2.sort_values(['yday', 'wday', 'time'], ascending=True)
    df2 = df2.reset_index(drop=True)
    print(df2)
Cheers

What is wrong with your reindex? You only need to extend the index with the missing time values, and then df.reindex should work.
Does this answer your question?
Dude, you can't imagine how much I love you right now, this works!!

Output of the answer's code:

        yday  wday  time  stop   N
    0    320     6    81     1   1
    1    320     6    82     1  NA
    2    320     6    83     1  NA
    3    320     6    84     1  NA
    4    320     6    85     1   1
    5    320     6    86     1  NA
    6    320     6    87     1  NA
    7    320     6    88     1  NA
    8    320     6    89     1   1
    9    320     6    90     1  NA
    10   320     6    91     1  NA
    11   320     6    92     1  NA
    12   320     6    93     1   1
    13   320     6    94     1  NA
    14   320     6    95     1  NA
    15   320     6    96     1  NA
    16   320     6    97     1   1
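
The reindex idea raised in the comments could be sketched roughly as follows. This is not the commenter's code (that was not preserved in this thread), only one possible reading of it. The helper name `fill_time_gaps` and the sample values are hypothetical, and the sketch assumes gaps should be filled between the first and last observed slot of each (yday, wday, stop) group.

```python
import pandas as pd

# Small sample shaped like the data in the question (values are illustrative).
df = pd.DataFrame({
    "yday": [320] * 5,
    "wday": [6] * 5,
    "time": [81, 85, 89, 93, 97],
    "stop": [1] * 5,
    "N": [1] * 5,
})


def fill_time_gaps(group):
    """Add one row for every missing time slot inside a single group."""
    # Every slot between the first and last observed one (assumption, see lead-in).
    times = range(group["time"].min(), group["time"].max() + 1)
    # Reindexing on 'time' creates the missing rows, with NaN in all columns.
    filled = group.set_index("time").reindex(times)
    # The key columns are constant within a group, so they can be forward-filled;
    # N is left as NaN for the newly added rows.
    keys = ["yday", "wday", "stop"]
    filled[keys] = filled[keys].ffill().astype(int)
    return filled.rename_axis("time").reset_index()


filled = pd.concat(
    [fill_time_gaps(g) for _, g in df.groupby(["yday", "wday", "stop"])],
    ignore_index=True,
)[["yday", "wday", "time", "stop", "N"]]
print(filled)
```

To get all 144 rows per day, as the question asks, `times` could be replaced by `range(1, 145)`. In that case the key columns would need to be filled from the group key rather than with a forward fill, because the first slots of the day may be missing.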