Python: the fastest way to add rows to an existing DataFrame
I'm currently trying to create a new csv based on an existing csv. I can't find a faster way to set DataFrame values based on the values of an existing DataFrame.
import pandas
import sys
import numpy
import time
# path to file as argument
path = sys.argv[1]
df = pandas.read_csv(path, sep = "\t")
# only care about lines with response_time
df = df[pandas.notnull(df['response_time'])]
# new empty dataframe
new_df = pandas.DataFrame(index = df["datetime"])
# new_df needs to have datetime as index
# and columns based on a combination
# of 2 columns name from previous dataframe
# (there are only 10 differents combinations)
# and response_time as values, so there will be lots of
# blank cells but I don't care
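for i, row in df.iterrows():
    start = time.time()
    # note: DataFrame.set_value was removed in pandas 1.0; new_df.at[row_label, col_label] = value replaces it
    new_df.set_value(row["datetime"], row["name"] + "-" + row["type"], row["response_time"])
    print(i, time.time() - start)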
The original DataFrame is:
                     datetime           name   type  response_time
0  2018-12-18T00:00:00.500829    HSS_ANDROID  audio        0.02430
1  2018-12-18T00:00:00.509108    HSS_ANDROID  video        0.02537
2  2018-12-18T00:00:01.816758       HSS_TEST  audio        0.03958
3  2018-12-18T00:00:01.819865       HSS_TEST  video        0.03596
4  2018-12-18T00:00:01.825054  HSS_ANDROID_2  audio        0.02590
5  2018-12-18T00:00:01.842974  HSS_ANDROID_2  video        0.03643
6  2018-12-18T00:00:02.492477    HSS_ANDROID  audio        0.01575
7  2018-12-18T00:00:02.509231    HSS_ANDROID  video        0.02870
8  2018-12-18T00:00:03.788196       HSS_TEST  audio        0.01666
9  2018-12-18T00:00:03.807682       HSS_TEST  video        0.02975
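new_df should end up with datetime as its index and one column per name-type combination.
Each loop iteration takes about 7 ms, so processing a DataFrame of (only?) 400,000 rows takes a long time. How can I speed this up?

In fact, using pivot will do what you want, for example: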
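import pandas as pd
# Combine name and type into one label, then pivot on it
# (recent pandas versions no longer accept the positional pd.pivot(index, columns, values) form with Series)
key = df["name"] + "-" + df["type"]
new_df = df.assign(key=key).pivot(index="datetime", columns="key", values="response_time")
print(new_df.rename_axis(columns=None).head())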
                            HSS_ANDROID-audio  HSS_ANDROID-video  \
datetime
2018-12-18T00:00:00.500829             0.0243                NaN
2018-12-18T00:00:00.509108                NaN            0.02537
2018-12-18T00:00:01.816758                NaN                NaN
2018-12-18T00:00:01.819865                NaN                NaN
2018-12-18T00:00:01.825054                NaN                NaN

                            HSS_ANDROID_2-audio  HSS_ANDROID_2-video  \
datetime
2018-12-18T00:00:00.500829                  NaN                  NaN
2018-12-18T00:00:00.509108                  NaN                  NaN
2018-12-18T00:00:01.816758                  NaN                  NaN
2018-12-18T00:00:01.819865                  NaN                  NaN
2018-12-18T00:00:01.825054               0.0259                  NaN

                            HSS_TEST-audio  HSS_TEST-video
datetime
2018-12-18T00:00:00.500829             NaN             NaN
2018-12-18T00:00:00.509108             NaN             NaN
2018-12-18T00:00:01.816758         0.03958             NaN
2018-12-18T00:00:01.819865             NaN         0.03596
2018-12-18T00:00:01.825054             NaN             NaN
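For anyone who wants to try this without the original file, here is a minimal self-contained sketch that runs the pivot approach on the first six sample rows from the question. It assumes a reasonably recent pandas where `DataFrame.pivot` takes keyword arguments; the helper column name `key` and the variable name `wide` are made up for illustration.

```python
import io

import pandas as pd

# First six sample rows from the question, tab-separated like the original file
raw = (
    "datetime\tname\ttype\tresponse_time\n"
    "2018-12-18T00:00:00.500829\tHSS_ANDROID\taudio\t0.02430\n"
    "2018-12-18T00:00:00.509108\tHSS_ANDROID\tvideo\t0.02537\n"
    "2018-12-18T00:00:01.816758\tHSS_TEST\taudio\t0.03958\n"
    "2018-12-18T00:00:01.819865\tHSS_TEST\tvideo\t0.03596\n"
    "2018-12-18T00:00:01.825054\tHSS_ANDROID_2\taudio\t0.02590\n"
    "2018-12-18T00:00:01.842974\tHSS_ANDROID_2\tvideo\t0.03643\n"
)
df = pd.read_csv(io.StringIO(raw), sep="\t")

# Only keep lines that actually have a response_time
df = df[df["response_time"].notna()]

# "key" is a hypothetical helper column holding the combined name-type label
wide = (
    df.assign(key=df["name"] + "-" + df["type"])
      .pivot(index="datetime", columns="key", values="response_time")
)

print(wide.shape)  # one row per datetime, one column per name-type combination
```

The whole reshape is a single vectorized call, so there is no per-row Python loop as in the `iterrows` version.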
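If you don't want NaN in the result, you can chain fillna with whatever value you like, for example:
key = df["name"] + "-" + df["type"]
new_df = df.assign(key=key).pivot(index="datetime", columns="key", values="response_time").fillna(0)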
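One caveat: `pivot` raises a `ValueError` when the same datetime/column pair occurs more than once. The sample in the question shows no such duplicates, so this may never matter for the real file. If it does, `pivot_table` with an explicit aggregation is one possible alternative. The rows below are made up, and `aggfunc="mean"` is an arbitrary choice for illustration only.

```python
import pandas as pd

# Hypothetical rows: the first two share the same datetime, name and type
df = pd.DataFrame({
    "datetime": ["t0", "t0", "t1"],
    "name": ["HSS_TEST", "HSS_TEST", "HSS_TEST"],
    "type": ["audio", "audio", "video"],
    "response_time": [0.02, 0.04, 0.03],
})
df["key"] = df["name"] + "-" + df["type"]

# Plain pivot cannot decide which of the duplicated values to keep
try:
    df.pivot(index="datetime", columns="key", values="response_time")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates; fill_value plays the role of fillna(0)
wide = df.pivot_table(
    index="datetime",
    columns="key",
    values="response_time",
    aggfunc="mean",
    fill_value=0,
)
print(pivot_failed)
print(wide)
```

Whether averaging duplicates is appropriate depends on the data; `"first"`, `"max"` or another aggregation may be a better fit.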
You can also use unstack as another option:
new = df.set_index(['type','name', 'datetime']).unstack([0,1])
new.columns = ['{}-{}'.format(z,y) for x,y,z, in new.columns]
Using f-strings instead would be slightly faster than format:
new.columns = [f'{z}-{y}' for x, y, z in new.columns]
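To see that the unstack route gives the same table as pivot, here is a small sketch on four of the sample rows. The variable names `new`, `piv`, `kind` and the helper column `key` are illustrative only, and the code assumes the frame holds just the four columns shown in the question.

```python
import pandas as pd

# Four sample rows taken from the question
df = pd.DataFrame({
    "datetime": [
        "2018-12-18T00:00:00.500829",
        "2018-12-18T00:00:00.509108",
        "2018-12-18T00:00:01.816758",
        "2018-12-18T00:00:01.819865",
    ],
    "name": ["HSS_ANDROID", "HSS_ANDROID", "HSS_TEST", "HSS_TEST"],
    "type": ["audio", "video", "audio", "video"],
    "response_time": [0.02430, 0.02537, 0.03958, 0.03596],
})

# unstack route: move type and name from the index into the columns.
# The resulting column tuples are (original column, type, name).
new = df.set_index(["type", "name", "datetime"]).unstack([0, 1])
new.columns = [f"{name}-{kind}" for _, kind, name in new.columns]

# pivot route, for comparison
piv = df.assign(key=df["name"] + "-" + df["type"]).pivot(
    index="datetime", columns="key", values="response_time"
)

print(sorted(new.columns) == sorted(piv.columns))
```

The column order can differ between the two routes (unstack sorts by type first), so compare by column label rather than by position.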
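Do you want to copy the DataFrame? You can use loc. Sorry, I don't have time to write up a good example for you, but here is some documentation:
Please provide some input data as text rather than as a picture.
Using … may be one way to do what you are trying to do.
@MohitMotwani No, I don't want to copy a DataFrame.
We need to see the original DataFrame (df). Can you post df.head(10)? Thanks.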