How to create a DataFrame from a model (Django)
I am developing an application in Django. I have this model:
from django.db import models

class my_model(models.Model):
    Field_A = models.CharField(max_length=256, blank=True, null=True)
    Field_B = models.CharField(max_length=25, blank=True, null=True)
    Field_C = models.TextField(blank=True, null=True)
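I want to create a DataFrame whose column names match the model's field names,
with one row per model object and each column holding that object's field value.
How can I do that? Is there a single command for it, or do I have to iterate?
Edit: here is the crude, unpythonic solution I have found so far: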
import pandas as pd

entries = my_model.objects.all()
# names of the model fields, used as the DataFrame column names
columns_names = [field.name for field in my_model._meta.get_fields()]
L_GI = len(entries)
# collect one dict per object (DataFrame.append was removed in pandas 2.0)
rows = []
for element in entries:
    rows.append({"Field_A": element.Field_A, "Field_B": element.Field_B, "Field_C": element.Field_C})
# build the DataFrame once; columns missing from the dicts (such as id) are filled with NaN
GI = pd.DataFrame(rows, columns=columns_names)
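I bet there is a faster way that avoids iterating. Any suggestions?

Good question.
I think you do have to iterate over them.
I have implemented it in 3 different ways, so you can pick the one you like best: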
import time

import pandas as pd
from django.core import serializers
from django.db import models


class PandasModelMixin(models.Model):
    class Meta:
        abstract = True

    @classmethod
    def as_dataframe(cls, queryset=None, field_list=None):
        t1 = time.time()

        if queryset is None:
            queryset = cls.objects.all()
        if field_list is None:
            # _get_fields() is a private Django API; reverse=False skips reverse relations
            field_list = [_field.name for _field in cls._meta._get_fields(reverse=False)]

        # one list of column values per object
        data = [[obj.serializable_value(column) for column in field_list] for obj in queryset]
        df = pd.DataFrame(data, columns=field_list)

        print("Execution time without serialization: %s" % (time.time() - t1))
        return df

    @classmethod
    def as_dataframe_using_django_serializer(cls, queryset=None):
        t1 = time.time()

        if queryset is None:
            queryset = cls.objects.all()

        if queryset.exists():
            serialized_models = serializers.serialize(format='python', queryset=queryset)
            # each 'fields' dict holds the field values (the primary key is not included)
            serialized_objects = [s['fields'] for s in serialized_models]
            data = [list(x.values()) for x in serialized_objects]
            columns = list(serialized_objects[0].keys())
            df = pd.DataFrame(data, columns=columns)
        else:
            # empty queryset: return an empty DataFrame
            df = pd.DataFrame()

        print("Execution time using Django serializer: %s" % (time.time() - t1))
        return df

    @classmethod
    def as_dataframe_using_drf_serializer(cls, queryset=None, drf_serializer=None, field_list=None):
        # imported here so that Django REST framework is only needed by this method
        from rest_framework import serializers

        t1 = time.time()

        if queryset is None:
            queryset = cls.objects.all()

        if drf_serializer is None:
            # build a default ModelSerializer for this model
            class CustomModelSerializer(serializers.ModelSerializer):
                class Meta:
                    model = cls
                    fields = field_list or '__all__'

            drf_serializer = CustomModelSerializer

        serialized_objects = drf_serializer(queryset, many=True).data
        data = [list(x.values()) for x in serialized_objects]
        columns = list(drf_serializer().get_fields().keys())
        df = pd.DataFrame(data, columns=columns)

        print("Execution time using DjangoRestFramework serializer: %s" % (time.time() - t1))
        return df
So, make your model inherit from the mixin like this:
class MyModel(PandasModelMixin):
    field_a = models.CharField(max_length=256, blank=True, null=True)
    field_b = models.CharField(max_length=25, blank=True, null=True)
    field_c = models.TextField(blank=True, null=True)
Then try the code like this:
>>> MyModel.as_dataframe()
>>> MyModel.as_dataframe_using_django_serializer()
>>> MyModel.as_dataframe_using_drf_serializer()
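I tried my code with a model that has 450 instances and 15 columns, and these were the results:
Execution time without serialization: 0.070409052453613
Execution time using Django serializer: 0.07644820213317871
Execution time using DjangoRestFramework serializer: 0.12314629554748535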
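Timings like these can also be collected with a small reusable helper instead of repeating the `time.time()` bookkeeping in every method. The following is only a minimal sketch: `timed` and `build_rows` are hypothetical names that are not part of the code above, and `build_rows` is a stand-in workload in place of a real queryset conversion.

```python
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def build_rows(n):
    # Hypothetical stand-in workload: builds n rows of two values each.
    # In a real project this could be something like MyModel.as_dataframe.
    return [[i, str(i)] for i in range(n)]


rows, elapsed = timed(build_rows, 450)
print("Execution time: %s" % elapsed)
```

`time.perf_counter()` is used here because it is a monotonic, high-resolution clock intended for measuring short durations. A single run is noisy, so repeating the call a few times and comparing the fastest runs would give a fairer comparison between the three approaches.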
Note: I am using Django 2.2 and Python 3.6.5.
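As a possible shortcut, Django's `QuerySet.values()` returns one dict per row, and pandas can build a DataFrame directly from a list of such dicts. The ORM still iterates over the rows internally, but no model instances are created and no explicit loop is needed. The sketch below assumes that shape of data: the `rows` list is made-up sample data standing in for `list(MyModel.objects.values("field_a", "field_b", "field_c"))`, and `rows_to_dataframe` is a hypothetical helper name.

```python
import pandas as pd

# Made-up sample rows, shaped like the dicts returned by
#   MyModel.objects.values("field_a", "field_b", "field_c")
# In a real project, replace this list with list(queryset.values(...)).
rows = [
    {"field_a": "alpha", "field_b": "x", "field_c": "first note"},
    {"field_a": "beta", "field_b": "y", "field_c": None},
]


def rows_to_dataframe(rows, columns):
    """Build a DataFrame from dict rows.

    Passing `columns` fixes the column order and keeps the column
    names even when `rows` is an empty list.
    """
    return pd.DataFrame(rows, columns=columns)


df = rows_to_dataframe(rows, ["field_a", "field_b", "field_c"])
print(df)
```

With a real queryset, the call would look like `rows_to_dataframe(list(MyModel.objects.values()), column_names)`, where `column_names` is whatever list of field names you want as columns.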