Python 3.x: pandas or NumPy vectorization to reduce memory usage

python-3.x pandas numpy memory-management

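I have two loops that iterate over DataFrames, and I would like to optimize them using pandas or NumPy vectorization. At the moment my code consumes nearly 5 GB of memory for 10,000 records. Please see the following snippet: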

    import pandas as pd

    def helper_method(self, data):
        lines = {
            "linesAdded": 0,
            "linesRemoved": 0
        }
        # "${lines}" is not interpolated in Python; use logging's %s placeholder.
        self.logger.info("helper_method %s", lines)
        df_diffs = pd.DataFrame(data)
        df_diffs = df_diffs.fillna(0)
        data_hunks = []
        for _index_diffs, row_hunks in df_diffs.iterrows():
            if "hunks" in row_hunks.index.values and isinstance(
                    row_hunks["hunks"], list):
                data_hunks.extend(row_hunks["hunks"])
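        # pd.json_normalize is the top-level replacement for the deprecated
        # pd.io.json.json_normalize (available since pandas 1.0).
        df_segments = pd.json_normalize(
            data_hunks, "segments")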
        for _index, row in df_segments.iterrows():
            if row["type"] == "ADDED":
                lines["linesAdded"] += len(row["lines"])
            if row["type"] == "REMOVED":
                lines["linesRemoved"] += len(row["lines"])
        return lines

How can we reduce the memory usage and convert this to a vectorized form?
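
One possible direction, offered as a minimal sketch rather than a definitive answer: much of the memory in the snippet above probably goes into DataFrames that hold every hunk and every list of lines, even though only the segment type and the number of lines are needed. The sketch below flattens the nested structure once, keeps just those two values per segment, and then sums with boolean masks instead of `iterrows()`. The function name `count_changed_lines` is hypothetical, and the input shape (a list of dicts, each with an optional "hunks" list whose items carry a "segments" list of `{"type": ..., "lines": [...]}` dicts) is an assumption inferred from the question's code.

```python
import pandas as pd


def count_changed_lines(data):
    """Count added and removed lines (hypothetical helper).

    Assumed input shape, inferred from the question: a list of dicts,
    each optionally holding "hunks" -> list of dicts with "segments"
    -> list of {"type": str, "lines": list}.
    """
    # Flatten once, keeping only the segment type and its line count,
    # so the line contents themselves are never copied into a DataFrame.
    segments = [
        (segment["type"], len(segment["lines"]))
        for diff in data
        if isinstance(diff.get("hunks"), list)
        for hunk in diff["hunks"]
        for segment in hunk.get("segments", [])
    ]

    df = pd.DataFrame(segments, columns=["type", "n_lines"])
    n_lines = df["n_lines"].astype("int64")

    # Vectorized sums via boolean masks replace the iterrows() loop.
    return {
        "linesAdded": int(n_lines[df["type"] == "ADDED"].sum()),
        "linesRemoved": int(n_lines[df["type"] == "REMOVED"].sum()),
    }
```

With this shape, `helper_method` could simply return `count_changed_lines(data)`. If even the small two-column DataFrame is unwanted, the same totals can be accumulated directly inside the flattening loop; whether that is faster or lighter for a given dataset would need to be measured.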