Python DataFrame copy

python pandas

I am trying to create a new DataFrame from rows of the original DataFrame that meet certain criteria.

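import pandas

# sql (query string) and conn (database connection) are defined elsewhere
df = pandas.io.sql.read_sql(sql, conn)

Count_Row = df.shape[0]
for j in range(Count_Row - 1):
    # compare column 0 of each row with column 0 of the next row
    if df.iloc[j, 0] == df.iloc[j + 1, 0]:
        print(df.iloc[j, 2] + df.iloc[j + 1, 2], df.iloc[j, 4], df.iloc[j, 6], df.iloc[j, 3])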

However, instead of printing, I want to add that data to a new DataFrame.

How can this be done?

Don't use a slow "for" loop for this. Instead, build a boolean mask that marks the rows you want, then select those rows:

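# Compare by position; to_numpy() avoids the index-alignment error that
# pandas raises when two differently-labeled Series are compared with ==
matches = df.iloc[:-1, 0].to_numpy() == df.iloc[1:, 0].to_numpy()
new_df = df.iloc[:-1][matches]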

This is typically 10 to 100 times faster than the loop approach; the exact gain depends on the size of the data.
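The speed difference is easy to check on your own data. Below is a minimal sketch, assuming a synthetic two-column DataFrame (the column names "key" and "val" and the helper names with_loop and with_mask are made up for illustration). It times both approaches and lets you confirm they select the same rows; the actual ratio will vary with data size and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data: 2,000 rows with a small set of repeating keys in column 0
rng = np.random.default_rng(0)
df = pd.DataFrame({"key": rng.integers(0, 5, size=2000),
                   "val": rng.random(2000)})

def with_loop(df):
    # Row-by-row comparison, as in the question
    keep = [j for j in range(df.shape[0] - 1)
            if df.iloc[j, 0] == df.iloc[j + 1, 0]]
    return df.iloc[keep]

def with_mask(df):
    # Positional comparison of column 0 against itself moved up by one row
    matches = df.iloc[:-1, 0].to_numpy() == df.iloc[1:, 0].to_numpy()
    return df.iloc[:-1][matches]

loop_time = timeit.timeit(lambda: with_loop(df), number=3)
mask_time = timeit.timeit(lambda: with_mask(df), number=3)
print(f"loop: {loop_time:.4f}s, mask: {mask_time:.4f}s")
```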

In the end, new_df will contain a copy of the selected rows. Here [:-1] means "all elements before the last one".
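The mask selects whole rows, whereas the question prints four values per match: the sum of column 2 over the two adjacent rows, plus columns 4, 6 and 3 of the first row. A possible way to build exactly those four columns without a loop is sketched below. The sample data, the helper name pair_rows and the output names v1 to v4 are assumptions for illustration, not part of the original question.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by read_sql (7 columns)
df = pd.DataFrame({
    "key": ["a", "a", "b", "c", "c"],
    "c1": [0, 0, 0, 0, 0],
    "val": [1, 2, 3, 4, 5],
    "c3": [10, 20, 30, 40, 50],
    "c4": ["p", "q", "r", "s", "t"],
    "c5": [0, 0, 0, 0, 0],
    "c6": [1.5, 2.5, 3.5, 4.5, 5.5],
})

def pair_rows(df):
    cur = df.iloc[:-1]   # every row except the last
    nxt = df.iloc[1:]    # every row except the first
    # True where column 0 of a row equals column 0 of the following row
    matches = cur.iloc[:, 0].to_numpy() == nxt.iloc[:, 0].to_numpy()
    return pd.DataFrame({
        # sum of column 2 over the matching pair of rows
        "v1": cur.iloc[:, 2].to_numpy()[matches] + nxt.iloc[:, 2].to_numpy()[matches],
        "v2": cur.iloc[:, 4].to_numpy()[matches],
        "v3": cur.iloc[:, 6].to_numpy()[matches],
        "v4": cur.iloc[:, 3].to_numpy()[matches],
    })

new_df = pair_rows(df)
print(new_df)
```

Working on NumPy arrays keeps every comparison and addition positional, so the differing row labels of the two slices never come into play.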

You can collect the data into a new DataFrame instead of printing it:

import pandas as pd

df = pd.read_sql(sql, conn)  # sql and conn are defined elsewhere
Count_Row = df.shape[0]

rows = []  # collect matching rows here; DataFrame.append was removed in pandas 2.0

for j in range(Count_Row - 1):
    if df.iloc[j, 0] == df.iloc[j + 1, 0]:
        # build the row of values to keep
        rows.append([df.iloc[j, 2] + df.iloc[j + 1, 2],
                     df.iloc[j, 4],
                     df.iloc[j, 6],
                     df.iloc[j, 3]])

# build the result DataFrame once, naming the variables
results = pd.DataFrame(rows, columns=['v1', 'v2', 'v3', 'v4'])

This gives you a DataFrame with the desired output.