Python: iteratively computing calculated columns from the rows of other columns (Python, Pandas, Calculated Columns)

python pandas

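I am trying to run some calculations and put the results into new named columns. Each result is computed from a formula that takes values from a block of rows in two different columns and is written into a row of that same block. Here is a sample of the data: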

X   Y   TEMP    Data_1  Data_2  Data_3  Data_4
0   0   30  519 521 521 521
0   0   45  568 569 570 570
0   0   60  617 618 619 619
0   0   85  701 701 703 703
0   1   30  532 533 533 532
0   1   45  580 581 580 580
0   1   60  628 629 629 629
0   1   85  711 710 711 712
0   2   30  512 513 514 512
0   2   45  560 561 562 560
0   2   60  609 610 611 609
0   2   85  692 691 694 691
0   3   60  617 617 619 618
0   3   85  700 699 702 701
0   4   30  520 521 522 521
0   4   45  568 569 570 570
0   4   60  617 617 619 618
0   4   85  700 699 702 701

This is what I am trying to make the output look like:

X   Y   TEMP    Data_1  Data_2  Data_3  Data_4  Calculated_1    Calculated_2    Calculated_3    Calculated_4
0   0   30  519 521 521 521 Col A, Rows (2:5) and Data 1 Rows (2:5) Col A, Rows (2:5) and Data 2 Rows (2:5) Col A, Rows (2:5) and Data 3 Rows (2:5) Col A, Rows (2:5) and Data 4 Rows (2:5)
0   0   45  568 569 570 570             
0   0   60  617 618 619 619             
0   0   85  701 701 703 703             
0   1   30  532 533 533 532 Col A, Rows (6:9) and Data 1 Rows (6:9) Col A, Rows (6:9) and Data 2 Rows (6:9) Col A, Rows (6:9) and Data 3 Rows (6:9) Col A, Rows (6:9) and Data 4 Rows (6:9)
0   1   45  580 581 580 580             
0   1   60  628 629 629 629             
0   1   85  711 710 711 712             
0   2   30  512 513 514 512 Col A, Rows (10:13) and Data 1 Rows (10:13) Col A, Rows (10:13) and Data 2 Rows (10:13) Col A, Rows (10:13) and Data 3 Rows (10:13) Col A, Rows (10:13) and Data 4 Rows (10:13)
0   2   45  560 561 562 560             
0   2   60  609 610 611 609             
0   2   85  692 691 694 691             
0   3   60  617 617 619 618 Col A, Rows (14:15) and Data 1 Rows (14:15) Col A, Rows (14:15) and Data 2 Rows (14:15) Col A, Rows (14:15) and Data 3 Rows (14:15) Col A, Rows (14:15) and Data 4 Rows (14:15)
0   3   85  700 699 702 701             
0   4   30  520 521 522 521 Col A, Rows (16:19) and Data 1 Rows (16:19) Col A, Rows (16:19) and Data 2 Rows (16:19) Col A, Rows (16:19) and Data 3 Rows (16:19) Col A, Rows (16:19) and Data 4 Rows (16:19)
0   4   45  568 569 570 570             
0   4   60  617 617 619 618             
0   4   85  700 699 702 701

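Please help me do this for the whole DataFrame and then save it to a CSV file.

Here is my code (but it fills the calculated columns with the last calculated value):

I want to point out that there are two more columns, X and Y, which I use to work out how many values go into each calculated-column result, because TEMP data is sometimes missing for some X and Y.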

I see what you mean (I guess). So read the CSV file like this:

import csv

def csv_reader(file_object, group_size=4):
    reader = csv.reader(file_object)
    next(reader, None)  # skip the header row (spreadsheet row 1)

    buffer_values = []

    for row in reader:

        # GETTING NEEDED DATA HERE
        # Column order is X, Y, TEMP, Data_1, ... so TEMP is row[2]
        # and Data_1 is row[3]. csv.reader yields strings, so convert.
        temp = float(row[2])
        data_1 = float(row[3])

        buffer_values.append(row)

        if len(buffer_values) == group_size:

            # ACCESS THE BUFFER VALUES HERE.
            # THE BUFFER HOLDS SPREADSHEET ROWS [2:5] ON THE FIRST HIT.
            # ON THE NEXT HIT IT HOLDS [6:9].
            # IMPLEMENT YOUR FORMULAS HERE WITH THE BUFFERED VALUES.
            # NOTE: a fixed group size does not handle groups with
            # missing TEMP rows; compare X and Y instead in that case.

            # CLEAR THE BUFFER FOR NEXT RUN
            buffer_values = []

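Now the trick is to write a new CSV file containing all the data. You can do that each time just before clearing the buffer, or store all the results in another variable and dump that to the file at the end. Hope this helps :)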
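The buffering idea above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the helper names `slope` and `add_calculated_columns` are hypothetical, rows are grouped by consecutive equal (X, Y) values rather than by a fixed count of four so that groups with missing TEMP rows still work, and the formula is assumed to be the least-squares slope that the loop code later in this thread computes.

```python
import csv
import io
from itertools import groupby

def slope(xs, ys):
    """Least-squares slope of ys against xs (NaN when it is undefined)."""
    n = len(xs)
    denom = n * sum(x * x for x in xs) - sum(xs) ** 2
    if denom == 0:
        return float("nan")
    return (n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) / denom

def add_calculated_columns(in_file, out_file, n_data=4):
    """Copy in_file to out_file, adding Calculated_1..n on each group's first row."""
    reader = csv.DictReader(in_file)
    calc_names = [f"Calculated_{i}" for i in range(1, n_data + 1)]
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames + calc_names)
    writer.writeheader()
    # Consecutive rows with the same (X, Y) form one group, whatever its size.
    for _, rows in groupby(reader, key=lambda r: (r["X"], r["Y"])):
        rows = list(rows)
        temps = [float(r["TEMP"]) for r in rows]
        for i in range(1, n_data + 1):
            data = [float(r[f"Data_{i}"]) for r in rows]
            rows[0][f"Calculated_{i}"] = slope(temps, data)
        for r in rows:
            # Rows without Calculated_* keys get empty cells (DictWriter default).
            writer.writerow(r)

# Small in-memory demonstration using two rows from the sample data.
source = io.StringIO(
    "X,Y,TEMP,Data_1,Data_2,Data_3,Data_4\n"
    "0,3,60,617,617,619,618\n"
    "0,3,85,700,699,702,701\n"
)
target = io.StringIO()
add_calculated_columns(source, target)
print(target.getvalue())
```

With real files, the two `io.StringIO` objects would be replaced by files opened with `newline=""`, as the `csv` module documentation recommends.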

I managed to get the code working.

Add the following to create the column names:

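# Create the empty result columns Calculated_1 .. Calculated_4
# (df is the same DataFrame that the loop below works on).
for i in range(1, 5):
    df['Calculated_' + str(i)] = ''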

Now that I have the column names, I went on to make a few small changes to the loop code:

# Assumes df has the default 0..n-1 integer index.
n_rows = len(df)
i = 0  # index of the first row of the current (X, Y) group
j = 0  # number of additional rows in the current group
for k in range(n_rows):
    # A group ends at the last row, or when X or Y changes on the next row.
    is_last = k == n_rows - 1
    if (not is_last
            and int(df.loc[k, 'X']) == int(df.loc[k + 1, 'X'])
            and int(df.loc[k, 'Y']) == int(df.loc[k + 1, 'Y'])):
        j = j + 1
        continue
    n = j + 1                        # number of rows in this group
    temp = df.loc[i:i + j, 'TEMP']   # .loc slices include both endpoints
    for l in range(1, 5):
        data = df.loc[i:i + j, 'Data_' + str(l)]
        # Least-squares slope of Data_l against TEMP,
        # written only to the first row of the group.
        df.loc[i, 'Calculated_' + str(l)] = (
            (n * sum(temp * data) - sum(temp) * sum(data))
            / (n * sum(temp * temp) - sum(temp) ** 2)
        )
    i = i + j + 1
    j = 0

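Note that I now use the variable `i` to address the row position.

I got there mostly through trial and error, and by reading a bit about how to use .loc with DataFrames.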
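For comparison, the same per-group calculation can be sketched with pandas `groupby` instead of manual index bookkeeping. This is not the method used above, only an alternative under stated assumptions: the helper names `ls_slope` and `add_group_slopes` are hypothetical, the formula is assumed to be the least-squares slope of each `Data_l` column against `TEMP`, and rows are grouped by (X, Y) value wherever they appear rather than only when adjacent.

```python
import io
import pandas as pd

def ls_slope(x, y):
    """Least-squares slope of y against x (NaN when it is undefined)."""
    n = len(x)
    denom = n * (x * x).sum() - x.sum() ** 2
    if denom == 0:
        return float("nan")
    return (n * (x * y).sum() - x.sum() * y.sum()) / denom

def add_group_slopes(df, n_data=4):
    """Return a copy of df with Calculated_1..n set on each group's first row."""
    out = df.copy()
    for l in range(1, n_data + 1):
        out[f"Calculated_{l}"] = float("nan")
    # Iterate over the original frame and write into the copy.
    for _, group in df.groupby(["X", "Y"], sort=False):
        first = group.index[0]  # label of the group's first row
        for l in range(1, n_data + 1):
            out.loc[first, f"Calculated_{l}"] = ls_slope(
                group["TEMP"], group[f"Data_{l}"]
            )
    return out

# Small in-memory demonstration using rows from the sample data.
csv_text = (
    "X,Y,TEMP,Data_1,Data_2,Data_3,Data_4\n"
    "0,3,60,617,617,619,618\n"
    "0,3,85,700,699,702,701\n"
    "0,4,30,520,521,522,521\n"
)
df = pd.read_csv(io.StringIO(csv_text))
result = add_group_slopes(df)
print(result)
# result.to_csv("output.csv", index=False)  # "output.csv" is a placeholder path
```

Rows other than the first of each group stay NaN, which `to_csv` writes as empty cells by default, matching the layout requested in the question.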

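So where is your code, and what exactly is going wrong?

Added the code. I want to point out that there are two more columns, X and Y, which I use to work out how many values each calculated-column result needs, because TEMP data is sometimes missing for some X and Y.

What exactly goes wrong with it?

I'm new to Python and pandas (and to programming in general). I need to automate some work-related data processing in Python. I want to add a new column header and then, for that column, add the calculation at each new XY row, and repeat this for the rest of the spreadsheet. Also, the number of temperatures per XY can vary, which I work out at the start of the loop. I don't know how to write the statement on the left-hand side of the = sign in the formula to achieve this; or is there a better way to do it? Right now it fills the entire column with the last value calculated for the last XY group. In other words, how do I stop the formula from filling the whole column and have it fill only the first row of each group? I'm trying to write to the same file the DataFrame was read from, using df.to_csv('filename.csv').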
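The question's original code is not shown, so the following is only a guess at the cause: assigning a scalar to a whole column, as in `df['Calculated_1'] = value`, broadcasts that value to every row, whereas `df.loc[row_label, 'Calculated_1'] = value` sets a single cell. A minimal sketch of the difference, using a made-up three-row frame and a placeholder value:

```python
import pandas as pd

df = pd.DataFrame({"X": [0, 0, 0], "Y": [0, 0, 1], "TEMP": [30, 45, 30]})

# Assigning a scalar to a whole column broadcasts it to every row,
# so repeating this in a loop leaves only the last value everywhere.
df["Calculated_1"] = 3.27
whole = df["Calculated_1"].tolist()

# Assigning through .loc[row_label, column] changes exactly one cell.
df["Calculated_1"] = float("nan")
df.loc[0, "Calculated_1"] = 3.27
single = df["Calculated_1"].tolist()

print(whole)
print(single)
```

Writing back to the file the data came from should work with `df.to_csv('filename.csv', index=False)` once the frame is complete; `index=False` keeps the row index from being written as an extra column.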