Python: appending rows to a DataFrame with an .iterrows() for loop


python python-3.x pandas loops dataframe


Suppose I have the following DataFrame:

     xx      yy      tt
0   2.8     1.0     1.0
1   85.0    4.48    6.5
2   2.1     8.0     1.0
3   8.0     1.0     0.0
4   9.0     2.54    1.64
5   5.55    7.25    3.15
6   1.66    0.0     4.0
7   3.0     7.11    1.98
8   1.0     0.0     4.65
9   1.87    2.33    0.0

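I want to write a for loop that iterates over all the points in df and computes the Euclidean distance from each point to all the other points. For example, the loop iterates over point a and gets the distances from point a to points b, c, d … n. Then it moves on to point b and gets the distances to points a, c, d … n, and so on.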
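For a frame as small as the sample above, the "distances from every point to every other point" step can be expressed without a Python loop using NumPy broadcasting. This is a minimal sketch, not the asker's code: it materializes an n × n matrix, so it only makes sense for small n and is exactly what the question is trying to avoid for the real, much larger frame.

```python
import numpy as np
import pandas as pd

# The sample DataFrame from the question
df = pd.DataFrame({
    "xx": [2.8, 85.0, 2.1, 8.0, 9.0, 5.55, 1.66, 3.0, 1.0, 1.87],
    "yy": [1.0, 4.48, 8.0, 1.0, 2.54, 7.25, 0.0, 7.11, 0.0, 2.33],
    "tt": [1.0, 6.5, 1.0, 0.0, 1.64, 3.15, 4.0, 1.98, 4.65, 0.0],
})

points = df[["xx", "yy", "tt"]].to_numpy()

# points[:, None, :] has shape (n, 1, 3) and points[None, :, :] has shape (1, n, 3),
# so their difference broadcasts to (n, n, 3): every point minus every point.
diff = points[:, None, :] - points[None, :, :]
dist_matrix = np.sqrt((diff ** 2).sum(axis=-1))  # shape (n, n)

# Row i holds the distances from point i to all points (0.0 for itself)
print(dist_matrix[0].round(3))
```

Row i of dist_matrix is the same vector the question's loop computes for point i; the row-by-row loop remains the option when n × n values do not fit in memory.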

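Once I have the distances, I want to get the value_counts() of the distance values. However, to save memory, I can't just call value_counts() on all the results of this for loop at once, because my real df is so large that I would eventually run out of memory.

So my idea is to call value_counts() on each distance vector, which gives a two-column DataFrame with the values and their respective counts. Then, when the loop iterates over point b and gets all its distances, I want to compare the new values with the value_counts() df from the first iteration and check whether there are any repeated values. If there are, I want to += the counter of each repeated value; if no repeated value is found, I want to append() all the rows without repeats to the distance df.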
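The "+= for repeated values, append for new values" merge described above is what pandas' Series.add does when given fill_value=0: it aligns two Series on their index, sums the counts of labels present in both, and keeps labels present in only one. A minimal sketch under that approach (not the asker's code), assuming distances are rounded to 6 decimals so that floating-point noise does not create separate keys; the rounding precision is an assumption to adjust for your data.

```python
import numpy as np
import pandas as pd

# The sample DataFrame from the question
df = pd.DataFrame({
    "xx": [2.8, 85.0, 2.1, 8.0, 9.0, 5.55, 1.66, 3.0, 1.0, 1.87],
    "yy": [1.0, 4.48, 8.0, 1.0, 2.54, 7.25, 0.0, 7.11, 0.0, 2.33],
    "tt": [1.0, 6.5, 1.0, 0.0, 1.64, 3.15, 4.0, 1.98, 4.65, 0.0],
})

# Running tally: index = distance value, value = number of times it was seen
total_counts = pd.Series(dtype="float64")

for _, row in df.iterrows():
    # Distances from the current point to every point in df
    dist = np.sqrt((row.xx - df.xx) ** 2 + (row.yy - df.yy) ** 2 + (row.tt - df.tt) ** 2)
    # Counts for this point only; rounding precision (6) is an assumption
    batch = dist.round(6).value_counts()
    # Align on the index: existing distances get their counts summed,
    # new distances are added as new entries (missing side treated as 0)
    total_counts = total_counts.add(batch, fill_value=0)

total_counts = total_counts.astype(int).sort_values(ascending=False)
print(total_counts.head())
```

Only one point's distances plus the running tally are held in memory at a time, so memory grows with the number of distinct distances rather than with the total number of pairs.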

Here is what I have so far:

import numpy as np
import pandas as pd

counts = pd.DataFrame()

for index, row in df.iterrows():

    dist = pd.Series(np.sqrt((row.xx - df.xx)**2 + (row.yy - df.yy)**2 + (row.tt - df.tt)**2)) # Create a vector containing all the distances from each point to the others

    counter = pd.Series(dist.value_counts(sort = True)).reset_index().rename(columns = {'index': 'values', 0:'counts'}) # Get a counter for every value in the distances vector

    if index in counter['values']:
        counter['counts'][index] += 1 # Check if the new values are in the counter df, if so, add +1 to each repeated value

    else:

        counts = counts.append((index,row)) # If no repeated values, then append new rows to the counter df

The expected result looks like this:

# These are the value counts for point a and its distances:

    values  counts
0   0.000000    644589
1   0.005395    1
2   0.005752    1
3   0.016710    1
4   0.023043    1
5   0.012942    1
6   0.020562    1

Now, in the iteration over point b:

       values   counts
0   0.000000    644595  # Value repeated 6 times, so add +6 to the counter
1   0.005395    1
2   0.005752    1
3   0.016710    3  # Value repeated twice, so add +2 to the counter
4   0.023043    1
5   0.012942    1
6   0.020562    1
7   0.025080    1  # New value, so append a new row with value and counter
8   0.022467    1  # New value, so append a new row with value and counter

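However, if you add print(counts) at the end of the loop to check the result, you will see an empty DataFrame. That's why I'm asking this question: why does this code give an empty df, and how can I make it work the way I want?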

If you need more explanation, if something is unclear, or if you need more information, please don't hesitate to ask.

Thanks in advance.
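One likely reason for the empty result, consistent with the comment at the end of this thread that the else branch is never reached: `x in series` tests the Series' index labels, not its values. Because counter is built with reset_index(), its index is 0, 1, 2, …, so `index in counter['values']` really asks whether the row number is a valid label, which is true for every row of the sample, and counts.append(...) never runs. (Separately, DataFrame.append was deprecated in pandas 1.4 and removed in 2.0.) A small sketch of the difference, using made-up values:

```python
import pandas as pd

values = pd.Series([0.0, 82.457, 7.035])  # default index labels: 0, 1, 2

# `in` on a Series checks the index labels, not the stored values
print(1 in values)                   # 1 is an index label
print(82.457 in values)              # 82.457 is a value, not a label

# To test membership among the stored values, use .values or .isin()
print(82.457 in values.values)
print(values.isin([7.035]).any())
```

The first and last two checks are True and the second is False, which is why a membership test meant for distance values ends up matching row numbers instead.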

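If I understand correctly, you want the number of occurrences of each distance value.

So I suggest you create a dict: the key is the distance value, and the value of the key is its count: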

import io

import numpy as np
import pandas as pd

data = """
xx      yy      tt
2.8     1.0     1.0
85.0    4.48    6.5
2.1     8.0     1.0
8.0     1.0     0.0
9.0     2.54    1.64
5.55    7.25    3.15
1.66    0.0     4.0
3.0     7.11    1.98
1.0     0.0     4.65
1.87    2.33    0.0
"""

# pd.compat.StringIO no longer exists in recent pandas; io.StringIO works everywhere
df = pd.read_csv(io.StringIO(data), sep=r'\s+')

dico = {}                           # dict: distance value -> number of occurrences
for index, row in df.iterrows():
    # Create a vector containing all the distances from this point to the others
    dist = np.sqrt((row.xx - df.xx) ** 2 + (row.yy - df.yy) ** 2 +
                   (row.tt - df.tt) ** 2)

    for f in dist:                  # iterate through the distances
        if f in dico:               # does the key already exist in the dict?
            dico[f] += 1            # yes: increment its count
        else:
            dico[f] = 1             # no: create the key with the new distance, count 1

print(dico)

Output:

{0.0: 10, 
82.45726408267497: 2, 
7.034912934784623: 2, 
5.295280917949491: 2, 
6.4203738208923635: 2, 
7.158735921934822: 2, 
3.361487765856065: 2, 
6.191324575565393: 2, 
4.190763653560053: 2, 
1.9062528688503002: 2, 
83.15678204452118: 2, 
77.35218419669867: 2, 
76.17993961667337: 2, 
79.56882492534372: 2, 
    :
    :
7.511863949779708: 2,
0.9263368717696604: 2, 
4.633896848226123: 2, 
7.853725230742415: 2, 
5.295819105671946: 2, 
5.273357564208974: 2}

Every value has a count of at least 2 because this is a cross table: distance(point 0 → point 1) equals distance(point 1 → point 0)…
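Because the table is symmetric, a variant that visits each unordered pair only once halves the work. This is a swapped-in technique, not the answer's code: a minimal sketch using itertools.combinations and math.dist (Python 3.8+) with collections.Counter in place of the hand-written dict. Self-distances (0.0) are skipped, and each distinct pair distance is counted once instead of twice.

```python
import math
from collections import Counter
from itertools import combinations

# Sample points from the question, as (xx, yy, tt) tuples
points = [
    (2.8, 1.0, 1.0), (85.0, 4.48, 6.5), (2.1, 8.0, 1.0), (8.0, 1.0, 0.0),
    (9.0, 2.54, 1.64), (5.55, 7.25, 3.15), (1.66, 0.0, 4.0), (3.0, 7.11, 1.98),
    (1.0, 0.0, 4.65), (1.87, 2.33, 0.0),
]

pair_counts = Counter()
# combinations yields each unordered pair (p, q) with p before q exactly once,
# so neither the point itself nor the mirrored pair (q, p) is counted
for p, q in combinations(points, 2):
    pair_counts[math.dist(p, q)] += 1   # Counter starts missing keys at 0

print(len(pair_counts), sum(pair_counts.values()))
```

With 10 points there are 10 · 9 / 2 = 45 pairs, so the counts sum to 45.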

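It's because the loop never enters the else branch; that's why the DataFrame is empty.

Hmm, what is combinations? Is it a special library?

No, it's df. Give me a second, I'll edit the question so it's clearer, and I'll post again.

This is somewhat close to what I want, but does it compare the new count values with the previous ones and add them to the dict if they are not already there?? Also, keep in mind that if some of the new values already exist in the dict, it should just add +1 to that value's counter. Are both conditions satisfied? Thank you very much.

I added comments to the program, is that OK? I've done what I understood (sorry for my English). For 600,000 rows, the execution time will be long…

OK, this is great. I understand everything now. Thank you very much for your answer. This helped a lot!!

No worries about the English :) Happy to help!!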
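On the execution-time concern for ~600,000 rows: one option is to process the points in blocks with NumPy, counting each unordered pair once and merging per-block np.unique counts into a Counter. This is a minimal sketch, not the answer's method; count_distances, block_size and decimals are hypothetical names chosen for illustration. The total number of distances is still about n²/2 (roughly 1.8 × 10¹¹ for 600,000 points), so it will still take a long time, and each block needs about block_size × n floats of temporary memory, so block_size has to be chosen to fit your RAM.

```python
from collections import Counter

import numpy as np
import pandas as pd


def count_distances(df, cols=("xx", "yy", "tt"), block_size=1000, decimals=6):
    """Hypothetical helper: tally pairwise Euclidean distances (i < j) block by block.

    block_size bounds the temporary (block_size, n) arrays; decimals controls the
    rounding applied before distances are used as counting keys.
    """
    points = df[list(cols)].to_numpy(dtype=float)
    n = len(points)
    counts = Counter()
    for start in range(0, n, block_size):
        block = points[start:start + block_size]
        # Squared distances from every point in the block to every point, one column at a time
        sq = np.zeros((len(block), n))
        for k in range(points.shape[1]):
            sq += (block[:, k, None] - points[None, :, k]) ** 2
        dist = np.sqrt(sq)
        # Keep only pairs with j > i so each unordered pair is counted once
        row_idx = np.arange(start, start + len(block))[:, None]
        col_idx = np.arange(n)[None, :]
        upper = dist[col_idx > row_idx]
        values, freqs = np.unique(np.round(upper, decimals), return_counts=True)
        counts.update(dict(zip(values.tolist(), freqs.tolist())))  # adds to existing counts
    return counts


df = pd.DataFrame({
    "xx": [2.8, 85.0, 2.1, 8.0, 9.0, 5.55, 1.66, 3.0, 1.0, 1.87],
    "yy": [1.0, 4.48, 8.0, 1.0, 2.54, 7.25, 0.0, 7.11, 0.0, 2.33],
    "tt": [1.0, 6.5, 1.0, 0.0, 1.64, 3.15, 4.0, 1.98, 4.65, 0.0],
})
counts = count_distances(df, block_size=4)
print(counts.most_common(3))
```

The block size does not change the result, only the memory and time trade-off: each pair's distance is computed with the same arithmetic whichever block it falls in.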