Python: indexing a DataFrame with a for loop

python pandas dataframe

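This is a follow-up to yesterday's question. I have a DataFrame created from a CSV file, and I am trying to compare the current value with the next value. If they are the same, I do one thing; otherwise, I do another. I am running into an index-out-of-range error and hope to find a fix.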

CSV:

Expected output CSV (backup CSV):

Final CSV:

    date             fruitid  quantity
    4/5/2014 13:34   fruit0   73
    4/5/2014 3:41    fruit1   85
    4/6/2014 12:46   fruit2   14
    4/8/2014 8:59    fruit3   52
    4/10/2014 2:07   fruit0   152
    4/10/2014 18:10  fruit4   23
    4/10/2014 2:40   fruit5   98

Code:

I think your for loop is off by one index.

Try:

    for x in range(0, len(df)-1):

instead.

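Edit: It makes sense that

    new_fruit = old_fruit[x+1]

does not give the expected result: old_fruit is not a list but a string, so the index picks a character out of the fruit name. I think what you want is:

    new_fruit = df.fruit[x+1]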
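The two fixes above can be illustrated with a small sketch. The three-row frame is made up for illustration and is not the asker's CSV; it only shows why indexing the string fails while indexing the column works, and why the loop has to stop one row early.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"fruit": ["Apples", "Apples", "Cherries"]})

old_fruit = df.fruit[0]  # a plain Python string: "Apples"
# old_fruit[x + 1] would index characters of that string, not rows,
# and raises IndexError once x + 1 reaches len(old_fruit).

pairs = []
# Stop one short of the end so that x + 1 is always a valid row label.
for x in range(0, len(df) - 1):
    pairs.append((df.fruit[x], df.fruit[x + 1]))

print(pairs)
```

With range(0, len(df)) the last iteration would ask for df.fruit[len(df)], which does not exist.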

Edit (2):

You should add:

    df.NewCol[x+1] = 'fruit' + str(x)

My working script is:

    import pandas as pd

    # dtype=str reads every column as text
    df = pd.read_csv('data.csv', header=0, dtype=str)
    df_count = df['fruit'].value_counts()
    # Sort by the fruit column so that equal values sit next to each other
    df.sort_values(['fruit'], ascending=True, inplace=True)
    df.reset_index(drop=True, inplace=True)
    df['NewCol'] = '0'  # create the new column as text so labels can be assigned
    print(df)
    # Stop at len(df)-1 so that x+1 is always a valid row
    for x in range(0, len(df)-1):
        old_fruit = df.fruit[x]      # current fruit
        new_fruit = df.fruit[x+1]    # next fruit to compare with
        if old_fruit == new_fruit:
            # Same fruit: both rows get the label of the current row.
            # .loc avoids the SettingWithCopyWarning raised by df.NewCol[x] = ...
            df.loc[x, 'NewCol'] = 'fruit' + str(x)
            df.loc[x+1, 'NewCol'] = 'fruit' + str(x)
        else:
            print("Not the Same")
            # Different fruit: the next row gets its own label
            df.loc[x+1, 'NewCol'] = 'fruit' + str(x+1)
    print(df)
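As a side note, the "compare the current row with the next row" step does not strictly need a loop. The sketch below is an alternative to the script above, not part of it, and uses a made-up frame; it relies on Series.shift to line each value up with its successor.

```python
import pandas as pd

# Hypothetical, already sorted data.
df = pd.DataFrame({"fruit": ["Apples", "Apples", "Bananas", "Cherries"]})

# shift(-1) moves every value up one row, so row i sees the value from row i + 1.
next_fruit = df["fruit"].shift(-1)

# True where the current row equals the next one. The last row has no
# successor (it is missing after the shift), so it compares as not equal.
same_as_next = df["fruit"].eq(next_fruit)

print(same_as_next.tolist())
```

Because the last row simply has nothing to compare against, there is no out-of-range index to guard.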

New answer

Use factorize:

df.assign(
    NewCol=np.core.defchararray.add('Fruit', df.fruit.factorize()[0].astype(str))
)

              date         fruit  quantity  NewCol
0   4/5/2014 13:34        Apples        73  Fruit0
1    4/5/2014 3:41      Cherries        85  Fruit1
2   4/6/2014 12:46         Pears        14  Fruit2
3    4/8/2014 8:59       Oranges        52  Fruit3
4   4/10/2014 2:07        Apples       152  Fruit0
5  4/10/2014 18:10       Bananas        23  Fruit4
6   4/10/2014 2:40  Strawberries        98  Fruit5

Not a one-liner, but better:

f, u = pd.factorize(df.fruit.values)
n = np.core.defchararray.add('Fruit', f.astype(str))
df.assign(NewCol=n)

              date         fruit  quantity  NewCol
0   4/5/2014 13:34        Apples        73  Fruit0
1    4/5/2014 3:41      Cherries        85  Fruit1
2   4/6/2014 12:46         Pears        14  Fruit2
3    4/8/2014 8:59       Oranges        52  Fruit3
4   4/10/2014 2:07        Apples       152  Fruit0
5  4/10/2014 18:10       Bananas        23  Fruit4
6   4/10/2014 2:40  Strawberries        98  Fruit5
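For readers who want to run the idea end to end, here is a self-contained sketch. It rebuilds the frame from the table shown in the output above and, as a deliberate substitution, uses plain pandas string concatenation instead of np.core.defchararray.add; the variable names codes, uniques and new_col are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["4/5/2014 13:34", "4/5/2014 3:41", "4/6/2014 12:46", "4/8/2014 8:59",
             "4/10/2014 2:07", "4/10/2014 18:10", "4/10/2014 2:40"],
    "fruit": ["Apples", "Cherries", "Pears", "Oranges",
              "Apples", "Bananas", "Strawberries"],
    "quantity": [73, 85, 14, 52, 152, 23, 98],
})

# codes[i] is the position of df.fruit[i] among the distinct values,
# numbered in order of first appearance; uniques holds those values.
codes, uniques = pd.factorize(df["fruit"])

# Build the labels with ordinary string concatenation on a Series.
new_col = "Fruit" + pd.Series(codes, index=df.index).astype(str)

result = df.assign(NewCol=new_col)
print(result)
```

Every repeated value receives the code of its first occurrence, which is why the second Apples row is labelled Fruit0 again without any row-by-row comparison.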

Same answer, but updating df:

f, u = pd.factorize(df.fruit.values)
n = np.core.defchararray.add('Fruit', f.astype(str))
df = df.assign(NewCol=n)
# Equivalent to
# df['NewCol'] = n
df

              date         fruit  quantity  NewCol
0   4/5/2014 13:34        Apples        73  Fruit0
1    4/5/2014 3:41      Cherries        85  Fruit1
2   4/6/2014 12:46         Pears        14  Fruit2
3    4/8/2014 8:59       Oranges        52  Fruit3
4   4/10/2014 2:07        Apples       152  Fruit0
5  4/10/2014 18:10       Bananas        23  Fruit4
6   4/10/2014 2:40  Strawberries        98  Fruit5
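The reason for the df = ... form is that assign returns a new DataFrame and leaves the original untouched. A minimal sketch on made-up data, with illustrative variable names:

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"fruit": ["Apples", "Cherries", "Apples"]})
codes, _ = pd.factorize(df["fruit"])
labels = ["Fruit" + str(c) for c in codes]

df.assign(NewCol=labels)       # the result is discarded; df is unchanged
has_col_before = "NewCol" in df.columns

df = df.assign(NewCol=labels)  # rebinding the name keeps the new column
has_col_after = "NewCol" in df.columns

print(has_col_before, has_col_after)
```

In a script, as opposed to an interactive session that echoes the last expression, a bare df.assign(...) followed by print(df) will therefore still show the frame without NewCol.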

Old answer

@SeaMonkey nailed down why we are seeing the error.

However, I am guessing at what you are trying to do.
I added cumcount to fruit:

df.assign(NewCol=df.fruit + df.groupby('fruit').cumcount().astype(str))

              date         fruit  quantity         NewCol
0   4/5/2014 13:34        Apples        73        Apples0
1    4/5/2014 3:41      Cherries        85      Cherries0
2   4/6/2014 12:46         Pears        14         Pears0
3    4/8/2014 8:59       Oranges        52       Oranges0
4   4/10/2014 2:07        Apples       152        Apples1
5  4/10/2014 18:10       Bananas        23       Bananas0
6   4/10/2014 2:40  Strawberries        98  Strawberries0
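To make the cumcount step easier to follow, here is a small sketch on made-up data. It numbers each occurrence of a value within its own group, which is a different labelling from factorize, where every occurrence of a value shares one number.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"fruit": ["Apples", "Cherries", "Apples", "Apples"]})

# cumcount numbers the rows inside each group 0, 1, 2, ... in their original order.
occurrence = df.groupby("fruit").cumcount()
print(occurrence.tolist())

# Appending the running count to the name gives one distinct label per row.
labels = df["fruit"] + occurrence.astype(str)
print(labels.tolist())
```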

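Comments:

I think this is getting closer, but I still get these errors:

    SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
    Traceback (most recent call last):
      File "C:/Python36/csvtester3.py", line 18, in
        new_fruit = old_fruit[x+1] #next fruit to compare with
    IndexError: string index out of range

By changing to new_fruit = df.fruit[x+1], I am very close to what I need. The only remaining problem is that the NewCol value in the second row shows up as 0 rather than the label I want.

I added the edit to my code, and it is almost perfect. For whatever reason, it skips fruit1. It does catch the second instance of Apples and adds fruit0 to that row.

I suggest you follow @piRSquared's instructions. The approach I suggested is not general enough: you would need an extra loop to remove every place where it skips a number.

Can you explain what you are trying to do and post some expected output?

This is what I want the output CSV to look like in the end, before I completely replace the fruit column with NewCol. Eventually I will run the same process on a large amount of proxy log data.

    date             fruit         quantity  NewCol
    4/5/2014 13:34   Apples        73        Fruit0
    4/5/2014 3:41    Cherries      85        Fruit1
    4/6/2014 12:46   Pears         14        Fruit2
    4/8/2014 8:59    Oranges       52        Fruit3
    4/10/2014 2:07   Apples        152       Fruit0
    4/10/2014 18:10  Bananas       23        Fruit4
    4/10/2014 2:40   Strawberries  98        Fruit5

This looks great. However, I am trying to mask the real names of the fruits. In my proxy logs the value would be something like bobsmith, and I want it to become something like user1, while johnwayne would become user2. Pseudo-encryption, I guess?

@TravisCowart This is where including the expected result in the question becomes really useful. Edit your question to include it, and we will get you what you need.

Thanks. I have updated my question with the expected results. Hopefully it is a bit clearer with the other two CSVs I want as output.

The new factorize answer looks like it can replace any need for a for loop. Would it go after df.reset... and remove all the existing for loops? I tried it, but printing df only gives me the sorted df.

Hats off to you @piRSquared, that is a very elegant solution!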
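The masking described in the comments (bobsmith becoming user1, johnwayne becoming user2) can be sketched with the same factorize idea. Everything specific here is an assumption for illustration: the column names user and bytes, the sample rows, and the choice to start numbering at 1. The key frame plays the role of the backup that maps masked labels back to real names.

```python
import pandas as pd

# Hypothetical log data; the column names are assumptions.
logs = pd.DataFrame({
    "user": ["bobsmith", "johnwayne", "bobsmith"],
    "bytes": [10, 20, 30],
})

codes, uniques = pd.factorize(logs["user"])

# Lookup table from masked label back to the real name.
# factorize codes start at 0, so 1 is added to get user1, user2, ...
key = pd.DataFrame({
    "masked": ["user" + str(i + 1) for i in range(len(uniques))],
    "user": list(uniques),
})

# Replace the real names with the masked labels in a new frame.
masked_logs = logs.assign(user=["user" + str(c + 1) for c in codes])

print(key)
print(masked_logs)
```

Both frames could then be written out with DataFrame.to_csv, keeping the key file private and sharing only the masked one.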