Python: finding the position of the first occurrence of each ID in a CSV file


python dictionary


Given a CSV file in the following format:

Pos   ID    Name
1   0001L01 50293
2   0002L01 128864
3   0003L01 172937
4   0004L01 12878
5   0005L01 demo
6   0004L01 12878
7   0004L01 12878
8   0005L01 demo

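I want to build a dictionary of the form [ID], {Pos, Name, FirstTime}, where FirstTime is the position at which that ID first appears in the CSV file. For example, ID=0005L01 would give:

[0005L01], {5, demo, 5}, {8, demo, 5}

I have already managed to store [ID], {Pos, Name}, but I am stuck on FirstTime. So far I have: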
# Read the rows from the csv reader into a list
dlist = []
for row in reader:
    # keep only the non-empty lines
    if any(row):
        dlist.append(row)

# Map each ID to a list of [Pos, Name] pairs
d = {}
for row in dlist:
    d.setdefault(row[1], []).append([row[0], row[2]])

If you can use pandas, try the following:
In [269]: temp
Out[269]: 
   Pos       ID    Name
0    1  0001L01   50293
1    2  0002L01  128864
2    3  0003L01  172937
3    4  0004L01   12878
4    5  0005L01    demo
5    6  0004L01   12878
6    7  0004L01   12878
7    8  0005L01    demo

Next, group by ID and apply min:
In [271]: temp.groupby('ID').min().rename(columns={'Pos':'Firsttime'})
Out[271]: 
         Firsttime    Name
ID                        
0001L01          1   50293
0002L01          2  128864
0003L01          3  172937
0004L01          4   12878
0005L01          5    demo

In [272]: y = temp.groupby('ID').min().rename(columns={'Pos':'Firsttime'})

Now merge the result with the original DataFrame:
In [276]: temp.merge(y)
Out[276]: 
   Pos       ID    Name  Firsttime
0    1  0001L01   50293          1
1    2  0002L01  128864          2
2    3  0003L01  172937          3
3    4  0004L01   12878          4
4    6  0004L01   12878          4
5    7  0004L01   12878          4
6    5  0005L01    demo          5
7    8  0005L01    demo          5

Now iterate over the rows and save them into a dictionary:
In [280]: next(temp.merge(y).iterrows())
Out[280]: 
(0, Pos                1
 ID           0001L01
 Name           50293
 Firsttime          1
 Name: 0, dtype: object)
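
The groupby and merge steps above can also be collapsed into a single column assignment. The following is a minimal sketch, assuming a reasonably recent pandas and that the sample data is read from an in-memory string; the names SAMPLE, df and result are illustrative and do not come from the original answer. It uses groupby(...).transform("min") so that FirstTime is computed per ID and aligned with the original rows, which avoids depending on which columns merge happens to pick as join keys.

```python
import io

import pandas as pd

SAMPLE = """\
Pos ID Name
1 0001L01 50293
2 0002L01 128864
3 0003L01 172937
4 0004L01 12878
5 0005L01 demo
6 0004L01 12878
7 0004L01 12878
8 0005L01 demo
"""

# Read everything as strings so IDs and names are not coerced,
# then convert only Pos to integers.
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", dtype=str)
df["Pos"] = df["Pos"].astype(int)

# Smallest Pos within each ID group, broadcast back to every row of that group.
df["FirstTime"] = df.groupby("ID")["Pos"].transform("min")

# Build {ID: [(Pos, Name, FirstTime), ...]} in file order.
result = {}
for row in df.itertuples(index=False):
    result.setdefault(row.ID, []).append(
        (int(row.Pos), row.Name, int(row.FirstTime))
    )

print(result["0005L01"])  # [(5, 'demo', 5), (8, 'demo', 5)]
```

Tuples are used here instead of sets so that the order Pos, Name, FirstTime is preserved.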

It is easier if you compute FirstTime first and then fill in the dictionary:
# Read the rows from the csv reader into a list
dlist = []
for row in reader:
    # keep only the non-empty lines
    if any(row):
        dlist.append(row)

# First pass: remember the Pos at which each ID first appears
firstTime = {}
for row in dlist:
    if row[1] not in firstTime:
        firstTime[row[1]] = row[0]

# Second pass: map each ID to a list of [Pos, Name, FirstTime]
d = {}
for row in dlist:
    d.setdefault(row[1], []).append([row[0], row[2], firstTime[row[1]]])
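
For reference, here is a self-contained sketch of the same idea that can be run as-is. It assumes the file is whitespace-delimited with a header row, as in the sample from the question, and reads it from an in-memory string rather than a real file; SAMPLE and build_index are hypothetical names introduced for illustration, and str.split stands in for the csv reader. Because rows are visited in file order, the first Pos seen for an ID is its FirstTime, so a single pass is enough.

```python
import io

SAMPLE = """\
Pos   ID    Name
1   0001L01 50293
2   0002L01 128864
3   0003L01 172937
4   0004L01 12878
5   0005L01 demo
6   0004L01 12878
7   0004L01 12878
8   0005L01 demo
"""


def build_index(lines):
    """Map each ID to a list of (Pos, Name, FirstTime) tuples, in file order."""
    first_time = {}
    index = {}
    rows = (line.split() for line in lines)
    next(rows, None)  # skip the header row
    for row in rows:
        if not row:  # ignore blank lines
            continue
        pos, id_, name = row
        # setdefault stores pos only the first time this ID is seen
        first = first_time.setdefault(id_, pos)
        index.setdefault(id_, []).append((pos, name, first))
    return index


d = build_index(io.StringIO(SAMPLE))
print(d["0005L01"])  # [('5', 'demo', '5'), ('8', 'demo', '5')]
```

The values stay as strings, as they would coming from a csv reader; convert Pos with int() if numeric positions are needed.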

@thefourtheye FirstTime is the position at which an ID first appears in the CSV file.

Do you really want {Pos, Name, FirstTime} to be a set, rather than something ordered such as a tuple?

@mescalinum: Almost there, just replace

if row[0] not in firstTime: firstTime[row[1]] = row[0]

with

if row[1] not in firstTime: firstTime[row[1]] = row[0]

Oh! Right! Sorry, I thought the solution was so simple that I didn't test it :-) [I have just corrected my answer]
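
The set-versus-tuple question raised in the comments matters in practice: written as a Python set literal, {5, "demo", 5} silently drops the repeated 5 and carries no order, so Pos and FirstTime can no longer be told apart. A small sketch of the difference, with a namedtuple as one possible ordered alternative (the Entry type is an illustrative name, not part of the original answers):

```python
from collections import namedtuple

as_set = {5, "demo", 5}    # duplicates collapse, order is not meaningful
as_tuple = (5, "demo", 5)  # keeps all three fields in order

# A named tuple keeps the order and also labels each field.
Entry = namedtuple("Entry", ["pos", "name", "first_time"])
entry = Entry(pos=8, name="demo", first_time=5)

print(len(as_set))       # 2
print(len(as_tuple))     # 3
print(entry.first_time)  # 5
```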