Python: converting categories in a CSV file to numbers


python csv


I have a CSV file like this:

1,a,add
2,b,more
1,c,thinking
3,a,to
1,c,me

I want to convert its categorical features to numeric form, like this:

1,0,0
2,1,1
1,2,2
3,0,3
1,2,4

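The problem is that I cannot get the loop over the columns to work while reading and writing the file in Python. Either only the second column is converted and the third is left untouched, or the second and third columns are both converted but not in the same rows, which gives something like: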

1,0,add
2,1,more
1,2,thinking
3,0,to
1,2,me
1,a,0
2,b,1
1,c,2
3,a,3
1,c,4

Here is my code:

import csv
import sys

def construction_liste(reader, col):
    liste_mot_different=[]
    for line in reader:
        if line[col] not in liste_mot_different:
            liste_mot_different.append(line[col])
    #print liste_mot_different
    return liste_mot_different


def table_correspondance(liste):
    table_correspondance = []
    for i, item in enumerate(liste):
        table_correspondance.append((item,i))
    print table_correspondance
    return table_correspondance

def replace_word2int(line, table_correspondance):
    new_line = []
    for item in line:
        liste_table= [y[0] for y in table_correspondance]
        if item not in liste_table:
            new_line.append(item)
        else:
            i = [y[0] for y in table_correspondance].index(item)
            new_line.append(str(table_correspondance[i][1]))
    new_line = ",".join( new_line )
    new_line += "\n"
    print new_line
    return new_line

input_file = sys.argv[1]
output_file = sys.argv[2]

i1 = open( input_file, 'rb' )
i2 = open( input_file, 'rb' )

reader1=csv.reader( i1 )
reader2=csv.reader( i2 )

o = open( output_file, 'w' )

sequence = [2,5]

for col in sequence:
    liste_mot_different=construction_liste(reader1,col)
    table_correspondance_word2int=table_correspondance(liste_mot_different)

    for line in reader2:
        nouvelle_ligne = replace_word2int(line, table_correspondance_word2int)
        o.write( nouvelle_ligne )



i1.close()
i2.close()
o.close()

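In the first function, I look at every row of one column and build a list of the distinct strings found in that column. In the second function, I build a "mapping table" that maps an integer to each string in that list. In the third function, I replace every string in the file with its integer, column by column.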

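It works for the first column in my sequence but not for the others. I think this is because once the reader has read all the rows, it does not read them again.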
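That suspicion can be checked with a small sketch. It assumes Python 3 (the code in the question is Python 2) and uses an in-memory io.StringIO buffer holding the sample data in place of the real input file. A csv.reader is a one-shot iterator, so a second loop over it yields nothing unless the underlying file is rewound or the rows are kept in a list.

```python
import csv
import io

# In-memory stand-in for the input file (assumption: the sample data above).
sample = "1,a,add\n2,b,more\n1,c,thinking\n3,a,to\n1,c,me\n"
handle = io.StringIO(sample)
reader = csv.reader(handle)

first_pass = [row for row in reader]   # consumes every row
second_pass = [row for row in reader]  # reader is exhausted: nothing left

# Rewinding the underlying file lets a fresh reader start over.
handle.seek(0)
after_rewind = list(csv.reader(handle))

print(len(first_pass), len(second_pass), len(after_rewind))
```

In the question's loop, reader1 and reader2 are both used up during the first column, which would explain why later columns produce no output.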

How can I do this? Thanks.

I also tried using "with [...] as", but that did not work either.

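I find the code a little hard to follow, but "I think this is because once it has read all the rows, it does not read them again" is correct. The thing is, you should use csv.writer() rather than new_line = ",".join(new_line) etc. Construct a nested list, new_csv_data = [[row1col1, row1col2, row1col3], [row2col1, row2col2, row2col3], ...], and then you can call writerows. In the current setup, however, it is not clear to me how you would aggregate that into a single list in replace_word2int.

You could also drop col and simply read each row without specifying an index. Then you iterate over the input file only once and can iterate over the resulting list as many times as you like. You end up with the list structure I recommended in my previous comment, where each row is a nested list and each element is one column of that row.

Ah, I understand what you are doing now. The one thing I do not understand is i = [y[0] for y in table_correspondance].index(item). Are you saying that if a word is repeated in your sample you could, in theory, get 1,2,3,2,5 instead of 1,2,3,4,5? In other words, the value that replaces a word should be the index at which that word was first encountered, rather than the next unused number?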
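Putting the suggestions from the comments together, a minimal sketch (not the asker's code) might read the file once into a nested list, number each distinct value of the chosen columns in order of first appearance, and write the result with csv.writer. It assumes Python 3. The helper names encode_columns and convert_file are made up for illustration, and the zero-based column indices [1, 2] match the three-column sample rather than the [2,5] used in the question.

```python
import csv
import io


def encode_columns(rows, columns):
    """Return a copy of rows with the given columns replaced by integer codes.

    Each column gets its own mapping; a value's code is the order in which
    it was first seen in that column (0, 1, 2, ...).
    """
    mappings = {col: {} for col in columns}
    encoded = []
    for row in rows:
        new_row = list(row)
        for col in columns:
            mapping = mappings[col]
            # setdefault assigns the next free code only on first sight.
            code = mapping.setdefault(row[col], len(mapping))
            new_row[col] = str(code)
        encoded.append(new_row)
    return encoded


def convert_file(src, dst, columns):
    """Read all rows from src once, encode them, write them to dst."""
    rows = list(csv.reader(src))
    csv.writer(dst, lineterminator="\n").writerows(encode_columns(rows, columns))


# Demo on in-memory buffers standing in for the real input and output files.
source = io.StringIO("1,a,add\n2,b,more\n1,c,thinking\n3,a,to\n1,c,me\n")
target = io.StringIO()
convert_file(source, target, columns=[1, 2])
print(target.getvalue())
```

With real files, the same convert_file call should work on handles opened with newline="", as the csv module documentation recommends, for example open(input_file, newline="") and open(output_file, "w", newline=""). Because the rows live in a list, every column is encoded in a single pass and no second reader is needed.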