python: list of integers as a single value of a DataFrame
Problem: how can I call pd.read_csv so that the values in a given column are of type list (one list per row of the column)?
When the DataFrame is created (from a dict, see below), the individual values are of type list. The problem is that after writing the DataFrame to a file and reading it back from the file, I get strings instead of lists.
Create the DataFrame:
import pandas as pd
dict2df = {"euNOG": ["ENOG410IF52", "KOG2956", "KOG1997"],
           "neg": [[58], [1332, 753, 716, 782], [187]],
           "pos": [[96], [659, 661, 705, 1228], [1414]]}
df = pd.DataFrame(dict2df)
Each value is a list.
Write to file:
df.to_csv('DataFrame.txt', sep='\t', header=True, index=False)
Read from file:
df = pd.read_csv('DataFrame.txt', sep='\t')
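The values are now strings instead of lists.
Of course, it is possible to convert between the two data types, but that is computationally costly and requires extra work (see below).
What would be a better (more Pythonic) solution? It would be very convenient to iterate over the integers in the lists without converting them back and forth.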
Thank you for your support.

You can use ast.literal_eval() to convert the strings to lists, as in this demo -
In [15]: import pandas as pd
In [16]: dict2df = {"euNOG": ["ENOG410IF52", "KOG2956", "KOG1997"],
   ....:            "neg": [[58], [1332, 753, 716, 782], [187]],
   ....:            "pos": [[96], [659, 661, 705, 1228], [1414]]}
In [17]: df = pd.DataFrame(dict2df)
In [18]: df.to_csv('DataFrame.txt', sep='\t', header=True, index=False)
In [19]: newdf = pd.read_csv('DataFrame.txt', sep='\t')
In [20]: newdf['neg']
Out[20]:
0                     [58]
1    [1332, 753, 716, 782]
2                    [187]
Name: neg, dtype: object
In [21]: newdf['neg'][0]
Out[21]: '[58]'
In [22]: import ast
In [23]: newdf['neg_list'] = newdf['neg'].apply(ast.literal_eval)
In [24]: newdf = newdf.drop('neg',axis=1)
In [25]: newdf['pos_list'] = newdf['pos'].apply(ast.literal_eval)
In [26]: newdf = newdf.drop('pos',axis=1)
In [27]: newdf
Out[27]:
         euNOG               neg_list               pos_list
0  ENOG410IF52                   [58]                   [96]
1      KOG2956  [1332, 753, 716, 782]  [659, 661, 705, 1228]
2      KOG1997                  [187]                 [1414]
In [28]: newdf['neg_list'][0]
Out[28]: [58]
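Thank you very much. This is more Pythonic in the sense that it uses a library rather than my own code. I was hoping to find a solution that includes this conversion when reading the DataFrame.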
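One way to fold the conversion into the read step is the `converters` parameter of `pd.read_csv`, which maps a column name to a function applied to each raw cell string during parsing. The following is a minimal sketch under that assumption; the temporary path and the `list_columns` name are only for illustration and are not part of the original thread.

```python
import ast
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "euNOG": ["ENOG410IF52", "KOG2956", "KOG1997"],
    "neg": [[58], [1332, 753, 716, 782], [187]],
    "pos": [[96], [659, 661, 705, 1228], [1414]],
})

# Write to a temporary file (illustrative path, not the one from the question).
path = os.path.join(tempfile.mkdtemp(), "DataFrame.txt")
df.to_csv(path, sep="\t", header=True, index=False)

# Hypothetical helper name: the columns whose cells hold list literals.
list_columns = ["neg", "pos"]

# Each cell of the listed columns is passed to ast.literal_eval as a string
# while the file is being parsed, so the result already contains lists.
newdf = pd.read_csv(
    path,
    sep="\t",
    converters={col: ast.literal_eval for col in list_columns},
)

print(type(newdf["neg"][0]))
print(newdf["pos"][1])
```

This does not avoid the parsing cost, since every cell is still evaluated once, but it removes the separate apply/drop step after reading.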
For reference, the manual conversion mentioned in the question:
def convert_StringList2ListOfInt(string2convert):
    # Strip the surrounding brackets and parse each comma-separated item as int.
    return [int(ele) for ele in string2convert[1:-1].split(',')]

def DataFrame_StringOfInts2ListOfInts(df, cols2convert_list):
    # Overwrite each listed column with its parsed list-of-int version.
    for column in cols2convert_list:
        df[column] = df[column].apply(convert_StringList2ListOfInt)
    return df

df = DataFrame_StringOfInts2ListOfInts(df, ['neg', 'pos'])
A simple example of ast.literal_eval() on its own -
>>> import ast
>>> l = ast.literal_eval('[10,20,30]')
>>> type(l)
<class 'list'>
Applied to the DataFrame read from the file -
import ast
df = pd.read_csv('DataFrame.txt', sep='\t')
df['neg_list'] = df['neg'].apply(ast.literal_eval)
df = df.drop('neg', axis=1)
df['pos_list'] = df['pos'].apply(ast.literal_eval)
df = df.drop('pos', axis=1)
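If the file does not have to be tab-separated text, another option is a format that represents nested lists natively, so nothing needs to be parsed by hand after reading. The sketch below assumes JSON is acceptable and uses `DataFrame.to_json` and `pd.read_json` with `orient="records"`; the temporary path and the `restored` name are illustrative and not from the original thread.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "euNOG": ["ENOG410IF52", "KOG2956", "KOG1997"],
    "neg": [[58], [1332, 753, 716, 782], [187]],
    "pos": [[96], [659, 661, 705, 1228], [1414]],
})

# Illustrative temporary path; any writable location would do.
path = os.path.join(tempfile.mkdtemp(), "DataFrame.json")

# orient="records" writes one JSON object per row; lists stay JSON arrays.
df.to_json(path, orient="records")

# dtype=False turns off dtype inference, so cells are kept as parsed:
# JSON arrays come back as Python lists.
restored = pd.read_json(path, orient="records", dtype=False)

print(type(restored["neg"][1]))
print(restored["neg"][1])
```

The trade-off is that the file is no longer a plain table that other CSV tools can read directly.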