How to combine a Python DataFrame's info output with a list of unique counts

python pandas dataframe dictionary


I read a CSV into a DataFrame and then ran info() on it. It produces:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 504334 entries, 0 to 504333
Data columns (total 288 columns):
Unnamed: 0                 int64
rowno__loan               float64
..... the rest of the 288 features

I also got the unique counts per column, which is great:

{'Unnamed: 0': 504334,
 'rowno_loan': 55851,
.. the rest of the 288 pairs..

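I want to combine the detailed info output with the unique-count output. Obviously this is a simple one-to-one inner join, with no duplicates in the matching key.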

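How do I refer to the column of feature names in the two tables? Neither seems to have a column name.

Also, is it possible to write the counts directly into a DataFrame rather than a dictionary?

Thanks, everyone.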

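Pandas has a built-in method for counting unique values. You can use dlqcsv.nunique() to get that output.

For the whole task you describe, manipulating the output of df.info is the harder route. A simpler option is to compute all the required values for each column yourself, using the code below: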

import pandas as pd

# df is the DataFrame to summarize (dlqcsv in the question)
output = []

for col in df.columns:
    nonNull = df[col].count()       # number of non-NaN rows
    unique = df[col].nunique()      # distinct values, NaN excluded
    colType = str(df[col].dtype)

    output.append([col, nonNull, unique, colType])

output = pd.DataFrame(output, columns=['colName', 'non-null values', 'unique', 'dtype'])



The output looks like this:
     colName  non-null values  unique    dtype
0      le_id               20       5    int64
1    run_seq               20       5    int64
2      cp_id               20       8    int64
3    cp_name               20       8   object
4   products               20       7   object
5  tran_amnt               20      17    int64
6   currency               20       6   object
7    current                1       1  float64
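
On the question of which column holds the feature names: in pandas, results such as df.count(), df.nunique() and df.dtypes are Series whose index is the column name, so they line up one-to-one without a named join key. The following is a minimal sketch of that idea, not the answer's original method. The sample frame is made up for illustration and stands in for the CSV from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the CSV from the question.
df = pd.DataFrame({
    "le_id": [1, 2, 2, 3],
    "cp_name": ["a", "b", "b", None],
    "current": [np.nan, 1.5, np.nan, np.nan],
})

# Each of these is a Series indexed by column name, so they align
# one-to-one without an explicit join key.
summary = pd.DataFrame({
    "non-null values": df.count(),
    "unique": df.nunique(),
    "dtype": df.dtypes.astype(str),
})

# Turn the index (the feature names) into an ordinary column.
summary = summary.rename_axis("colName").reset_index()

# A dictionary of counts, as in the question, converts the same way:
# the keys become the index of the resulting Series.
unique_dict = df.nunique().to_dict()
unique_from_dict = pd.Series(unique_dict, name="unique")

print(summary)
```

This avoids the explicit loop and writes the counts straight into a DataFrame. If you already hold the counts in a dictionary, wrapping it in pd.Series gives it the same column-name index.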

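Thanks, Roshan. It worked. Agreed, df.info() is mainly for viewing rather than manipulation. I wonder (nothing against anyone) whether df.info() could be reshaped into something more useful. Without verbose turned on there is little point in looking at it, and turning verbose on doesn't add much value either, because the output still can't be manipulated. Side note: when the frame fits in memory on Linux, nunique really does run much faster, outperforming a SAS data step. In my tests, when the number of columns is > 40000 and the frame no longer fits in memory, it still outperforms the SAS data step but falls behind SAS DS2 at counting. Swapping to disk does not seem to be efficient.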
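
On the point that df.info() is only for viewing: as far as I know, info() accepts a buf argument, so its text can at least be captured into a string instead of being printed. This is a minimal sketch with a made-up frame. The result is still plain text, so a computed per-column summary remains easier to manipulate.

```python
import io

import pandas as pd

# Hypothetical frame for illustration only.
df = pd.DataFrame({"le_id": [1, 2, 3], "cp_name": ["a", "b", None]})

buffer = io.StringIO()
# verbose=True forces the per-column listing even for very wide frames.
df.info(buf=buffer, verbose=True)
info_text = buffer.getvalue()

print(info_text)
```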
