Python: creating a pd.DataFrame from a dictionary of multi-dimensional arrays

I have the following dictionary:

dictA = {'A': [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
         'B': [[4, 4, 4], [4, 4, 4],],
         'C': [[4, 6, 0]]
        }
I want to convert it into a pd.DataFrame() that should look like this:

id       ColA        ColB        ColC
0         1           4           4
1         2           4           6
2         3           4           0
3         1           4           
4         2           4
5         3           4
6         1
7         2
8         3
How can I do this? I tried

pd.DataFrame(dictAll.items(), columns=['ColA', 'ColB', 'ColC'])
but it obviously doesn't work.

Here is how:

import pandas as pd
import numpy as np

dictA = {'A': [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
         'B': [[4, 4, 4], [4, 4, 4],],
         'C': [[4, 6, 0]]}

df = pd.DataFrame(dict([(f'Col{k}', pd.Series([a for b in v for a in b])) for k,v in dictA.items()])).replace(np.nan, '')
print(df)
Output:

   ColA ColB ColC
0     1    4    4
1     2    4    6
2     3    4    0
3     1    4     
4     2    4     
5     3    4     
6     1          
7     2          
8     3  
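
The one-liner above can be hard to read, so here is a minimal sketch of the same idea split into a small helper. The function name `dict_to_flat_frame` and its `prefix` parameter are hypothetical and not part of the original answer, and `itertools.chain.from_iterable` is swapped in for the nested list comprehension; it should behave the same way for lists of lists.

```python
from itertools import chain

import pandas as pd


def dict_to_flat_frame(nested, prefix="Col"):
    """Flatten each value (a list of lists) and build one column per key.

    Hypothetical helper for illustration only.
    """
    columns = {
        f"{prefix}{key}": pd.Series(list(chain.from_iterable(rows)))
        for key, rows in nested.items()
    }
    # Series of different lengths are aligned on their index,
    # so the shorter columns are padded with NaN.
    return pd.DataFrame(columns)


dictA = {'A': [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
         'B': [[4, 4, 4], [4, 4, 4]],
         'C': [[4, 6, 0]]}

df = dict_to_flat_frame(dictA)
# Blank out the padding only for display; df itself keeps the NaN values.
print(df.fillna(''))
```

Keeping the NaN values in `df` and blanking them only when printing leaves the columns numeric, which may be convenient if further calculations follow.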

Now, let's go through the problem step by step:

  • The first thing we might try is:

    df = pd.DataFrame(dictA)
    print(df)
    
    which, of course, raises the following error:

     ValueError: arrays must all be same length
    
  • So now we need a way to create a DataFrame from a dict whose arrays have different lengths. For that, we can wrap each value in a pd.Series:

    df = pd.DataFrame(dict([(k, pd.Series(v)) for k, v in dictA.items()]))
    print(df)
    
    Output:

               A          B          C
    0  [1, 2, 3]  [4, 4, 4]  [4, 6, 0]
    1  [1, 2, 3]  [4, 4, 4]        NaN
    2  [1, 2, 3]        NaN        NaN
    
  • We want the values to run vertically down each column, so on each iteration we flatten the nested list with a list comprehension:

    df = pd.DataFrame(dict([(k, pd.Series([a for b in v for a in b])) for k, v in dictA.items()]))
    print(df)
    
    Output:

       A    B    C
    0  1  4.0  4.0
    1  2  4.0  6.0
    2  3  4.0  0.0
    3  1  4.0  NaN
    4  2  4.0  NaN
    5  3  4.0  NaN
    6  1  NaN  NaN
    7  2  NaN  NaN
    8  3  NaN  NaN
    
  • Now we want to replace all the NaNs with blanks. For that, we need to import numpy as np and do the following:

    df = pd.DataFrame(dict([(k, pd.Series([a for b in v for a in b])) for k, v in dictA.items()])).replace(np.nan, '')
    print(df)
    
    Output:

       A  B  C
    0  1  4  4
    1  2  4  6
    2  3  4  0
    3  1  4   
    4  2  4   
    5  3  4   
    6  1      
    7  2      
    8  3     
    
  • Finally, use a formatted string (f-string) to turn each letter into "Col" followed by that letter:

    df = pd.DataFrame(dict([(f'Col{k}', pd.Series([a for b in v for a in b])) for k,v in dictA.items()])).replace(np.nan, '')
    print(df)
    
    Output:

       ColA ColB ColC
    0     1    4    4
    1     2    4    6
    2     3    4    0
    3     1    4     
    4     2    4     
    5     3    4     
    6     1          
    7     2          
    8     3  
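
The walk-through pads the shorter columns with NaN, which turns them into floats, and then replaces the NaNs with empty strings. As a possible alternative (not part of the original answer), the sketch below assumes a pandas version that supports the nullable `Int64` dtype, so the values stay integers and the missing entries are shown as `<NA>`.

```python
import pandas as pd

dictA = {'A': [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
         'B': [[4, 4, 4], [4, 4, 4]],
         'C': [[4, 6, 0]]}

# Same flattening as in the steps above, but each Series uses the
# nullable integer dtype, so padding does not force a cast to float.
df = pd.DataFrame({
    f'Col{k}': pd.Series([a for b in v for a in b], dtype='Int64')
    for k, v in dictA.items()
})
print(df)
print(df.dtypes)
```

This keeps every column numeric, at the cost of showing `<NA>` rather than a blank cell.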
    
  • You can use
    pd.DataFrame.from_dict(dictA)
    , but the arrays in the dict must all be the same length.
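
One way around the equal-length requirement of `from_dict`, sketched here as an assumption rather than as what this answer intended, is to flatten the lists first, build the frame with `orient='index'` so that each key becomes a row, and then transpose. The variable name `flat` is only illustrative.

```python
import pandas as pd

dictA = {'A': [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
         'B': [[4, 4, 4], [4, 4, 4]],
         'C': [[4, 6, 0]]}

# Flatten each list of lists and rename the keys to ColA, ColB, ColC.
flat = {f'Col{k}': [a for b in v for a in b] for k, v in dictA.items()}

# With orient='index' every key is a row, and rows shorter than the
# longest one are padded with NaN; .T turns the rows back into columns.
df = pd.DataFrame.from_dict(flat, orient='index').T
print(df)
```

Because of the NaN padding and the transpose, the values come back as floats in this variant.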