Python: "expanding" the contents of a Dataframe column into new columns

python pandas dataframe

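I'm sure there must be a way to do this without nested loops.

I have a df (note that one column contains lists of strings).

In the end, I want to "expand" the values in the lists in that column, so that there is one column for each possible list item, and each row has a 1 in the matching column whenever that value occurs in its list, e.g.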

df =

A  B      C      a  b  c  g  h  x  y
5  1  ['a','b']  1  1
6  2  ['b','c']     1  1
3  3  ['g','h']           1  1
4  5  ['x','y']                 1  1

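You can use pd.get_dummies, but you then need to groupby the columns and aggregate with max. Each list position gets its own set of dummy columns, so a value such as 'b' that occurs in both positions produces duplicate column names that have to be merged: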

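import pandas as pd

# Parentheses allow the method chain to span several lines.
# dtype=int gives 0/1 instead of booleans on recent pandas versions.
# Transposing and grouping on the index replaces groupby(axis=1, level=0),
# which is deprecated in recent pandas versions.
df1 = (pd.get_dummies(pd.DataFrame(df.C.values.tolist()), prefix='', prefix_sep='', dtype=int)
         .T.groupby(level=0).max().T)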

df1 = pd.concat([df, df1], axis=1)
print (df1)

   A  B       C  a  b  c  g  h  x  y
0  5  1  [a, b]  1  1  0  0  0  0  0
1  6  2  [b, c]  0  1  1  0  0  0  0
2  3  3  [g, h]  0  0  0  1  1  0  0
3  4  5  [x, y]  0  0  0  0  0  1  1
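
For comparison, here is a minimal self-contained sketch of a variant that is not part of the answer above: it uses `Series.explode` (available since pandas 0.25) instead of building an intermediate DataFrame from the lists. The sample `df` is reconstructed from the table in the question, and the names `items`, `dummies` and `result` are chosen only for illustration.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "A": [5, 6, 3, 4],
    "B": [1, 2, 3, 5],
    "C": [["a", "b"], ["b", "c"], ["g", "h"], ["x", "y"]],
})

# One row per list item; the original row index is repeated for each item.
items = df["C"].explode()

# One indicator column per distinct item, then collapse back
# to a single row per original index by taking the maximum.
dummies = pd.get_dummies(items, dtype=int).groupby(level=0).max()

result = pd.concat([df, dummies], axis=1)
print(result)
```

Because every distinct item gets exactly one column here, no duplicate column names arise and only the rows need to be regrouped.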

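Another solution uses replace + str.get_dummies. The 0 values can also be replaced with empty strings, as in the code below, but the columns then mix numbers and strings, and some pandas functions may break on them: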

# Raw strings avoid invalid-escape warnings for the regex patterns.
df1 = df.C.astype(str).replace([r'\[', r'\]', "'", r'\s+'], '', regex=True).str.get_dummies(',')
df1 = df1.replace(0,'')
df1 = pd.concat([df, df1], axis=1)
print (df1)
   A  B       C  a  b  c  g  h  x  y
0  5  1  [a, b]  1  1               
1  6  2  [b, c]     1  1            
2  3  3  [g, h]           1  1      
3  4  5  [x, y]                 1  1
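
The string round trip through `astype(str)` and regex cleanup can probably be avoided. The sketch below is an assumption-laden alternative, not the answer's own code: it joins each list with `Series.str.join` and then calls `str.get_dummies` with its default `|` separator. It assumes that no list item contains the `|` character; the sample `df` is reconstructed from the question.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "A": [5, 6, 3, 4],
    "B": [1, 2, 3, 5],
    "C": [["a", "b"], ["b", "c"], ["g", "h"], ["x", "y"]],
})

# Join each list into a single "a|b"-style string, then split it
# back into one 0/1 indicator column per distinct item.
dummies = df["C"].str.join("|").str.get_dummies()

result = pd.concat([df, dummies], axis=1)
print(result)
```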

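Great! It worked. But is there a way to do this "in place"? The dataframe I'm trying to manipulate is ~20GB.

get_dummies is a complex function, so unfortunately I can't help you there. 20GB is really large.

Thanks anyway. Yes, it is too big. I can try breaking it into smaller chunks, or be smarter about using the data in its current structure.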
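
The chunking idea from the last comment could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper `expand_in_chunks` and its parameters are hypothetical names, the index is assumed to be unique, the list items are assumed to be strings, and the first pass over the column (to collect all labels) is assumed to fit in memory. Whether this actually helps with a 20GB frame depends on how the data is stored and loaded.

```python
import pandas as pd

def expand_in_chunks(df, col, chunk_size):
    """Yield df in row chunks, each with one 0/1 column per list item (hypothetical helper)."""
    # Pass 1: collect every distinct label so all chunks share the same columns.
    labels = sorted(set(df[col].explode().dropna()))
    # Pass 2: build the indicator columns one chunk of rows at a time.
    for start in range(0, len(df), chunk_size):
        chunk = df.iloc[start:start + chunk_size]
        dummies = (pd.get_dummies(chunk[col].explode(), dtype="int8")
                     .groupby(level=0).max()
                     .reindex(columns=labels, fill_value=0)
                     .astype("int8"))
        yield pd.concat([chunk, dummies], axis=1)

# Small demonstration with the data from the question.
df = pd.DataFrame({
    "A": [5, 6, 3, 4],
    "B": [1, 2, 3, 5],
    "C": [["a", "b"], ["b", "c"], ["g", "h"], ["x", "y"]],
})
out = pd.concat(expand_in_chunks(df, "C", chunk_size=2))
print(out)
```

Using `int8` for the indicator columns keeps them at one byte per cell, and each yielded chunk can be written to disk before the next one is built instead of being concatenated as in the demonstration.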

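# The same replace + str.get_dummies solution without the replace(0, '') step, so the 0s are kept.
# Raw strings avoid invalid-escape warnings for the regex patterns.
df1 = df.C.astype(str).replace([r'\[', r'\]', "'", r'\s+'], '', regex=True).str.get_dummies(',')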
df1 = pd.concat([df, df1], axis=1)
print (df1)

   A  B       C  a  b  c  g  h  x  y
0  5  1  [a, b]  1  1  0  0  0  0  0
1  6  2  [b, c]  0  1  1  0  0  0  0
2  3  3  [g, h]  0  0  0  1  1  0  0
3  4  5  [x, y]  0  0  0  0  0  1  1
