Python DataFrame

python pandas dataframe


I have a df that looks like this:

names    col1   col2   col3   total     total_col1      total_col2
 bbb      1      1      0      2         DF1, DF2           DF1           
 ccc      1      0      0      1         DF1                        
 zzz      0      1      1      2                            DF2     
 qqq      0      1      0      1                           DF1, Df2
 rrr      0      0      1      1

I want to count the number of entries in each total_col column and then add another column, total_full, so that the output would be:

names    col1   col2   col3   total  total_full     total_col1      total_col2
 bbb      1      1      0      2          5              2             1   
 ccc      1      0      0      1          2              1                      
 zzz      0      1      1      2          3              1    
 qqq      0      1      0      1          3              2
 rrr      0      0      1      1

So each total_col column holds the number of DFs listed in it, and total_full is the sum of those columns plus the total column.

Is this possible with pandas?

You can use:

# select the columns whose values should be replaced by counts
cols = df.columns[df.columns.str.startswith('total_')]
# split each string on commas, take the length of each list, write back
df[cols] = df[cols].apply(lambda x: x.str.split(',').str.len())
# insert the new column right after the 'total' column
df.insert(df.columns.get_loc('total') + 1, 'total_full', df.filter(like='total').sum(axis=1))
print(df)
  names  col1  col2  col3  total  total_full  total_col1  total_col2
0   bbb     1     1     0      2         5.0         2.0         1.0
1   ccc     1     0     0      1         2.0         1.0         NaN
2   zzz     0     1     1      2         3.0         NaN         1.0
3   qqq     0     1     0      1         3.0         NaN         2.0
4   rrr     0     0     1      1         1.0         NaN         NaN
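
For anyone who wants to run the snippet above end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the question, and it assumes the empty cells are NaN rather than empty strings (the question does not say which).

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; empty cells are ASSUMED to be NaN.
df = pd.DataFrame({
    'names': ['bbb', 'ccc', 'zzz', 'qqq', 'rrr'],
    'col1': [1, 1, 0, 0, 0],
    'col2': [1, 0, 1, 1, 0],
    'col3': [0, 0, 1, 0, 1],
    'total': [2, 1, 2, 1, 1],
    'total_col1': ['DF1, DF2', 'DF1', np.nan, np.nan, np.nan],
    'total_col2': ['DF1', np.nan, 'DF2', 'DF1, Df2', np.nan],
})

# Columns holding the comma-separated strings.
cols = df.columns[df.columns.str.startswith('total_')]

# Replace each string by the number of comma-separated items (NaN stays NaN).
df[cols] = df[cols].apply(lambda x: x.str.split(',').str.len())

# Sum 'total' and the count columns, skipping NaN, and place the result
# right after the 'total' column.
total_full = df.filter(like='total').sum(axis=1)
df.insert(df.columns.get_loc('total') + 1, 'total_full', total_full)

print(df)
```

Note that `sum(axis=1)` skips NaN by default, which is why a row with no strings at all still gets a `total_full` equal to its `total`.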


You could use

totals = df.filter(regex=r'^total_col')
counts = (totals.stack().str.count(',')+1).unstack()
#    total_col1  total_col2
# 0         2.0         1.0
# 1         1.0         NaN
# 2         NaN         1.0
# 3         NaN         2.0

to count the number of strings in the total columns.

To move the non-NaN values to the front of each row (np.sort places NaN last), you could use

import numpy as np
import pandas as pd
counts_array = np.sort(counts.values, axis=1)
counts = pd.DataFrame(counts_array, columns=counts.columns, index=counts.index)

which yields

   col1  col2  col3 names  total  total_col1  total_col2  total_full
0     1     1     0   bbb      2         1.0         2.0         5.0
1     1     0     0   ccc      1         1.0         NaN         2.0
2     0     1     1   zzz      2         1.0         NaN         3.0
3     0     1     0   qqq      1         2.0         NaN         3.0
4     0     0     1   rrr      1         NaN         NaN         1.0
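
The fragments of this answer can be stitched together roughly as follows. This is only a sketch: the sample frame is rebuilt from the question with NaN assumed for the empty cells, and the `reindex`/`astype(float)` step is an addition of mine so that the result does not depend on whether `stack()` drops NaN rows in the installed pandas version.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; empty cells are ASSUMED to be NaN.
df = pd.DataFrame({
    'names': ['bbb', 'ccc', 'zzz', 'qqq', 'rrr'],
    'col1': [1, 1, 0, 0, 0],
    'col2': [1, 0, 1, 1, 0],
    'col3': [0, 0, 1, 0, 1],
    'total': [2, 1, 2, 1, 1],
    'total_col1': ['DF1, DF2', 'DF1', np.nan, np.nan, np.nan],
    'total_col2': ['DF1', np.nan, 'DF2', 'DF1, Df2', np.nan],
})

totals = df.filter(regex=r'^total_col')

# Number of items = number of commas + 1; missing cells stay missing.
counts = (totals.stack().str.count(',') + 1).unstack()

# Added safety step (not in the answer): restore any rows or columns that
# stack() may have dropped, and force a plain float dtype.
counts = counts.reindex(index=df.index, columns=totals.columns).astype(float)

# np.sort puts NaN last, so the non-NaN counts move to the front of each row.
counts_array = np.sort(counts.to_numpy(), axis=1)
counts = pd.DataFrame(counts_array, columns=counts.columns, index=counts.index)

df[counts.columns] = counts
df['total_full'] = df['total'] + counts.sum(axis=1)

print(df)
```

One caveat: after the row-wise sort, a value under total_col1 no longer necessarily came from total_col1 (for row bbb the counts 2 and 1 swap places), so skip the sorting step if the column of origin matters.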


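I don't understand what DF1, DF2 and DF2 are.

@Dror These are just strings that I want to count.

If a cell is empty, the lambda function still gives me a count for it.

Hmm, maybe this is needed:

df[cols] = df[cols].replace('', np.nan).apply(lambda x: x.str.split(',').str.len())

Then I get an error:

Can only use .str accessor with string values, which use np.object_ dtype in pandas

Really interesting, what is your pandas version?

Maybe there is a way to call another function inside the lambda function that checks the type of the value and whether it contains a comma, etc.?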
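
Along the lines of the last comment, one possible workaround is to count per cell with an ordinary Python function instead of the .str accessor. The sketch below is not from the thread: `count_items` is a hypothetical helper name, and the small frame is made up to include empty strings and a column with no real strings at all. A likely cause of the error above is a column that holds no string values (for example, all NaN after the replace), and a per-cell type check sidesteps that case.

```python
import numpy as np
import pandas as pd


def count_items(cell, sep=','):
    """Hypothetical helper: number of non-empty items in a separated string.

    Returns NaN for anything that is not a string (NaN, None, numbers)
    and for strings that contain no items, such as '' or '  '.
    """
    if not isinstance(cell, str):
        return np.nan
    items = [part for part in cell.split(sep) if part.strip()]
    return len(items) if items else np.nan


# Made-up frame with empty strings, NaN, and an all-empty column.
df = pd.DataFrame({
    'total': [2, 1, 1],
    'total_col1': ['DF1, DF2', '', np.nan],
    'total_col2': ['', '', ''],
})

cols = df.columns[df.columns.str.startswith('total_')]

# Series.map calls count_items on every cell, so no .str accessor is needed.
df[cols] = df[cols].apply(lambda s: s.map(count_items))

# Row-wise sum of 'total' and the count columns, skipping NaN.
df['total_full'] = df.filter(like='total').sum(axis=1)

print(df)
```

This is slower than the vectorised .str methods on large frames, but it makes the handling of empty and non-string cells explicit.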
