Python: how to get the counts of specific strings across rows?
My dataframe is shown below. I want to add the counts of D, T and N in each row as new columns Dcount, Tcount and Ncount.
import pandas as pd

data = {'CHROM':['chr1', 'chr2', 'chr1', 'chr3', 'chr1','chr1', 'chr2', 'chr1'],
'POS':[939570,3411794,1043223,22511093,24454031,3411794,22511093,1043223],
'MI':['T', 'T', 'D', 'D', 'T', 'N', 'D', 'N'],
'CSK':['D', 'D', 'N', 'T', 'N', 'D', 'T', 'T'],
'DD':['N', 'D', 'D', 'D', 'T', 'N', 'D', 'N'],
'RR':['D', 'T', 'N', 'T', 'D', 'D', 'T', 'N'],
'RCB':['D', 'D', 'D', 'D', 'D', 'D', 'D', 'D'],
'DC':['D', 'D', 'T', 'D', 'D', 'D', 'N', 'D']
}
df1 = pd.DataFrame(data)
df1
I want to get the counts of T, D and N in the new dataframe.
Expected output:
CHROM POS MI CSK DD RR RCB DC Dcount Tcount Ncount
0 chr1 939570 T D N D D D 4 1 1
1 chr2 3411794 T D D T D D 4 2 0
2 chr1 1043223 D N D N D T 3 1 2
3 chr3 22511093 D T D T D D 4 2 0
4 chr1 24454031 T N T D D D 3 2 1
5 chr1 3411794 N D N D D D 4 0 2
6 chr2 22511093 D T D T D N 3 2 1
7 chr1 1043223 N T N N D D 2 1 3
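Use DataFrame.iloc to select all columns from position 2 to the end of the dataframe, count the values in each row with value_counts, replace missing values with 0 using fillna, then add the suffix with DataFrame.add_suffix and append the result to the original with DataFrame.join:
Or use DataFrame.stack together with groupby, value_counts and unstack: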
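Unfortunately, this is a dupe. Please check the linked question. It has both of your answers.
@MayankPorwal - it is only partly a duplicate; what is missing in the linked Q/A is filtering out the first two columns with iloc and appending with join.
This question already exists.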
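# Count D/T/N per row over the columns from position 2 onward, then join the counts back.
# pd.Series.value_counts is used because the top-level pd.value_counts is deprecated in recent pandas.
df1 = (df1.join(df1.iloc[:, 2:]
                   .apply(pd.Series.value_counts, axis=1)
                   .fillna(0)
                   .astype(int)
                   .add_suffix('count')))
print(df1)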
CHROM POS MI CSK DD RR RCB DC Dcount Ncount Tcount
0 chr1 939570 T D N D D D 4 1 1
1 chr2 3411794 T D D T D D 4 0 2
2 chr1 1043223 D N D N D T 3 2 1
3 chr3 22511093 D T D T D D 4 0 2
4 chr1 24454031 T N T D D D 3 1 2
5 chr1 3411794 N D N D D D 4 2 0
6 chr2 22511093 D T D T D N 3 1 2
7 chr1 1043223 N T N N D D 2 3 1
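The output above lists the new columns alphabetically (Dcount, Ncount, Tcount), while the expected output asks for Dcount, Tcount, Ncount. A minimal sketch of one way to fix the order, assuming the letters of interest are known in advance: reindex the counts before filling missing values. The names `counts` and `result` are illustrative and not from the original answer.

```python
import pandas as pd

data = {'CHROM': ['chr1', 'chr2', 'chr1', 'chr3', 'chr1', 'chr1', 'chr2', 'chr1'],
        'POS': [939570, 3411794, 1043223, 22511093, 24454031, 3411794, 22511093, 1043223],
        'MI': ['T', 'T', 'D', 'D', 'T', 'N', 'D', 'N'],
        'CSK': ['D', 'D', 'N', 'T', 'N', 'D', 'T', 'T'],
        'DD': ['N', 'D', 'D', 'D', 'T', 'N', 'D', 'N'],
        'RR': ['D', 'T', 'N', 'T', 'D', 'D', 'T', 'N'],
        'RCB': ['D', 'D', 'D', 'D', 'D', 'D', 'D', 'D'],
        'DC': ['D', 'D', 'T', 'D', 'D', 'D', 'N', 'D']}
df = pd.DataFrame(data)

counts = (df.iloc[:, 2:]
            .apply(pd.Series.value_counts, axis=1)  # one row of counts per input row
            .reindex(columns=['D', 'T', 'N'])       # fix the column order; absent letters become NaN
            .fillna(0)
            .astype(int)
            .add_suffix('count'))
result = df.join(counts)
print(result)
```

Reindexing also guarantees that a column exists for a letter that never occurs anywhere in the data, in which case it is filled with zeros.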
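# Alternative to the apply-based solution: run this on the original df1,
# before any count columns have been added.
df1 = df1.join(df1.iloc[:, 2:]
                  .stack()                # long format: one (row, column) pair per value
                  .groupby(level=0)       # group by the original row index
                  .value_counts()
                  .unstack(fill_value=0)  # letters become columns, missing counts become 0
                  .add_suffix('count'))
print(df1)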
CHROM POS MI CSK DD RR RCB DC Dcount Ncount Tcount
0 chr1 939570 T D N D D D 4 1 1
1 chr2 3411794 T D D T D D 4 0 2
2 chr1 1043223 D N D N D T 3 2 1
3 chr3 22511093 D T D T D D 4 0 2
4 chr1 24454031 T N T D D D 3 1 2
5 chr1 3411794 N D N D D D 4 2 0
6 chr2 22511093 D T D T D N 3 1 2
7 chr1 1043223 N T N N D D 2 3 1
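When only a few known letters need counting, a more explicit alternative is to compare the columns against each letter and sum the matches row by row. The sketch below is not part of the original answer; it uses only the first three rows of the question's data to keep it short, and the names `calls` and `letter` are illustrative.

```python
import pandas as pd

# First three rows of the question's data, used here as a small sample.
df = pd.DataFrame({
    'CHROM': ['chr1', 'chr2', 'chr1'],
    'POS': [939570, 3411794, 1043223],
    'MI': ['T', 'T', 'D'],
    'CSK': ['D', 'D', 'N'],
    'DD': ['N', 'D', 'D'],
    'RR': ['D', 'T', 'N'],
    'RCB': ['D', 'D', 'D'],
    'DC': ['D', 'D', 'T'],
})

calls = df.iloc[:, 2:]  # every column after CHROM and POS
for letter in ['D', 'T', 'N']:
    # eq() gives a boolean frame; summing across columns counts the matches per row
    df[f'{letter}count'] = calls.eq(letter).sum(axis=1)
print(df)
```

This creates the columns in the order of the list, so they come out as Dcount, Tcount, Ncount, and a letter that never appears simply gets a column of zeros.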