Python: how to split a column that contains a string
I have a DataFrame, and I need to split the column wherever an underscore (_) appears:
import pandas as pd

Name = [('Hello'),
('Spider'),
('Captain'),
('Superman'),
('Hello_1'),
('Superman_1')]
dfName = pd.DataFrame(Name, columns=['Name'])
My DataFrame:
Name
0 Hello
1 Spider
2 Captain
3 Superman
4 Hello_1
5 Superman_1
Expected output: two DataFrames, df1 (names without an underscore) and df2 (names with an underscore).
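Use str.contains to build a boolean mask and filter with it. The mask itself selects the rows for df2, and the mask inverted with ~ selects the rows for df1. Finally, reset_index(drop=True) is added to get a default RangeIndex: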
m = dfName['Name'].str.contains('_')
# for df1 in this sample data .reset_index(drop=True) is not necessary; it is added for a general solution
df1 = dfName[~m].reset_index(drop=True)
print(df1)
Name
0 Hello
1 Spider
2 Captain
3 Superman
df2 = dfName[m].reset_index(drop=True)
print(df2)
Name
0 Hello_1
1 Superman_1
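The mask-based split above can be wrapped in a small helper. This is only a sketch: the function name split_by_substring and its parameters are hypothetical and not part of pandas. It assumes the pattern should be matched literally (regex=False) and that missing values should count as "does not contain" (na=False).

```python
import pandas as pd


def split_by_substring(df, column, pat):
    """Split df into (rows without pat, rows with pat) based on `column`.

    Hypothetical helper for illustration; not a pandas API.
    """
    # regex=False treats `pat` as a literal substring;
    # na=False makes missing values count as non-matching.
    mask = df[column].str.contains(pat, regex=False, na=False)
    without = df[~mask].reset_index(drop=True)
    with_ = df[mask].reset_index(drop=True)
    return without, with_


dfName = pd.DataFrame(
    {"Name": ["Hello", "Spider", "Captain", "Superman", "Hello_1", "Superman_1"]}
)
df1, df2 = split_by_substring(dfName, "Name", "_")
print(df1)
print(df2)
```

Returning both frames from one call keeps the mask and its inverse together, so the two results cannot drift apart.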
You may want to split the original list into two sublists first:
>>> name = 'Hello Spider Captain Superman Hello_1 Superman_1'.split()
>>> name
['Hello', 'Spider', 'Captain', 'Superman', 'Hello_1', 'Superman_1']
>>> col1 = [n for n in name if '_' not in n]
>>> col2 = [n for n in name if '_' in n]
>>> col1
['Hello', 'Spider', 'Captain', 'Superman']
>>> col2
['Hello_1', 'Superman_1']
>>>
Note: by convention, variable names should be lowercase, to distinguish them from classes.
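To get from the two sublists to the two DataFrames the question asks for, each list can be passed to pd.DataFrame. This is a minimal sketch that assumes the column should still be called Name, as in the question. The variable name names is chosen here for illustration.

```python
import pandas as pd

names = ["Hello", "Spider", "Captain", "Superman", "Hello_1", "Superman_1"]

# Partition the plain Python list before any DataFrame exists.
col1 = [n for n in names if "_" not in n]
col2 = [n for n in names if "_" in n]

# Each new DataFrame gets a fresh default index starting at 0.
df1 = pd.DataFrame(col1, columns=["Name"])
df2 = pd.DataFrame(col2, columns=["Name"])

print(df1)
print(df2)
```

Because the frames are built from scratch, no reset_index call is needed with this approach.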
You can split the DataFrame with this code:
df1 = dfName[~dfName["Name"].str.contains('_1', na=False)].reset_index(drop=True)
df2 = dfName[dfName["Name"].str.contains('_1', na=False)].reset_index(drop=True)
Output of df1:
Name
0 Hello
1 Spider
2 Captain
3 Superman
Output of df2:
Name
0 Hello_1
1 Superman_1
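Note that the pattern '_1' only matches names that contain exactly that text. If the suffix could be any number, one option is a regular expression anchored at the end of the string. The sketch below assumes that "underscore followed by digits at the end" is the intended rule, and the extra sample names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with different numeric suffixes.
df = pd.DataFrame({"Name": ["Hello", "Spider", "Hello_1", "Superman_2", "A_1_b"]})

# str.contains treats the pattern as a regular expression by default:
# "_" followed by one or more digits, then the end of the string.
m = df["Name"].str.contains(r"_\d+$", na=False)

df1 = df[~m].reset_index(drop=True)
df2 = df[m].reset_index(drop=True)

print(df1)
print(df2)
```

With this rule a name such as A_1_b stays in df1, because its underscore-number part is not at the end.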
If you want to discard the original index, add .reset_index(drop=True).
dfnamewithout_regex = dfName[~dfName['Name'].str.contains('_')]
dfnamewithout_regex
Name
0 Hello
1 Spider
2 Captain
3 Superman
dfnamewith_regex = dfName[dfName['Name'].str.contains('_')]
dfnamewith_regex
Name
4 Hello_1
5 Superman_1
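The output above keeps the original row labels 4 and 5. The following sketch shows what that means in practice and how reset_index(drop=True) changes it. The variable names kept and renumbered are chosen here for illustration.

```python
import pandas as pd

dfName = pd.DataFrame(
    {"Name": ["Hello", "Spider", "Captain", "Superman", "Hello_1", "Superman_1"]}
)
mask = dfName["Name"].str.contains("_")

# Filtering alone keeps the original row labels.
kept = dfName[mask]
# Resetting the index renumbers the rows from 0 and drops the old labels.
renumbered = dfName[mask].reset_index(drop=True)

print(kept.index.tolist())
print(renumbered.index.tolist())

# Positional access works on both frames.
print(kept.iloc[0]["Name"])
# Label 0 exists only in the renumbered frame.
print(renumbered.loc[0, "Name"])
```

Keeping the original labels is useful when the rows must be traced back to dfName. Resetting is more convenient when each result is treated as a standalone table.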