Python: splitting list elements in a DataFrame into sub-elements
I have a DataFrame:
Filtered_data
['defence possessed russia china','factors driving china modernise']
['force bolster pentagon','strike capabilities pentagon congress detailing china']
['missiles warheads', 'deterrent face continued advances']
......
......
I just want to split each list element into sub-elements (tokenized words). The output I'm looking for is:
Filtered_data
[defence, possessed, russia, factors, driving, china, modernise]
[force, bolster, strike, capabilities, pentagon, congress, detailing, china]
[missiles, warheads, deterrent, face, continued, advances]
This is the code I have tried:
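for text in df['Filtered_data'].iteritems():
    # fails: iteritems() yields (index, value) tuples, and a tuple has no .split()
    # (Series.iteritems() was also removed in pandas 2.0 in favour of .items())
    for i in text.split():
        print(i)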
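For comparison, a minimal corrected version of the attempted loop, assuming each cell of Filtered_data holds a list of strings as shown above (the sample DataFrame is rebuilt here so the snippet runs on its own). It unpacks the (index, value) pairs from Series.items() and then loops over the phrases in each list before splitting, so duplicates such as the second "china" are kept.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed: each cell is a list of strings)
df = pd.DataFrame({'Filtered_data': [
    ['defence possessed russia china', 'factors driving china modernise'],
    ['force bolster pentagon', 'strike capabilities pentagon congress detailing china'],
    ['missiles warheads', 'deterrent face continued advances'],
]})

rows = []
# Series.items() yields (index, value) pairs; the index is not needed here
for _, phrases in df['Filtered_data'].items():
    words = []
    # Each value is a list, so split every phrase in it and collect the words
    for phrase in phrases:
        words.extend(phrase.split())
    rows.append(words)

print(rows[0])
```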
Use a list comprehension with split and flattening:
df['Filtered_data'] = df['Filtered_data'].apply(lambda x: [z for y in x for z in y.split()])
print(df)
Filtered_data
0 [defence, possessed, russia, china, factors, d...
1 [force, bolster, pentagon, strike, capabilitie...
2 [missiles, warheads, deterrent, face, continue...
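A different technique, not part of the answer above: a sketch using Series.explode (available in pandas 0.25 and later), assuming the same list-of-strings column. It moves each phrase onto its own row, splits it, explodes again to one word per row, and regroups by the original index, so duplicates and word order are kept just as with the list comprehension.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed: each cell is a list of strings)
df = pd.DataFrame({'Filtered_data': [
    ['defence possessed russia china', 'factors driving china modernise'],
    ['force bolster pentagon', 'strike capabilities pentagon congress detailing china'],
    ['missiles warheads', 'deterrent face continued advances'],
]})

words = (
    df['Filtered_data']
    .explode()            # one phrase per row, original index repeated
    .str.split()          # each phrase becomes a list of words
    .explode()            # one word per row, original index still repeated
    .groupby(level=0)     # group the words back by their original row
    .agg(list)            # collect them into a list, preserving row order
)

print(words.tolist()[0])
```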
Edit:
For unique values, the standard approach is to use sets:
df['Filtered_data'] = df['Filtered_data'].apply(lambda x: list(set([z for y in x for z in y.split()])))
print(df)
Filtered_data
0 [russia, factors, defence, driving, china, mod...
1 [capabilities, detailing, china, force, pentag...
2 [deterrent, advances, face, warheads, missiles...
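Because a set has no defined iteration order, the lists printed above are not in the original word order and may not come out the same way on another run. A small sketch, assuming an alphabetical order is acceptable, wraps the set comprehension in sorted() so the result is reproducible; unique_words is a hypothetical column name used only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed: each cell is a list of strings)
df = pd.DataFrame({'Filtered_data': [
    ['defence possessed russia china', 'factors driving china modernise'],
    ['force bolster pentagon', 'strike capabilities pentagon congress detailing china'],
    ['missiles warheads', 'deterrent face continued advances'],
]})

# The set comprehension drops duplicate words; sorted() turns it into a list in alphabetical order
df['unique_words'] = df['Filtered_data'].apply(
    lambda x: sorted({z for y in x for z in y.split()})
)

print(df['unique_words'][0])
```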
But if the order of the values matters, use:
You can use itertools.chain + toolz.unique. Its advantage over set is that it preserves order:
import pandas as pd
from itertools import chain
from toolz import unique

df = pd.DataFrame({'strings': [['defence possessed russia china', 'factors driving china modernise'],
                               ['force bolster pentagon', 'strike capabilities pentagon congress detailing china'],
                               ['missiles warheads', 'deterrent face continued advances']]})
# chain.from_iterable flattens the split phrases; toolz.unique drops repeats, keeping the first occurrence
df['words'] = df['strings'].apply(lambda x: list(unique(chain.from_iterable(i.split() for i in x))))
print(df.iloc[0]['words'])
['defence', 'possessed', 'russia', 'china', 'factors', 'driving', 'modernise']
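If adding a toolz dependency is undesirable, a standard-library sketch with the same first-seen ordering can use dict.fromkeys, relying on dicts preserving insertion order (guaranteed since Python 3.7). This is a swapped-in alternative, not part of the answer above.

```python
import pandas as pd
from itertools import chain

df = pd.DataFrame({'strings': [
    ['defence possessed russia china', 'factors driving china modernise'],
    ['force bolster pentagon', 'strike capabilities pentagon congress detailing china'],
    ['missiles warheads', 'deterrent face continued advances'],
]})

# dict keys are unique and keep insertion order, so dict.fromkeys
# removes repeated words while preserving the order they first appear in
df['words'] = df['strings'].apply(
    lambda x: list(dict.fromkeys(chain.from_iterable(s.split() for s in x)))
)

print(df.iloc[0]['words'])
```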
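Why the downvote? I'm new to Python. Sorry if this is a silly question to ask here.
The downvote isn't because the question is silly (it isn't), but because we have to guess your data structure, which makes the question ambiguous. Another reason is that you need to add your code to the question, what you have tried…
@James - just add set, like list(set([z for y in x for z in y.split()]))
Need your help: https://stackoverflow.com/questions/51574485/match-keywords-in-pandas-column-with-another-list-of-elements. I didn't mention the solution.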