Python: how do I display all keywords together with their categories?
This is my dataset in the file bank1.txt:
Keyword:Category
ccn:fintech
credit:fintech
smart:fintech
This is my dataset in the file bank2.txt:
Keyword:Category
mcm:mcm
switching:switching
pul-sim:pulsa
transfer:transfer
debit sms:money transfer
What I want to get:
Keyword     Category_all
mcm         mcm
switching   switching
pul-sim     pulsa
transfer    transfer
debit sms   money transfer
ccn         fintech
credit      fintech
smart       fintech
What I did:
import pandas as pd

with open('entity_dict.txt') as f:  # bank.txt
    content = f.readlines()
content = [x.strip() for x in content]

def ambil(inp):
    # Collect every keyword that occurs in the description.
    try:
        out = []
        for x in content:
            if x in inp:
                out.append(x)
        if len(out) == 0:
            return 'other'
        else:
            output = ' '.join(out)
            return output
    except:
        return 'other'

frame_institution['Keyword'] = frame_institution['description'].apply(ambil)
fintech = pd.read_csv('bank.txt', sep=":")
frame_Keyword = pd.merge(frame_institution, fintech, on='Keyword')
For bank2.txt, the code is:
with open('entity_dict2.txt') as f:
    content2 = f.readlines()
content2 = [x.strip() for x in content2]

def ambil2(inp):
    # Same lookup as ambil, but against the second keyword list.
    try:
        out = []
        for x in content2:
            if x in inp:
                out.append(x)
        if len(out) == 0:
            return 'other'
        else:
            output = ' '.join(out)
            return output
    except:
        return 'other'

frame_institution['Keyword2'] = frame_institution['description'].apply(ambil2)
fintech2 = pd.read_csv('bank2.txt', sep=":")
frame_Keyword2 = pd.merge(frame_institution, fintech, on='Keyword')
frame_Keyword2 = pd.merge(frame_Keyword2, fintech2, on='Keyword2')
Then I filter on some keywords:
frame_Keyword2[frame_Keyword2['category_all'] == 'pulsa']
The actual result is:
Keyword     Category_all
mcm         mcm
switching   switching
ccn         fintech
credit      fintech
smart       fintech
But 'pulsa', 'transfer' and 'money transfer' do not appear in the category column. I think there must be a better way to solve this.
Just try a merge.
DataFrame 1:
>>> df1
  Keyword Category
0     ccn  fintech
1  credit  fintech
2   smart  fintech
DataFrame 2:
>>> df2
     Keyword        Category
0        mcm             mcm
1  switching       switching
2    pul-sim           pulsa
3   transfer        transfer
4  debit sms  money transfer
Result, using an outer merge:
>>> pd.merge(df1, df2, how='outer')
     Keyword        Category
0        ccn         fintech
1     credit         fintech
2      smart         fintech
3        mcm             mcm
4  switching       switching
5    pul-sim           pulsa
6   transfer        transfer
7  debit sms  money transfer
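The outer merge can be tried without the text files. The following is only a minimal sketch: it assumes the two tables are rebuilt by hand from the data in the question, and the names `merged` and `pulsa` are illustrative rather than part of the original answer. Row order after an outer merge may differ between pandas versions, so the sketch does not rely on it.

```python
import pandas as pd

# Tables rebuilt by hand from the question's bank1.txt and bank2.txt.
df1 = pd.DataFrame({
    "Keyword": ["ccn", "credit", "smart"],
    "Category": ["fintech", "fintech", "fintech"],
})
df2 = pd.DataFrame({
    "Keyword": ["mcm", "switching", "pul-sim", "transfer", "debit sms"],
    "Category": ["mcm", "switching", "pulsa", "transfer", "money transfer"],
})

# With no `on=` argument, merge joins on the columns both frames share
# (Keyword and Category); how="outer" keeps the rows of both frames.
merged = pd.merge(df1, df2, how="outer")

# The category filter from the question now finds a row.
pulsa = merged[merged["Category"] == "pulsa"]
print(pulsa)
```

Because both frames use the same column names, nothing gets a `_x`/`_y` suffix and the result stays a two-column table.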
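In case anyone lands here with a similar question, here is another solution, added for posterity.
Using the DataFrame.append() method (deprecated in pandas 1.4 and removed in pandas 2.0, so this only works on older versions):
df1.append(df2, ignore_index=True)
Or with pd.concat(): build a list of frames, then concatenate them:
frames = [df1, df2]
pd.concat(frames, ignore_index=True)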
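Once the two keyword tables are stacked into one, the description lookup from the question only needs a single pass instead of one function per file. This is a rough sketch under stated assumptions: the `frame_institution` rows below are made-up sample descriptions (the real data is not shown in the question), and `categories_for` is a hypothetical helper, not the asker's `ambil`.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Keyword": ["ccn", "credit", "smart"],
    "Category": ["fintech", "fintech", "fintech"],
})
df2 = pd.DataFrame({
    "Keyword": ["mcm", "switching", "pul-sim", "transfer", "debit sms"],
    "Category": ["mcm", "switching", "pulsa", "transfer", "money transfer"],
})

# Stack both tables; ignore_index=True renumbers the rows 0..7.
keywords = pd.concat([df1, df2], ignore_index=True)

# Keyword -> category mapping built from the combined table.
lookup = dict(zip(keywords["Keyword"], keywords["Category"]))

def categories_for(description):
    """Return the categories of all keywords found in the text, or 'other'."""
    if not isinstance(description, str):
        return "other"
    hits = [cat for kw, cat in lookup.items() if kw in description]
    # dict.fromkeys drops repeated categories while keeping their order.
    unique_hits = list(dict.fromkeys(hits))
    return ", ".join(unique_hits) if unique_hits else "other"

# Hypothetical sample rows; the real frame_institution is not in the question.
frame_institution = pd.DataFrame(
    {"description": ["pul-sim top up", "credit via ccn", "cash"]}
)
frame_institution["Category_all"] = frame_institution["description"].apply(categories_for)
print(frame_institution)
```

Like the original code, this matches keywords as plain substrings, so a short keyword can also match inside a longer word.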
Try reading your dataset with df1 = pd.read_csv('bank1.txt', sep=':'), and the other dataset in the same way. Then use the linked duplicate. I have corrected your indentation; please verify it.
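As a quick illustration of the read_csv suggestion, the sketch below parses the bank2.txt contents from an in-memory string. Using `io.StringIO` in place of the real file is an assumption made so the example runs on its own; with the actual file you would pass the path 'bank2.txt' instead.

```python
import io

import pandas as pd

# Contents of bank2.txt as posted in the question.
bank2_text = """Keyword:Category
mcm:mcm
switching:switching
pul-sim:pulsa
transfer:transfer
debit sms:money transfer
"""

# sep=":" splits each line at the colon; the first line becomes the header.
df2 = pd.read_csv(io.StringIO(bank2_text), sep=":")
print(df2)
```

Since the separator is a colon, values containing spaces such as "debit sms" and "money transfer" stay in one cell each.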