Python: how do I display all keywords together with their categories?


This is my dataset in the bank1.txt file:

Keyword:Category
ccn:fintech
credit:fintech
smart:fintech
This is my dataset in the bank2.txt file:

Keyword:Category
mcm:mcm
switching:switching
pul-sim:pulsa
transfer:transfer
debit sms:money transfer
What I want to get:

 Keyword     Category_all
 mcm           mcm
 switching     switching
 pul-sim       pulsa
 transfer      transfer
 debit sms     money transfer
 ccn           fintech
 credit        fintech
 smart         fintech
What I have done:

with open('entity_dict.txt') as f:  # bank.txt
    content = f.readlines() 
    content = [x.strip() for x in content ]

def ambil(inp):
    try:
        out = []
        for x in content:      
            if x in inp:
                out.append(x)

        if len(out) == 0:
            return 'other'
        else:
            output = ' '.join(out)
            return output

    except:
        return 'other'

frame_institution['Keyword'] = frame_institution['description'].apply(ambil)
fintech = pd.read_csv('bank.txt', sep=":")
frame_Keyword = pd.merge(frame_institution, fintech, on='Keyword')
For bank2.txt, the code is:

with open('entity_dict2.txt') as f: 
    content2 = f.readlines()
    content2 = [x.strip() for x in content2 ]

def ambil2(inp):
    try:
        out = []
        for x in content2:      
            if x in inp:
                out.append(x)

        if len(out) == 0:
            return 'other'
        else:
            output = ' '.join(out)
            return output
    except:
        return 'other'

frame_institution['Keyword2'] =   frame_institution['description'].apply(ambil2) 
fintech2 = pd.read_csv('bank2.txt', sep=":")
frame_Keyword2 = pd.merge(frame_institution, fintech, on='Keyword')
frame_Keyword2 = pd.merge(frame_Keyword2, fintech2, on='Keyword2')
Then I filter on some keywords:

frame_Keyword2[frame_Keyword2['category_all'] == 'pulsa'] 
The actual result is:

Keyword     Category_all
 mcm           mcm
 switching     switching
 ccn           fintech
 credit        fintech
 smart         fintech
However, 'pulsa', 'transfer', and 'money transfer' do not appear in the category column. I think there must be a better way to solve this.


Just try a merge:

The two DataFrames:

>>> df1
  Keyword Category
0     ccn  fintech
1  credit  fintech
2   smart  fintech
>>> df2
     Keyword        Category
0        mcm             mcm
1  switching       switching
2    pul-sim           pulsa
3   transfer        transfer
4  debit sms  money transfer
Result, using an outer merge:

>>> pd.merge(df1, df2, how='outer')
     Keyword        Category
0        ccn         fintech
1     credit         fintech
2      smart         fintech
3        mcm             mcm
4  switching       switching
5    pul-sim           pulsa
6   transfer        transfer
7  debit sms  money transfer
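
To make the outer merge above easier to check, here is a minimal sketch that rebuilds the two frames from the sample data in the question and adds `indicator=True`, a standard `pd.merge` option that records which input each row came from. The variable names `merged` and `only_bank2` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question's bank1.txt and bank2.txt.
df1 = pd.DataFrame({
    "Keyword": ["ccn", "credit", "smart"],
    "Category": ["fintech", "fintech", "fintech"],
})
df2 = pd.DataFrame({
    "Keyword": ["mcm", "switching", "pul-sim", "transfer", "debit sms"],
    "Category": ["mcm", "switching", "pulsa", "transfer", "money transfer"],
})

# With no `on=` argument, merge joins on every column the frames share
# (here both Keyword and Category). indicator=True adds a `_merge` column
# with the values left_only, right_only, or both.
merged = pd.merge(df1, df2, how="outer", indicator=True)

# Rows that exist only in the second file (illustrative name).
only_bank2 = merged.loc[merged["_merge"] == "right_only", ["Keyword", "Category"]]

print(merged)
```

Because the two files share no keyword, every row should come out as `left_only` or `right_only`. The row order of an outer merge may differ between pandas versions, so it is safer not to rely on it.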
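In case anyone lands here with a similar question, here is another solution, added for posterity.

Use the DataFrame.append() method (note that it was deprecated in pandas 1.4 and removed in pandas 2.0, so this only works on older versions):

df1.append(df2, ignore_index=True)
Or use pd.concat(): create a list of frames and then concatenate them:

frames = [df1,df2]
pd.concat(frames, ignore_index=True)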
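
As a rough illustration of the concat approach applied to the question's goal, the sketch below stacks the two frames, renames the column to `Category_all` as in the desired output, and then runs the 'pulsa' filter from the question. The frame contents come from the question; the names `combined` and `pulsa_rows`, and the choice to put `df2` first so the order matches the desired output, are assumptions made for this example.

```python
import pandas as pd

# Sample data copied from the question's bank1.txt and bank2.txt.
df1 = pd.DataFrame({
    "Keyword": ["ccn", "credit", "smart"],
    "Category": ["fintech", "fintech", "fintech"],
})
df2 = pd.DataFrame({
    "Keyword": ["mcm", "switching", "pul-sim", "transfer", "debit sms"],
    "Category": ["mcm", "switching", "pulsa", "transfer", "money transfer"],
})

# Stack the rows of both frames; ignore_index=True renumbers them 0..7.
# The rename matches the column name used in the question's desired output.
combined = (
    pd.concat([df2, df1], ignore_index=True)
      .rename(columns={"Category": "Category_all"})
)

# The filter from the question now finds the 'pulsa' row.
pulsa_rows = combined[combined["Category_all"] == "pulsa"]

print(combined)
print(pulsa_rows)
```

Unlike a merge, `pd.concat` simply appends rows and never drops any, which is why categories such as 'pulsa' stay in the result.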

Try reading your dataset with df1 = pd.read_csv('bank1.txt', sep=':'), and do the same for the other dataset. Then use the linked duplicate. I have corrected your indentation; please verify.
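
The reading step suggested in the comment above could look roughly like this. It is only a sketch: it writes the question's sample data to temporary files so that it runs on its own, and `load_keyword_tables` is a hypothetical helper name, not something from pandas or from the original thread.

```python
import os
import tempfile

import pandas as pd

# File contents copied from the question.
BANK1 = "Keyword:Category\nccn:fintech\ncredit:fintech\nsmart:fintech\n"
BANK2 = (
    "Keyword:Category\nmcm:mcm\nswitching:switching\npul-sim:pulsa\n"
    "transfer:transfer\ndebit sms:money transfer\n"
)


def load_keyword_tables(paths):
    """Hypothetical helper: read each colon-separated file and stack the rows."""
    frames = [pd.read_csv(path, sep=":") for path in paths]
    return pd.concat(frames, ignore_index=True)


# Write the sample data to temporary files so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in [("bank1.txt", BANK1), ("bank2.txt", BANK2)]:
        path = os.path.join(tmp, name)
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        paths.append(path)
    keywords = load_keyword_tables(paths)

print(keywords)
```

With real files, the temporary-directory part goes away and the call becomes `load_keyword_tables(['bank1.txt', 'bank2.txt'])`.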