Python: how to do keyword mapping in pandas

I have the keywords

India
Japan
United States
Germany
China
Below is a sample DataFrame:

id    Address 
1     Chome-2-8 Shibakoen, Minato, Tokyo 105-0011, Japan
2     Arcisstraße 21, 80333 München, Germany
3     Liberty Street, Manhattan, New York, United States
4     30 Shuangqing Rd, Haidian Qu, Beijing Shi, China
5     Vaishnavi Summit,80feet Road,3rd Block,Bangalore, Karnataka, India
My goal is to produce:

id    Address                                                          India Japan United States  Germany China    
1     Chome-2-8 Shibakoen, Minato, Tokyo 105-0011, Japan              0     1     0              0       0                  
2     Arcisstraße 21, 80333 München, Germany                          0     0     0              1       0
3     Liberty Street, Manhattan, New York, USA                        0     0     1              0       0
4     30 Shuangqing Rd, Haidian Qu, Beijing Shi, China                0     0     0              0       1
5     Vaishnavi Summit,80feet Road,Bangalore, Karnataka, India        1     0     0              0       0
The basic idea is to build a keyword detector. I thought of using str.contains or word2vec, but I cannot work out the logic.
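
For reference, here is a minimal reconstruction of the sample DataFrame from the question (the values are copied from the table above; building it this way is my own assumption), so the snippets below can be run directly:

import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Address': [
        'Chome-2-8 Shibakoen, Minato, Tokyo 105-0011, Japan',
        'Arcisstraße 21, 80333 München, Germany',
        'Liberty Street, Manhattan, New York, United States',
        '30 Shuangqing Rd, Haidian Qu, Beijing Shi, China',
        'Vaishnavi Summit,80feet Road,3rd Block,Bangalore, Karnataka, India',
    ],
})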

In [58]: df = df.join(df.Address.str.extract(r'.*,(.*)', expand=False).str.get_dummies())

In [59]: df
Out[59]:
   id                                            Address   China   Germany   India   Japan   United States
0   1  Chome-2-8 Shibakoen, Minato, Tokyo 105-0011, J...       0         0       0       1               0
1   2             Arcisstraße 21, 80333 München, Germany       0         1       0       0               0
2   3  Liberty Street, Manhattan, New York, United St...       0         0       0       0               1
3   4   30 Shuangqing Rd, Haidian Qu, Beijing Shi, China       1         0       0       0               0
4   5  Vaishnavi Summit,80feet Road,3rd Block,Bangalo...       0         0       1       0               0

Note: the str.extract above keeps only the text after the last comma, so it relies on the country sitting in the last position of the Address column. If the country is not in the last position, or if the country name itself contains a comma, use pd.get_dummies() on the matched keywords instead; a sketch of that idea follows.
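
A hedged sketch of that idea, assuming each address mentions at most one of the keywords (true for the sample data) and using a keyword list of my own naming: pull out whichever keyword occurs anywhere in the address, then expand it with pd.get_dummies:

import pandas as pd

countries = ['India', 'Japan', 'United States', 'Germany', 'China']
pattern = '|'.join(countries)

found = df.Address.str.extract(f'({pattern})', expand=False)   # first keyword found anywhere in the address
df = df.join(pd.get_dummies(found).reindex(columns=countries, fill_value=0))  # one 0/1 column per keyword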

Alternatively, the most straightforward approach is to put the countries in a list and use a for loop, for example:

countries = ['India','Japan','United States','Germany','China']
for c in countries:
    df[c] = df.Address.str.contains(c) * 1

But if you have a lot of data and many countries, this can be slow.
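
One caveat worth noting (my addition, not part of the original answer): str.contains interprets its pattern as a regular expression by default, so keywords containing regex metacharacters should be escaped. A hedged variant of the same loop:

import re

countries = ['India', 'Japan', 'United States', 'Germany', 'China']
for c in countries:
    # re.escape makes the keyword a literal pattern; astype(int) turns True/False into 1/0
    df[c] = df.Address.str.contains(re.escape(c)).astype(int)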

Comments under the answer below:
"I'm on my phone and answering from memory. Can you confirm my answer works? It is the ufunc version of str.find; I can use broadcasting across the addresses and the keywords. If the keyword is found it returns the position, otherwise -1, hence the >= 0."
"Thank you. I'll fix it in a few hours, once I'm back at my computer."
"@piRSquared, sorry, I was blind. I hadn't noticed the from numpy.core.defchararray import find line; now it works as expected :)"
"Not sure if you saw this. I'm particularly proud of it (-:"
"This has a typo and doesn't run."
import pandas as pd
from numpy.core.defchararray import find

kw = 'India|Japan|United States|Germany|China'.split('|')  # list of the five keywords
a = df.Address.values.astype(str)[:, None]                 # (n, 1) array of addresses

df.join(
    pd.DataFrame(
        find(a, kw) >= 0,   # broadcasts to an (n, 5) boolean matrix: is the keyword in the address?
        df.index, kw,       # reuse df's index; one column per keyword
        dtype=int           # cast True/False to 1/0
    )
)

   id                        Address  India  Japan  United States  Germany  China
0   1  Chome-2-8 Shibakoen, Minat...      0      1              0        0      0
1   2  Arcisstraße 21, 80333 Münc...      0      0              0        1      0
2   3  Liberty Street, Manhattan,...      0      0              1        0      0
3   4  30 Shuangqing Rd, Haidian ...      0      0              0        0      1
4   5  Vaishnavi Summit,80feet Ro...      1      0              0        0      0
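
For what it's worth, the same approach can be written through the public np.char namespace (np.char.find is an alias for numpy.core.defchararray.find). This sketch just restates the broadcasting idea from the comments above: an (n, 1) column of addresses matched against the 5 keywords yields an (n, 5) array of positions, with -1 meaning the keyword was not found:

import numpy as np
import pandas as pd

kw = ['India', 'Japan', 'United States', 'Germany', 'China']
a = df.Address.values.astype(str)[:, None]          # shape (n, 1)

positions = np.char.find(a, kw)                     # shape (n, 5); -1 where the keyword is absent
df = df.join(pd.DataFrame((positions >= 0).astype(int), index=df.index, columns=kw))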