Python: match a list against a column and extract the matched value from the column

I'm having trouble matching a list against a DataFrame column and extracting the specific matched value from that column.

Dataset:

    address
0   58 Chatham Street, Chatham, New Jersey, 07928
1   3420 W. MacArthur Blvd. Ste. C, Santa Ana, California
2   2016 Chalk Rd, Wake Forest, North Carolina, 27587
I have a list of state names:

state = ['New York','New Jersey','California',...]
Desired result:

    address                                                   State
0   58 Chatham Street, Chatham, New Jersey, 07928             New Jersey
1   3420 W. MacArthur Blvd. Ste. C, Santa Ana, California     California
2   2016 Chalk Rd, Wake Forest, North Carolina, 27587         North Carolina
Code I tried:

for i in state:
    ship_add['state'] = ship_add['address'].str.strip(i)
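For context on why that attempt does not work: Series.str.strip(to_strip) treats its argument as a set of characters to trim from both ends of each string, not as a substring to locate, so it never extracts the state. The loop also reassigns the same column on every pass, so only the last iteration's result would survive. A small self-contained demonstration, with two addresses copied from the question:

```python
import pandas as pd

addresses = pd.Series([
    '58 Chatham Street, Chatham, New Jersey, 07928',
    '3420 W. MacArthur Blvd. Ste. C, Santa Ana, California',
])

# str.strip(chars) trims any of the given *characters* from both ends;
# it does not search for the substring
print(addresses.str.strip('New Jersey').tolist())

# The first address ends in digits, so nothing is trimmed; the second loses
# 'California' itself, because every letter of it is in the character set
print(addresses.str.strip('California').tolist())
```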
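Use:

If the state appears as one of the comma-separated parts, split on ", " and match the parts against the list: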

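import pandas as pd

state = ['New York','New Jersey','California','North Carolina']

# split each address on ", " into separate columns
df1 = df['address'].str.split(', ', expand=True)
# keep only cells that are a state name, forward-fill across columns, take the last column
df['State'] = df1.where(df1.isin(state)).ffill(axis=1).iloc[:, -1]
print (df)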
                                             address           State
0      58 Chatham Street, Chatham, New Jersey, 07928      New Jersey
1  3420 W. MacArthur Blvd. Ste. C, Santa Ana, Cal...      California
2  2016 Chalk Rd, Wake Forest, North Carolina, 27587  North Carolina
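A self-contained version of the split-and-match approach, with the sample data rebuilt from the question so it runs on its own. The fourth address is hypothetical, added only to show that a row with no listed state ends up as NaN. Note that isin compares whole parts, so a state must appear exactly as one comma-separated part to be found:

```python
import pandas as pd

# Sample data reproduced from the question, plus one hypothetical row
df = pd.DataFrame({'address': [
    '58 Chatham Street, Chatham, New Jersey, 07928',
    '3420 W. MacArthur Blvd. Ste. C, Santa Ana, California',
    '2016 Chalk Rd, Wake Forest, North Carolina, 27587',
    '1 Main St, Springfield, 12345',  # hypothetical: no state from the list
]})
state = ['New York', 'New Jersey', 'California', 'North Carolina']

# One column per comma-separated part; shorter addresses are padded with None
parts = df['address'].str.split(', ', expand=True)

# Keep only exact state matches, carry the last match rightwards, read the last column
df['State'] = parts.where(parts.isin(state)).ffill(axis=1).iloc[:, -1]
print(df['State'].tolist())
```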
Try:


This approach will also be faster on larger data.
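The code that accompanied this suggestion is not preserved on the page. As a separate, hedged illustration (not necessarily what the answerer posted), a common vectorized alternative builds a single regex alternation from the list, escaping each name, and lets Series.str.extract return the first state name found in each address (NaN if none). One caveat: the leftmost match wins, so a street or city name that equals a listed state would be picked up before the real state:

```python
import re

import pandas as pd

df = pd.DataFrame({'address': [
    '58 Chatham Street, Chatham, New Jersey, 07928',
    '3420 W. MacArthur Blvd. Ste. C, Santa Ana, California',
    '2016 Chalk Rd, Wake Forest, North Carolina, 27587',
]})
state = ['New York', 'New Jersey', 'California', 'North Carolina']

# Longest names first, so that if one name were a prefix of another,
# the longer alternative is tried first at the same position
names = sorted(state, key=len, reverse=True)
pattern = '(' + '|'.join(re.escape(s) for s in names) + ')'

# expand=False returns a Series holding the single capture group
df['State'] = df['address'].str.extract(pattern, expand=False)
print(df['State'].tolist())
```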

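Hi jezrael, it gave me a value. That was my mistake: there was a problem in my dataset, which is why I was getting NaN. Yes, it works now. Thanks.

You could split the values into new columns on the comma, since the position of the state is not fixed in every row: df['address'].str.split(',', expand=True)

Since not every address ends with a number, what about extracting the state like this?
df['address'].str.extract(r'(\w[^,]*)(?:,\s*\d+)?$', expand=False)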
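A runnable sketch of that extract call on the question's data. The pattern does not consult the state list at all: it captures the last comma-separated part that starts with a word character, optionally followed by ", <digits>" (such as a ZIP code) at the very end of the string. That is why it handles addresses both with and without a trailing number, but it assumes the state is always the last non-numeric part:

```python
import pandas as pd

df = pd.DataFrame({'address': [
    '58 Chatham Street, Chatham, New Jersey, 07928',
    '3420 W. MacArthur Blvd. Ste. C, Santa Ana, California',
    '2016 Chalk Rd, Wake Forest, North Carolina, 27587',
]})

# (\w[^,]*)      : a part with no comma, starting with a word character
# (?:,\s*\d+)?$  : optionally ", digits", then the end of the string
df['State'] = df['address'].str.extract(r'(\w[^,]*)(?:,\s*\d+)?$', expand=False)
print(df['State'].tolist())
```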
state = ['New York','New Jersey','California','North Carolina']

def search_states(row):
    # store the first state name that occurs as a substring of this row's address
    for s in state:
        if s in row['address']:
            row['states'] = s
            break
    return row

df = df.apply(search_states, axis=1)
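The row-wise apply above rebuilds every row as a Series just to add one value. A hedged alternative sketch (not from the thread) runs the same first-match substring search per address with Series.map and next(), producing only the new column; first_state is a hypothetical helper name. As with the apply version, "first" means first in list order, and a plain substring test can also match inside a longer name:

```python
import pandas as pd

df = pd.DataFrame({'address': [
    '58 Chatham Street, Chatham, New Jersey, 07928',
    '3420 W. MacArthur Blvd. Ste. C, Santa Ana, California',
    '2016 Chalk Rd, Wake Forest, North Carolina, 27587',
]})
state = ['New York', 'New Jersey', 'California', 'North Carolina']

def first_state(address):
    # hypothetical helper: first name in `state` found in the address, else None
    return next((s for s in state if s in address), None)

# Series.map calls first_state once per address and returns the new column
df['states'] = df['address'].map(first_state)
print(df['states'].tolist())
```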