Python: get the last word of a location string, except for special cases such as "New York", "North Dakota", "South Carolina", etc.

python pandas string

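I am trying to create a new field from a DataFrame. The source field is "location", which contains city and state information. I used str.split().str[-1] to get the last word of each location, which is usually the full state name.

The problem is that a state like "north carolina" becomes "carolina". I want to account for special cases, such as when .str[-2] is "new", "north", "south", or "west".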

Here is a sample of my code:

df["state"] = df.location.str.split().str[-1]
print(df.state.value_counts().reset_index())

Here is the output:

           index  state
0     california  59855
1           york     17
2       illinois      8
3  massachusetts      5

You can see that "york" should be "new york".

I thought I should write a function for the location field, like this:

def get_location(x):  
   if x.str.split().str[-2] in ["new", "north", "south", "west"]:  
      return x.str.split().str[-2:]  
   else:  
      return x.str.split().str[-1]

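The problem is that when I call get_location(df.location), I get the following error message:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Am I on the right track? How can I make the new df.state field return output like this: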

            index  state
0      california  59855
1        new york     17
2        illinois      8
3   massachusetts      5
4  north carolina      3

Thanks, everyone!

You can use the split method to count the number of words in the string, as follows:

import pandas as pd

# Dummy DataFrame modeled on your data:
your_df = pd.DataFrame({'location': ['New York', 'North Carolina', 'South Illinois', 'Texas', 'Florida'], 'another_field': [1000, 2000, 3000, 4000, 5000]})

# Count the words in each location: with two or more words, keep the full string; otherwise keep the single word.
your_df['state'] = your_df['location'].apply(lambda your_location: your_location if len(your_location.split(" ")) > 1 else your_location.split(" ")[-1])
your_df

Output:

    location       another_field    state
0   New York                1000    New York
1   North Carolina          2000    North Carolina
2   South Illinois          3000    South Illinois
3   Texas                   4000    Texas
4   Florida                 5000    Florida
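For locations that also contain a city name, the rule described in the question (keep the last two words when the second-to-last word is "new", "north", "south", or "west") can be applied to the whole column at once. The original get_location fails because `if` is evaluated on an entire Series rather than on one value. The following is only a sketch: the sample rows and the PREFIXES name are assumptions, since the real df.location values are not shown in the question.

```python
import pandas as pd

# Hypothetical sample rows; the real df.location values are not shown in the question.
df = pd.DataFrame({"location": [
    "los angeles california",
    "buffalo new york",
    "chicago illinois",
    "raleigh north carolina",
    "texas",
]})

# Prefix words taken from the question; the constant name is illustrative.
PREFIXES = ["new", "north", "south", "west"]

words = df["location"].str.split()          # one list of words per row
last = words.str[-1]                        # last word of each row
second_last = words.str[-2]                 # NaN when a row has only one word
last_two = words.str[-2:].str.join(" ")     # last two words joined with a space

# Build a boolean mask instead of using `if` on the whole Series.
needs_prefix = second_last.isin(PREFIXES)

# Series.where keeps last_two where the mask is True and falls back to last elsewhere.
df["state"] = last_two.where(needs_prefix, last)
print(df["state"].tolist())
```

This rule still misses two-word states whose first word is not in the prefix list (for example "rhode island"), so it is a partial fix at best.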

What are the original locations, i.e. the values in df['location']? I feel that in this case you should keep a list of all the states, check whether a state is a substring of the location, and then replace the location with that state. That is much less error-prone than splitting programmatically.

@CodeDifferent Your question got me thinking! The solution is to set the new "state" field like this: df["state"] = df.location.str.split(",").str[-1], followed by print(df.state.unique()). Because the state is the last part of the location and the parts are separated by commas, specifying "," as the separator for split keeps both words of a two-word state together.

Thanks! This got me thinking. I ended up using the function below, which gave me the desired output. The trick was to split on the comma! That way the entire state field is separated from the rest of the location information: df["state"] = df.location.str.split(",").str[-1], followed by print(df.state.unique())

You're welcome, happy coding :)
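The two ideas from the comments can be sketched side by side. The sample rows, the KNOWN_STATES list (deliberately truncated), and the state_from_list column name are assumptions made for illustration. The added str.strip() call is also not part of the original comment; it removes the space that usually follows the comma.

```python
import re

import pandas as pd

# Hypothetical sample; assumes each location looks like "city, state".
df = pd.DataFrame({"location": [
    "Los Angeles, California",
    "Buffalo, New York",
    "Raleigh, North Carolina",
    "Texas",
]})

# 1) Comma split: take the part after the last comma and trim surrounding whitespace.
df["state"] = df["location"].str.split(",").str[-1].str.strip()

# 2) Lookup against a known list of state names (truncated here for illustration).
KNOWN_STATES = ["california", "new york", "north carolina", "texas",
                "virginia", "west virginia"]

# Put longer names first so they are tried before shorter ones at the same position.
ordered = sorted(KNOWN_STATES, key=len, reverse=True)
pattern = r"\b(" + "|".join(re.escape(name) for name in ordered) + r")\b"

# With a single capture group and expand=False, str.extract returns a Series;
# rows with no match get a missing value.
df["state_from_list"] = df["location"].str.lower().str.extract(pattern, expand=False)

print(df[["state", "state_from_list"]])
```

The comma split depends on the data being comma-separated in a consistent way, while the lookup depends on the list being complete, so which one fits better depends on how clean the location column is.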