Combine two columns while eliminating duplicate strings in a DataFrame

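I have a DataFrame with an original column "All", which I split into the columns RegionName1 and RegionName2. There are duplicate entries, for example Duluth and Duluth (University of Minnesota Duluth). I want to convert strings like Duluth (University of Minnesota Duluth) to NaN values, so I tried: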

unitown['RegionName2'] = [np.nan if '(' in x else x for x in unitown['RegionName2']]

I get the error TypeError: argument of type 'float' is not iterable. What else can I try?
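The error comes from the NaN cells: `np.nan` is a Python float, and the `in` operator cannot search inside a float. A minimal sketch reproducing it, assuming a hypothetical RegionName2 column that mixes strings with NaN (the values are illustrative, not the asker's data):

```python
import numpy as np

# Hypothetical RegionName2 values: NaN appears where the split step found no comma
values = ["Duluth", "Duluth (University of Minnesota Duluth)", np.nan]

print(type(np.nan))  # <class 'float'>

# Run the same membership test the list comprehension performs, one value at a time
failures = []
for x in values:
    try:
        "(" in x
    except TypeError as exc:
        failures.append((x, str(exc)))

# Only the NaN entry fails; the two strings are tested without error
print(failures)
```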

You can use:

unitown.loc[unitown.RegionName2.str.contains('(', regex=False, na=False), 'RegionName2'] = np.nan
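A self-contained sketch of this boolean-mask approach, using a small hypothetical stand-in for `unitown`. Two `str.contains` parameters matter here: `regex=False` makes `"("` a literal character (as a regular expression it is an unbalanced group and raises an error), and `na=False` makes NaN rows count as "no match" so the mask is purely boolean.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's `unitown` frame
unitown = pd.DataFrame({
    "RegionName2": ["Duluth", "Duluth (University of Minnesota Duluth)", np.nan],
})

# regex=False: match "(" literally; na=False: NaN cells become False in the mask
mask = unitown["RegionName2"].str.contains("(", regex=False, na=False)

# Overwrite only the rows whose value contains "("
unitown.loc[mask, "RegionName2"] = np.nan
print(unitown)
```

Because the mask never contains NaN, `.loc` can use it directly, and the existing NaN row is left untouched.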

Or add this logic directly to the code that generates RegionName2, as shown:

unitown['RegionName2'] = unitown['All'].apply(
    lambda x: x.split(',')[0].strip() if x.count(',') and '(' not in x.split(',')[0] else np.nan
)
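The same rule can be written as a named function, which some find easier to read than a long conditional lambda. This is a sketch: `first_field_or_nan` is a hypothetical helper name, and the sample lines are invented stand-ins for the "All" column, not the real file contents.

```python
import numpy as np
import pandas as pd

def first_field_or_nan(line):
    # Hypothetical helper with the same rule as the lambda: keep the text before
    # the first comma, unless the line has no comma or that text contains "("
    head = line.split(",")[0]
    if "," in line and "(" not in head:
        return head.strip()
    return np.nan

# Hypothetical lines standing in for the 'All' column
unitown = pd.DataFrame({"All": [
    "Northfield, Minnesota",
    "Duluth (University of Minnesota Duluth), Minnesota",
    "Minnesota[edit]",
]})
unitown["RegionName2"] = unitown["All"].apply(first_field_or_nan)
print(unitown)
```

Filtering at creation time means the "(" check only ever runs on real strings, so the float TypeError cannot occur.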

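Thanks, foglerit! This is exactly what I was looking for.

My pleasure, @MariaBruevich. Could you click the "accept" button so others can easily see that this answer solved your problem? Thanks!

I don't see an "accept" button. I clicked "This answer is useful" next to your answer. By the way, I found that I had to convert the NaN values to strings to make my list comprehension work.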
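A sketch of the string-conversion fix mentioned in the comment, next to an alternative type check; the column values are hypothetical. `str(np.nan)` is `'nan'`, which contains no `"("`, so NaN simply passes through. The `isinstance` variant is an alternative, not something from the thread: it tests only genuine strings and leaves every other value as-is.

```python
import numpy as np

# Hypothetical column with a NaN from the split step
col = ["Duluth", "Duluth (University of Minnesota Duluth)", np.nan]

# Convert each value to str before testing; NaN becomes 'nan' and is kept unchanged
via_str = [np.nan if "(" in str(x) else x for x in col]

# Alternative guard: only strings are tested, anything else is passed through
via_isinstance = [np.nan if isinstance(x, str) and "(" in x else x for x in col]

print(via_str)
print(via_isinstance)
```

Both give the same result here; the `isinstance` form avoids relying on how non-string values happen to render as text.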

import numpy as np
import pandas as pd

# Wrapper function name is hypothetical; the original snippet was a function body
def get_minnesota_towns():
    unitown = pd.read_table('university_towns.txt', header=None).rename(columns={0: 'All'})
    # State header lines contain "[edit]"; forward-fill each state down to its towns
    unitown['State'] = unitown['All'].apply(lambda x: x.split('[edi')[0].strip() if x.count('[edi') else np.nan).ffill()
    unitown['RegionName1'] = unitown['All'].apply(lambda x: x.split('(')[0].strip() if x.count('(') else np.nan)
    unitown['RegionName2'] = unitown['All'].apply(lambda x: x.split(',')[0].strip() if x.count(',') else np.nan)
    # str(x) turns NaN into 'nan', so the "(" test no longer fails on float cells
    unitown['RegionName2'] = [np.nan if '(' in str(x) else x for x in unitown['RegionName2']]
    return unitown[unitown.State == 'Minnesota']
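Since `university_towns.txt` is not included here, the following sketch runs the State forward-fill step on a small hypothetical in-memory excerpt. It only assumes the one-entry-per-line shape with "[edit]" on state header lines that the code above relies on; the lines themselves are invented.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical excerpt in the same one-entry-per-line shape; not the real file
raw = "Minnesota[edit]\nNorthfield, Minnesota\nMontana[edit]\nBozeman, Montana\n"

unitown = pd.read_table(io.StringIO(raw), header=None).rename(columns={0: "All"})

# Header lines keep their state name; other lines get NaN, which ffill then fills
# with the most recent state above them
unitown["State"] = unitown["All"].apply(
    lambda x: x.split("[edi")[0].strip() if "[edi" in x else np.nan
).ffill()

minnesota = unitown[unitown["State"] == "Minnesota"]
print(minnesota)
```

Note that the header line itself ("Minnesota[edit]") also ends up in the filtered result, since its State is Minnesota too.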

import numpy as np
import pandas as pd

#input data
d = {'RegionName1': ["a", "b", "c", "d"], 'RegionName2': ['Duluth and Duluth (University of Minnesota Duluth', "Monkato(Minnesota", 'Other1', 'Other2']}
df = pd.DataFrame(data=d)
print("Input dataframe:")
print(df)

#searching for '(' in RegionName2 column and replacing with NaN
# str(k) makes NaN cells safe to test; i is the row's index label
for i, row in df.iterrows():
    k = row['RegionName2']
    if '(' in str(k):
        df.loc[i, 'RegionName2'] = np.nan
print("Output dataframe:")
print(df)
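The row-by-row loop above can also be expressed as a single vectorized step with `Series.mask`, which replaces values with NaN wherever the condition is True. A sketch on the same input data:

```python
import numpy as np
import pandas as pd

d = {'RegionName1': ["a", "b", "c", "d"],
     'RegionName2': ['Duluth and Duluth (University of Minnesota Duluth', "Monkato(Minnesota", 'Other1', 'Other2']}
df = pd.DataFrame(data=d)

# Literal "(" search; NaN cells (if any) count as no match
has_paren = df["RegionName2"].str.contains("(", regex=False, na=False)

# mask() puts NaN wherever has_paren is True and keeps the other values
df["RegionName2"] = df["RegionName2"].mask(has_paren)
print(df)
```

For large frames this avoids the per-row overhead of `iterrows`.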