Python: remove the last four digits from a string (converting ZIP+4 to a 5-digit zip code)

python pandas

The following piece of code

import numpy as np
import pandas as pd

data = np.array([['','state','zip_code','collection_status'],
                ['42394','CA','92637-2854', 'NaN'],
                ['58955','IL','60654', 'NaN'],
                ['108365','MI','48021-1319', 'NaN'],
                ['109116','MI','48228', 'NaN'],
                ['110833','IL','60008-4227', 'NaN']])

# The first row holds the column names, the first column holds the index labels
df = pd.DataFrame(data=data[1:,1:],
                  index=data[1:,0],
                  columns=data[0,1:])
print(df)

...gives the following DataFrame:

         state            zip_code    collection_status
42394       CA          92637-2854                  NaN
58955       IL               60654                  NaN
108365      MI          48021-1319                  NaN
109116      MI               48228                  NaN
110833      IL          60008-4227                  NaN

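The goal is to homogenize the zip_code column into a 5-digit format: whenever a value has 9 digits instead of 5, I want to drop the last four digits from the zip code. By the way, zip_code has object dtype.

Any ideas?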

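Comment: I wonder whether df.zip_code.str[:5] would be enough on its own, without needing the check in where?

Indexing with str alone is enough, thanks:

df['collection_status'] = df['zip_code'].str[:5]
print (df)
       state    zip_code collection_status
42394     CA  92637-2854             92637
58955     IL       60654             60654
108365    MI  48021-1319             48021
109116    MI       48228             48228
110833    IL  60008-4227             60008

If you need to add a condition, use Series.where or numpy.where:
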
df['collection_status'] = df['zip_code'].where(df['zip_code'].str.len() == 5, 
                                               df['zip_code'].str[:5])
print (df)
       state    zip_code collection_status
42394     CA  92637-2854             92637
58955     IL       60654             60654
108365    MI  48021-1319             48021
109116    MI       48228             48228
110833    IL  60008-4227             60008

df['collection_status'] = np.where(df['zip_code'].str.len() == 5, 
                                   df['zip_code'],
                                   df['zip_code'].str[:5])
print (df)
       state    zip_code collection_status
42394     CA  92637-2854             92637
58955     IL       60654             60654
108365    MI  48021-1319             48021
109116    MI       48228             48228
110833    IL  60008-4227             60008
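
As a quick sanity check, the three approaches can be compared side by side. This is only a minimal sketch: the Series `zips` is a hypothetical stand-in that copies the zip_code values from the question, and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['zip_code'] from the question (object dtype)
zips = pd.Series(
    ["92637-2854", "60654", "48021-1319", "48228", "60008-4227"],
    index=["42394", "58955", "108365", "109116", "110833"],
    dtype=object,
)

# 1) Plain slicing through the .str accessor
sliced = zips.str[:5]

# 2) Series.where: keep values that already have length 5, slice the rest
via_where = zips.where(zips.str.len() == 5, zips.str[:5])

# 3) numpy.where returns a plain array, so wrap it back into a Series
via_np_where = pd.Series(
    np.where(zips.str.len() == 5, zips, zips.str[:5]),
    index=zips.index,
)

# All three should hold the same 5-character strings
print(sliced.tolist() == via_where.tolist() == via_np_where.tolist())
```

For strings like these, slicing a value that is already 5 characters long returns it unchanged, which is why the length check does not alter the result. Note that slicing only takes the first five characters; it does not verify that they are digits.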