Python: NaN values when passing a Series to Series.map()
I may be going about this the wrong way. I'm trying to find the postcodes of about 100 UK hospitals. I have an Excel spreadsheet (all_all) listing all UK hospitals, clinics, etc. (14,000 in total) with their addresses and postcodes. I also have a dataframe of surgical activity (spine) across these 100 hospitals, in which the hospital names are repeated over 2817 rows.
spine.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2818 entries, 0 to 2817
Data columns (total 7 columns):
index_col 2818 non-null float64
fyear 2818 non-null int64
NNAPID 2818 non-null int64
mainspef 2818 non-null int64
Trust 2818 non-null object
complexcount 2818 non-null float64
simplecount 2818 non-null float64
dtypes: float64(3), int64(3), object(1)
memory usage: 154.2+ KB
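It can't find any matches. The hospital field contains values such as LEEDS TEACHING HOSPITALS NHS TRUST, and the postcodes file has the same entry.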
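Is there a way to investigate why it fails? I'm a doctor trying to learn Python and pandas for data analysis, so I'm still at the early stages.
I'm not sure whether I've simply defined the wrong data type somewhere or whether I'm trying to match two fundamentally different columns, and I'd like to be able to inspect where the code fails.
Sorry this post is vague and brief; I wrote it while rushing off to clinic.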
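Update
Following Joe's comment below, I've simplified things. In the spine CSV I defined the column as 'Trust', and in the postcodes CSV I defined the index column as 'Trust'.
I'm now definitely comparing the hospital names in spine with the index field in postcodes, but I now get a KeyError on 'Trust'.
My code is here: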
import pandas as pd
spine = pd.read_csv('~/Dropbox/Work/NNAP/Spine/Kate_W/kate_spine2.csv', usecols = ['Trust'])
spine.head()
Trust
0 THE WALTON CENTRE NHS FOUNDATION TRUST
1 CAMBRIDGE UNIVERSITY HOSPITALS NHS FOUNDATION ...
2 KING'S COLLEGE HOSPITAL NHS FOUNDATION TRUST
3 LEEDS TEACHING HOSPITALS NHS TRUST
4 NT424
postcodes_all = pd.read_csv('all_all.csv', index_col = 'Trust')
postcodes_all.head()
Unnamed: 0 postcode
Trust
MANCHESTER UNIVERSITY NHS FOUNDATION TRUST 0 M13 9WL
SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION TRUST 1 SR4 7TP
WORCESTERSHIRE HEALTH AND CARE NHS TRUST 2 WR5 1JR
SOLENT NHS TRUST 3 SO19 8BR
SHROPSHIRE COMMUNITY HEALTH NHS TRUST 4 SY3 8XL
To make sure I was working with a Series rather than a DataFrame, I added ['Trust'] to the call, as follows:
map1 = spine['Trust'].map(postcodes_all['Trust'])
KeyError Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2896 try:
-> 2897 return self._engine.get_loc(key)
2898 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Trust'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-68-921448f7c401> in <module>
----> 1 map1 = spine['Trust'].map(postcodes_all['Trust'])
~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
2993 if self.columns.nlevels > 1:
2994 return self._getitem_multilevel(key)
-> 2995 indexer = self.columns.get_loc(key)
2996 if is_integer(indexer):
2997 indexer = [indexer]
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2897 return self._engine.get_loc(key)
2898 except KeyError:
-> 2899 return self._engine.get_loc(self._maybe_cast_indexer(key))
2900 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2901 if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Trust'
I'm not sure why this is incorrect, or what the KeyError means.
The reason you get all NaN values is that none of the values in spine['Trust'] can be found in the index of postcodes_all['Trust_title'].
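To check that none of the spine['Trust'] values appear in the mapper's index, you can count the hits with Series.isin. The sketch below uses small hypothetical stand-ins for spine and postcodes_all. The trust names and postcodes are copied from the question's head() output, but which rows exist in the real files is an assumption. Selecting a single column keeps the DataFrame's default integer index, so no trust name can serve as a key.

```python
import pandas as pd

# Hypothetical stand-ins; names and postcodes are copied from the question's output
spine = pd.DataFrame({"Trust": [
    "SOLENT NHS TRUST",
    "MANCHESTER UNIVERSITY NHS FOUNDATION TRUST",
]})
postcodes_all = pd.DataFrame({
    "Trust_title": ["MANCHESTER UNIVERSITY NHS FOUNDATION TRUST", "SOLENT NHS TRUST"],
    "postcode": ["M13 9WL", "SO19 8BR"],
})

# Selecting one column keeps the DataFrame's default integer index
mapper = postcodes_all["Trust_title"]
print(mapper.index.tolist())  # [0, 1]

# How many trust names can be used as keys into the mapper?
matches = spine["Trust"].isin(mapper.index)
print(matches.sum())  # 0, so every lookup misses

# ...which is why map() returns NaN for every row
print(spine["Trust"].map(mapper).isna().all())  # True
```

Checking against the column's values instead (spine["Trust"].isin(mapper)) would find both names. They are present in the data, just not in the index that map() uses as keys.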
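map() is used to substitute each old value with a new value.
It needs key-value pairs to know which new value to use when replacing each old value.
For a Series, it uses the index as the keys and the Series values as the new values.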
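The same index-as-key rule explains the KeyError in the update. Because postcodes_all was read with index_col='Trust', 'Trust' is now the index rather than a column, and postcodes_all['Trust'] only searches the columns. Below is a minimal sketch, assuming all_all.csv has an unnamed first column, a Trust column and a postcode column, as the question's head() output suggests. The CSV text is a hypothetical stand-in built from those rows. It maps through the postcode column, whose index already holds the trust names.

```python
import io
import pandas as pd

# Hypothetical stand-in for all_all.csv, built from the question's head() output;
# the empty first header becomes the "Unnamed: 0" column seen there
csv_text = """,Trust,postcode
0,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,M13 9WL
1,SOLENT NHS TRUST,SO19 8BR
"""
postcodes_all = pd.read_csv(io.StringIO(csv_text), index_col="Trust")

print("Trust" in postcodes_all.columns)  # False: it is now the index
print(postcodes_all.index.name)          # Trust

# Column selection only searches the columns, hence the KeyError
try:
    postcodes_all["Trust"]
except KeyError as err:
    print("KeyError:", err)

# Select the value column instead; its index already holds the trust names
spine = pd.DataFrame({"Trust": ["SOLENT NHS TRUST", "LEEDS TEACHING HOSPITALS NHS TRUST"]})
mapped = spine["Trust"].map(postcodes_all["postcode"])
print(mapped.tolist())  # ['SO19 8BR', nan]
```

The Leeds row comes back as NaN only because it is absent from this two-row stand-in, not because of any problem with the call.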
As a tip for debugging this kind of situation, try a simple example, e.g. the one in the pandas documentation you linked.
See the example below:
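import pandas as pd

s = pd.Series(['cat', 'dog', 'rabbit'])
s
s2 = pd.Series(['carnivore', 'omnivore', 'herbivore'])
s2
s.map(s2)

This returns NaN for every element, because pandas cannot find any match between the values in s and the index of s2 (which is 0, 1, 2).
Setting the index of s2 to the values of s fixes this:

# Use the values in 's' as the index of 's2'
s2.index = s
s2
s.map(s2)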
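Applied to the question's original call, the same fix means giving the mapper an index of trust names. The sketch below assumes postcodes_all has a Trust_title column and a postcode column, as the question's first attempt implies. The rows are hypothetical stand-ins taken from the question's output. set_index moves Trust_title into the index, and selecting postcode then yields the name-to-postcode Series that map() expects.

```python
import pandas as pd

# Hypothetical stand-ins; column names follow the question's original map() call
spine = pd.DataFrame({"Trust": [
    "SOLENT NHS TRUST",
    "MANCHESTER UNIVERSITY NHS FOUNDATION TRUST",
    "SOLENT NHS TRUST",  # repeated names in spine are fine for map()
]})
postcodes_all = pd.DataFrame({
    "Trust_title": ["MANCHESTER UNIVERSITY NHS FOUNDATION TRUST", "SOLENT NHS TRUST"],
    "postcode": ["M13 9WL", "SO19 8BR"],
})

# Same idea as s2.index = s: trust names become the keys, postcodes the values
lookup = postcodes_all.set_index("Trust_title")["postcode"]

spine["postcode"] = spine["Trust"].map(lookup)
print(spine["postcode"].tolist())  # ['SO19 8BR', 'M13 9WL', 'SO19 8BR']
```

This follows the original plan of building a list of postcodes and adding it to spine. If Trust_title contains duplicate names, map() is likely to reject the non-unique index, so it is worth de-duplicating the lookup first.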
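Is it possible you're trying to join the two tables together, like a VLOOKUP in Excel or a JOIN in SQL? In that case I'd use something like
spine.merge(postcodes_all, left_on="Trust", right_on="Trust_title", how="left").
Also, I'd make sure there are no duplicate rows in the postcodes table, as they could mess up the result you want.
This is great and works perfectly. Yes, a VLOOKUP is what I was after. I'd still like to know why my mapping failed (see above), or rather how to debug it. I had hoped to get a list of postcodes and then add it to the spine file, but your solution does it in one go.
The reason you get all NaN values is that none of the values in spine['Trust'] are found in the index of postcodes_all['Trust_title']. map() is used to replace old values with new ones. It needs key-value pairs to know which new value to use for each old value; for a Series, the index supplies the keys and the values supply the new values. My debugging tip is to try a simpler example, such as the one in the pandas documentation you linked. I can expand on this with an example if…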
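The merge suggested in the comments can also report which spine rows found no postcode, which helps with the "how do I debug it" question. Below is a sketch with hypothetical stand-in data. The Trust and Trust_title names follow the suggested call, and the duplicated Solent row is invented to show de-duplication. drop_duplicates addresses the duplicate-row concern raised above. validate="many_to_one" makes pandas raise if the lookup keys are still not unique. indicator=True adds a _merge column that reads 'left_only' for unmatched rows.

```python
import pandas as pd

# Hypothetical stand-ins; the duplicated Solent row is invented for illustration
spine = pd.DataFrame({"Trust": [
    "SOLENT NHS TRUST",
    "LEEDS TEACHING HOSPITALS NHS TRUST",
]})
postcodes_all = pd.DataFrame({
    "Trust_title": [
        "SOLENT NHS TRUST",
        "SOLENT NHS TRUST",
        "MANCHESTER UNIVERSITY NHS FOUNDATION TRUST",
    ],
    "postcode": ["SO19 8BR", "SO19 8BR", "M13 9WL"],
})

# Remove exact duplicate rows so each trust appears once on the right-hand side
lookup = postcodes_all.drop_duplicates()

merged = spine.merge(
    lookup,
    left_on="Trust",
    right_on="Trust_title",
    how="left",
    validate="many_to_one",  # raise MergeError if Trust_title is still duplicated
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Spine rows whose trust name found no match in the lookup table
unmatched = merged.loc[merged["_merge"] == "left_only", "Trust"]
print(unmatched.tolist())  # ['LEEDS TEACHING HOSPITALS NHS TRUST']
```

Listing the left_only names is often the quickest way to spot spelling or formatting differences between the two files.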
The original call from the question (before the update), which returned NaN for every row:
spine['Trust'].map(postcodes_all['Trust_title'])
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
...
2813 NaN
2814 NaN
2815 NaN
2816 NaN
2817 NaN
Name: Trust, Length: 2818, dtype: object
Output of s:
0 cat
1 dog
2 rabbit
dtype: object
Output of s2:
0 carnivore
1 omnivore
2 herbivore
dtype: object
Output of s.map(s2) before re-indexing s2:
0 NaN
1 NaN
2 NaN
dtype: object
Output of s2 after s2.index = s:
cat carnivore
dog omnivore
rabbit herbivore
dtype: object
Output of s.map(s2) after re-indexing:
0 carnivore
1 omnivore
2 herbivore
dtype: object