Python: NaN values when passing a Series to Series.map()

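I may be going about this the wrong way. I am looking for the postcodes of about 100 UK hospitals. I have an Excel spreadsheet (all_all) listing all UK hospitals, clinics, etc. (14,000 in total) with their addresses and postcodes.

I have a DataFrame of surgical activity (spine) covering these 100 hospitals, with the hospital names repeated across its 2818 rows.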


spine.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2818 entries, 0 to 2817
Data columns (total 7 columns):
index_col       2818 non-null float64
fyear           2818 non-null int64
NNAPID          2818 non-null int64
mainspef        2818 non-null int64
Trust           2818 non-null object
complexcount    2818 non-null float64
simplecount     2818 non-null float64
dtypes: float64(3), int64(3), object(1)
memory usage: 154.2+ KB

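The map() call finds no matches. The hospital field is, for example, LEEDS TEACHING HOSPITALS NHS TRUST, and the postcode file has an identical entry.

Is there a way to investigate why it fails? I am a doctor trying to learn Python and pandas for data analysis, so there are a lot of early steps.

I'm not sure whether it is really failing, whether I have defined the wrong data type somewhere, or whether I am trying to match two fundamentally different columns, and I would like to be able to check the failing code.

Sorry this post is vague and brief; I wrote it while rushing off to clinic.

Update

Following Joe's comment below, I have simplified things. In the spine CSV I named the column 'Trust', and in the postcode CSV I set the index column to 'Trust'.

I am now definitely comparing the hospital names in spine with the index field in postcodes. I now get a KeyError on 'Trust'.

My code is here: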

import pandas as pd

spine = pd.read_csv('~/Dropbox/Work/NNAP/Spine/Kate_W/kate_spine2.csv', usecols = ['Trust'])



spine.head()

Trust
0   THE WALTON CENTRE NHS FOUNDATION TRUST
1   CAMBRIDGE UNIVERSITY HOSPITALS NHS FOUNDATION ...
2   KING'S COLLEGE HOSPITAL NHS FOUNDATION TRUST
3   LEEDS TEACHING HOSPITALS NHS TRUST
4   NT424

postcodes_all = pd.read_csv('all_all.csv', index_col = 'Trust')


postcodes_all.head()

    Unnamed: 0  postcode
Trust       
MANCHESTER UNIVERSITY NHS FOUNDATION TRUST  0   M13 9WL
SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION TRUST  1   SR4 7TP
WORCESTERSHIRE HEALTH AND CARE NHS TRUST    2   WR5 1JR
SOLENT NHS TRUST    3   SO19 8BR
SHROPSHIRE COMMUNITY HEALTH NHS TRUST   4   SY3 8XL

To make sure I was using a Series rather than a DataFrame, I added ['Trust'] to the code, as follows:


map1 = spine['Trust'].map(postcodes_all['Trust'])

KeyError                                  Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2896             try:
-> 2897                 return self._engine.get_loc(key)
   2898             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Trust'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-68-921448f7c401> in <module>
----> 1 map1 = spine['Trust'].map(postcodes_all['Trust'])

~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2993             if self.columns.nlevels > 1:
   2994                 return self._getitem_multilevel(key)
-> 2995             indexer = self.columns.get_loc(key)
   2996             if is_integer(indexer):
   2997                 indexer = [indexer]

~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897                 return self._engine.get_loc(key)
   2898             except KeyError:
-> 2899                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2900         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2901         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Trust'



I'm not sure why this is incorrect, or what the KeyError means.

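The reason you get all NaN values is that none of the values in spine['Trust'] can be found in the index of postcodes_all['Trust_title'].

map() replaces old values with new ones. It needs key-value pairs to know which new value to substitute for each old value. When you pass it a Series, it uses the index of that Series as the keys and the values of that Series as the new values.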
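Applied to the updated code in the question, the same rule would also explain the KeyError: 'Trust'. The following is a minimal sketch, assuming the lookup file is read with index_col='Trust' as shown there. The two CSV rows are copied from the postcodes_all.head() output, and the small spine frame is a hypothetical stand-in for the real data. Because 'Trust' has become the index, it is no longer a column, so postcodes_all['Trust'] cannot be looked up, while the postcode column is already a Series keyed by trust name.

```python
import io

import pandas as pd

# Hypothetical stand-in for all_all.csv, using two rows shown in the question.
csv_text = """Trust,postcode
MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,M13 9WL
SOLENT NHS TRUST,SO19 8BR
"""
postcodes_all = pd.read_csv(io.StringIO(csv_text), index_col="Trust")

# 'Trust' is now the index, so it no longer appears among the columns.
print(list(postcodes_all.columns))  # ['postcode']
print(postcodes_all.index.name)     # Trust

# postcodes_all['Trust'] would raise KeyError: 'Trust' at this point.
# Selecting the value column gives a Series whose index holds the trust names.
lookup = postcodes_all["postcode"]

# Hypothetical two-row version of spine: one name that matches, one that does not.
spine = pd.DataFrame({"Trust": ["SOLENT NHS TRUST", "NT424"]})
spine["postcode"] = spine["Trust"].map(lookup)
print(spine)
```

Names with no exact match in the lookup index, such as NT424 here, come back as NaN rather than raising an error. If trust names repeat in the lookup file, map() may refuse the non-unique index, so dropping duplicates first is probably the safer route.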

As a tip for debugging a situation like this, try a simpler example, e.g. one from the pandas documentation you linked. See the example below.

import pandas as pd
s = pd.Series(['cat', 'dog', 'rabbit'])
s

s2 = pd.Series(['carnivore', 'omnivore', 'herbivore'])
s2

s.map(s2)
This returns NaN because pandas cannot find any match between the values in s and the index of s2. Setting the index of s2 to the values of s fixes this:


# Set the values in 's' as the index of 's2'
s2.index=s
s2

s.map(s2)
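The same idea can be turned into a quick check on real data: count how many names have an exact match in the lookup index before calling map(). The sketch below uses hypothetical miniature tables (the names and postcodes are taken from the question's output, the lower-case variant is invented for illustration) and assumes that stray whitespace or letter case is the likely culprit, which may not hold for every file.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables from the question.
spine = pd.DataFrame(
    {"Trust": ["SOLENT NHS TRUST", "solent nhs trust ", "NT424"]}
)
lookup = pd.Series(
    ["SO19 8BR", "M13 9WL"],
    index=["SOLENT NHS TRUST", "MANCHESTER UNIVERSITY NHS FOUNDATION TRUST"],
)

# Which names have an exact match in the lookup index?
found = spine["Trust"].isin(lookup.index)
print(found.sum(), "of", len(spine), "rows match exactly")
print(spine.loc[~found, "Trust"].unique())  # the names worth inspecting

# Normalise case and surrounding whitespace on both sides, then retry.
clean_names = spine["Trust"].str.strip().str.upper()
clean_lookup = lookup.copy()
clean_lookup.index = clean_lookup.index.str.strip().str.upper()

mapped = clean_names.map(clean_lookup)
print(mapped.isna().sum(), "rows still unmatched after cleaning")
```

Anything still unmatched after cleaning (NT424 in this sketch) is genuinely absent from the lookup, for example a code where a name was expected.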

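Is it possible that you're trying to join two tables together, like a VLOOKUP in Excel or a JOIN in SQL? In that case I would use something like spine.merge(postcodes_all, left_on="Trust", right_on="Trust_title", how="left"). I'd also make sure there are no duplicate rows in the postcodes, as that could mess up the result you want.

That's great and works really well. Yes, VLOOKUP is what I wanted. I'd still like to know why my mapping failed (see above), or rather how to debug it. I was hoping to get a list of postcodes and then add it to the spine file, but your solution does it in one go.

The reason you get all NaN values is that none of the values in spine['Trust'] are found in the index of postcodes_all['Trust_title']. map() is used to replace old values with new ones. It needs key-value pairs to know which new value to use when replacing each old value; for a Series it uses the index as the keys and the single column as the values. My debugging tip is to try a simpler example, such as the one in the pandas documentation you linked. I can expand with an example, if ...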
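The merge suggested in the comments can be sketched as follows. The column names Trust, Trust_title, postcode and simplecount come from the question and comments, while the rows themselves are a hypothetical miniature data set. The drop_duplicates call and the indicator=True flag are additions for illustration: the first follows the advice about duplicate rows, and the second marks which rows found a partner.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables.
spine = pd.DataFrame({
    "Trust": ["SOLENT NHS TRUST", "SOLENT NHS TRUST", "NT424"],
    "simplecount": [1.0, 2.0, 3.0],
})
postcodes_all = pd.DataFrame({
    "Trust_title": [
        "SOLENT NHS TRUST",
        "SOLENT NHS TRUST",  # deliberate duplicate row
        "MANCHESTER UNIVERSITY NHS FOUNDATION TRUST",
    ],
    "postcode": ["SO19 8BR", "SO19 8BR", "M13 9WL"],
})

# Keep one row per trust so the left join cannot multiply rows in spine.
lookup = postcodes_all.drop_duplicates(subset="Trust_title")

merged = spine.merge(
    lookup,
    left_on="Trust",
    right_on="Trust_title",
    how="left",
    indicator=True,  # adds a '_merge' column: 'both' or 'left_only'
)

print(len(merged) == len(spine))  # row count is preserved
print(merged[["Trust", "postcode", "_merge"]])
```

Rows marked left_only are the ones that would have come back as NaN from map(), so filtering on _merge gives the list of names to investigate.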
Original attempt from the question (before the update), which returned only NaN:

spine['Trust'].map(postcodes_all['Trust_title'])

0       NaN
1       NaN
2       NaN
3       NaN
4       NaN
       ... 
2813    NaN
2814    NaN
2815    NaN
2816    NaN
2817    NaN
Name: Trust, Length: 2818, dtype: object

## Output: s
0       cat
1       dog
2    rabbit
dtype: object
## Output: s2
0    carnivore
1     omnivore
2    herbivore
dtype: object
## Output: s.map(s2) with the default index on s2
0    NaN
1    NaN
2    NaN
dtype: object
## Output: s2 after s2.index = s
cat       carnivore
dog        omnivore
rabbit    herbivore
dtype: object
## Output: s.map(s2) after s2.index = s
0    carnivore
1     omnivore
2    herbivore
dtype: object