Missing values when using df.loc after changing the index in Python
Tags: python, pandas

After changing the index to "PassengerId", I try to use df.loc to retrieve rows by the new index, but the result contains missing values.

I am exploring the Titanic dataset. I appended some values as a new row (new_row), changed the index to PassengerId, and then tried to search with df.loc. In the result, the values of the existing rows disappear, while the values of the newly appended row are shown.

I get the following result:
NaN for all the rows except for the new_row (892)
FutureWarning: Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.
See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
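For context on the warning above, here is a minimal sketch of the .reindex() alternative it mentions. The small frame is made up for illustration and is not the real Titanic data. In pandas 1.0 and later, .loc with a list that contains a missing label raises KeyError instead of only warning, so the lookup is wrapped in try/except.

```python
import pandas as pd

# Hypothetical toy frame, not the real Titanic data
df = pd.DataFrame(
    {"Survived": [1, 0]},
    index=pd.Index([890, 891], name="PassengerId"),
)

# .reindex() returns a row of NaN for any label that is not in the index
result = df.reindex([890, 891, 892])
print(result)

# .loc with a missing label: KeyError on pandas >= 1.0,
# FutureWarning plus NaN rows on older versions
try:
    df.loc[[890, 891, 892]]
    loc_raised = False
except KeyError:
    loc_raised = True
print(loc_raised)
```

If rows come back as all NaN from such a lookup, it usually means the requested labels were not found in the index as written.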
Expected result:
             PassengerId  Survived  Pclass                   Name   Sex  Age  SibSp  Parch  Ticket   Fare Cabin Embarked
PassengerId
890                  890         1       1  Behr, Mr. Karl Howell  male   26    0.0    0.0  111369  30.00  C148        C
891                  891         0       3    Dooley, Mr. Patrick  male   32    0.0    0.0  370376   7.75   NaN        Q
892                  892         0       1                     NA    NA   NA    NaN    NaN     NaN    NaN   NaN      NaN
Partial answer:
I ran a test with the following code
import numpy as np
import pandas as pd

columns = ["PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
           "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked"]
# Dummy data: only PassengerId and Survived are filled in
dataset = pd.DataFrame(columns=columns,
                       data=[[891, 1] + [np.nan] * 10,
                             [892, 2] + [np.nan] * 10])
print(dataset)

# Add a row. Note that the values are strings ('890'), not ints.
new_row = pd.Series(data=['890', '0', '1', 'NA', 'NA', 'NA'],
                    index=['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age'])
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
dataset = pd.concat([dataset, new_row.to_frame().T], ignore_index=True)

# Set PassengerId as the index (the column itself is kept as well)
dataset = dataset.set_index(dataset['PassengerId'])
print(dataset)

# On pandas >= 1.0 this lookup raises KeyError, because the appended
# label is the string '890', not the int 890
dataset.loc[[892, 891, 890]]
and got the following result:
   PassengerId  Survived  Pclass  Name  Sex  Age  SibSp  Parch  Ticket  Fare  \
0          891         1     NaN   NaN  NaN  NaN    NaN    NaN     NaN   NaN
1          892         2     NaN   NaN  NaN  NaN    NaN    NaN     NaN   NaN

   Cabin  Embarked
0    NaN       NaN
1    NaN       NaN
             PassengerId  Survived  Pclass  Name  Sex  Age  SibSp  Parch  Ticket  \
PassengerId
891                  891         1     NaN   NaN  NaN  NaN    NaN    NaN     NaN
892                  892         2     NaN   NaN  NaN  NaN    NaN    NaN     NaN
890                  890         0       1    NA   NA   NA    NaN    NaN     NaN

             Fare  Cabin  Embarked
PassengerId
891           NaN    NaN       NaN
892           NaN    NaN       NaN
890           NaN    NaN       NaN
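Comments:

But all the values in rows 891 and 892 have disappeared. I was hoping for a result that keeps the existing values from my dataset.

That is just dummy data; I only inserted values in the "Survived" column. Still trying to reproduce your problem, though…

Thanks @Kryesec, I just found the mistake I made, and fixing it solved the problem. When adding the new row, I wrote the values for the int-type columns such as "PassengerId" and "Age" as '892' instead of just 892. This changed the type of most columns from int to object. Removing the quotes solved the problem.

Np! Data types can ruin anyone's day.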
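The dtype mix-up described in the comments can be sketched as follows. The frame and its values are made up for illustration, and pd.to_numeric is just one possible way to get integer labels back; the asker's own fix was to remove the quotes from the new row.

```python
import pandas as pd

# Hypothetical toy data: the last PassengerId was entered as a string by mistake
df = pd.DataFrame({"PassengerId": [890, 891, "892"], "Survived": [1, 0, 0]})
df = df.set_index(df["PassengerId"])

print(df.index.dtype)     # object, because ints and a str are mixed
print(892 in df.index)    # False: the stored label is the string "892"
print("892" in df.index)  # True

# One possible repair: convert the labels (and the column) back to integers
df.index = pd.to_numeric(df.index)
df["PassengerId"] = pd.to_numeric(df["PassengerId"])
print(df.loc[[890, 891, 892], "Survived"].tolist())
```

Checking df.dtypes and df.index.dtype right after appending rows is a quick way to spot this kind of silent change from int to object.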