Python: how to force a MultiIndex level to a specific dtype

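I am using combine_first to combine two DataFrames on two keys. The goal is to append the index entries of df2 that are not contained in df1 to the result, and to overwrite the entries contained in both DataFrames with the values from df2.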

Example df1:

import numpy as np
import pandas as pd

# The id column deliberately mixes strings and integers.
df1 = pd.DataFrame({
    "key1": ["A", "A", "A", "B", "B", "C", "C"],
    "id": ["a1", "a2", "a3", 1, 2, "c1", "c2"],
    "data1": [np.random.randint(5) for i in range(7)],
    "data2": [np.random.randint(1000) for i in range(7)]
})

Example df2:

df2 = pd.DataFrame({
    "key1": ["B", "B", "B"],
    "id": [2, 3, 4],
    "data1": [np.random.randint(5) for i in range(3)],
    "data2": [np.random.randint(1000) for i in range(3)]
})
df1.set_index(["key1", "id"]).combine_first(df2.set_index(["key1", "id"]))
gives the desired result:

         data1  data2
key1 id              
A    a1    0.0  588.0
     a2    2.0  709.0
     a3    3.0  877.0
B    1     3.0  468.0
     2     0.0  612.0
     3     2.0  139.0
     4     3.0  154.0
C    c1    4.0  855.0
     c2    4.0  564.0
However, after storing the result as CSV, loading it again, and running the same command, I get the following error:

TypeError: '<' not supported between instances of 'str' and 'int'
The loaded DataFrame, df_loaded, looks fine:

         data1  data2
key1 id              
A    a1    0.0  588.0
     a2    2.0  709.0
     a3    3.0  877.0
B    1     3.0  468.0
     2     0.0  612.0
     3     2.0  139.0
     4     3.0  154.0
C    c1    4.0  855.0
     c2    4.0  564.0
But
df_loaded.combine_first(df2.set_index(["key1", "id"]))
results in:

         data1  data2
key1 id              
A    a1    0.0  588.0
     a2    2.0  709.0
     a3    3.0  877.0
B    1     3.0  468.0
     2     0.0  612.0
     3     2.0  139.0
     4     3.0  154.0
C    c1    4.0  855.0
     c2    4.0  564.0
B    2     2.0  317.0
     3     2.0  139.0
     4     3.0  154.0
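
The duplicated B rows suggest that the id values no longer compare equal after the CSV round trip. The following is a minimal sketch of how that can be checked, assuming an in-memory buffer in place of the CSV file and made-up data values; the variable names are illustrative and not from the original post.

```python
import io
import pandas as pd

# A small frame whose id level mixes str and int values, as in df1.
df = pd.DataFrame({
    "key1": ["A", "B", "B"],
    "id": ["a1", 1, 2],
    "data1": [0, 3, 0],
}).set_index(["key1", "id"])

# Python types of the id level before writing to CSV.
types_before = [type(v).__name__ for v in df.index.get_level_values("id")]

# Write to an in-memory CSV and read it back (stands in for a file on disk).
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
df_loaded = pd.read_csv(buffer, index_col=["key1", "id"])

# Python types of the id level after the round trip.
types_after = [type(v).__name__ for v in df_loaded.index.get_level_values("id")]

print(types_before)
print(types_after)
```

Because the id column in the CSV contains non-numeric entries such as a1, read_csv parses the whole column as text, so the loaded index holds the string "2" while df2 still holds the integer 2. The two are treated as different keys when the indexes are combined.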

Before combining, you should convert the id column to str rather than object.

This should help:

df2.id = df2.id.astype(str)
df_loaded.combine_first(df2.set_index(["key1", "id"]))
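
For reference, here is a self-contained sketch of the same fix from start to finish. It assumes small fixed data instead of the random values in the question and an in-memory buffer instead of a CSV file on disk, so those details are illustrative only.

```python
import io
import pandas as pd

df1 = pd.DataFrame({
    "key1": ["A", "B", "B"],
    "id": ["a1", 1, 2],
    "data1": [10, 20, 30],
})
df2 = pd.DataFrame({
    "key1": ["B", "B"],
    "id": [2, 3],
    "data1": [99, 40],
})

# Round-trip df1 through CSV; the id level comes back as strings.
buffer = io.StringIO()
df1.set_index(["key1", "id"]).to_csv(buffer)
buffer.seek(0)
df_loaded = pd.read_csv(buffer, index_col=["key1", "id"])

# Convert df2's ids to strings so that they match the loaded index.
df2["id"] = df2["id"].astype(str)
result = df_loaded.combine_first(df2.set_index(["key1", "id"]))
print(result)
```

With matching string ids, the key ("B", "2") appears only once in the result and ("B", "3") is appended as a new row. Note that combine_first keeps the value from df_loaded wherever it is not null, so the row for ("B", "2") retains 30 rather than 99.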

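As a workaround, I first combine all old and new observations for a given key1 and then append them to get the final result. However, I would rather not include that extra step and instead simply set the dtype of level 1 of df2 to object.

Are you sure that changing the dtype of df2.id to object before setting the index and combining does not solve the problem? I just tried it and it worked. What I tried: df2.id = df2.id.astype(str); df_loaded.combine_first(df2.set_index(["key1", "id"]))

Did you save the df as CSV first and then reload it? I ran df2.id = df2.id.astype("object") followed by df_loaded.combine_first(df2.set_index(["key1", "id"])), but I just tried your code (on one line, with the semicolon) and got the same garbled index result.

Sorry, I did not see that you used str instead of object. I just tried it and it does work. Do you know why it only works with str?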
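
A likely explanation, offered as a sketch and not as a statement from the original thread: astype("object") only changes the container dtype and leaves each element an integer, whereas astype(str) converts each element to text. Only the latter can match the string ids that come back from the CSV. The variable names below are illustrative.

```python
import pandas as pd

ids = pd.Series([2, 3, 4])

# object dtype: the elements are still Python integers.
as_object = ids.astype("object")
# str conversion: every element becomes a string such as "2".
as_str = ids.astype(str)

object_types = [type(v).__name__ for v in as_object]
str_types = [type(v).__name__ for v in as_str]
print(object_types)
print(str_types)

# Only the string version compares equal to an id read back from CSV.
print(as_object.iloc[0] == "2")
print(as_str.iloc[0] == "2")
```

In other words, an object-dtype level can still contain the integer 2, which does not equal the string "2" stored in the reloaded index.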