Python pandas: swapping rows between columns

python pandas dataframe


Some rows were entered into the wrong columns, so now I need to swap them.

import pandas as pd

df = pd.DataFrame({
    'c': {0: '22:58:00', 1: '23:03:00', 2: '0', 3: '10'},
    'a': {0: '0', 1: '10', 2: '22:58:00', 3: '23:03:00'},
    'd': {0: '23:27:00', 1: '23:39:00', 2: '10', 3: '17'},
    'b': {0: '10', 1: '17', 2: '23:27:00', 3: '23:39:00'},
})

My current approach:

cpy = df[['a', 'b']]
df.loc[2:, 'a'] = df['c']
df.loc[2:, 'b'] = df['d']
df.loc[2:, 'c'] = cpy['a']
df.loc[2:, 'd'] = cpy['b']

Expected output:

    a   b         c         d
0   0  10  22:58:00  23:27:00
1  10  17  23:03:00  23:39:00
2   0  10  22:58:00  23:27:00
3  10  17  23:03:00  23:39:00

It works, but it is only practical because there are just 4 columns. Is there a better way?

Note: the data types may cause problems when sorting. df.loc[0]['c'] is datetime.time(22, 58).

Maybe something like

df.swap_row_col(index=[2:], columns_from=['a', 'b'], columns_to=['c', 'd'])
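As far as I know, swap_row_col is not a pandas method, so the call above is only a wish. A minimal sketch of such a helper could look like the following. The function name, its parameters, and the use of a label slice for the rows are all assumptions made for illustration, not an existing API.

```python
import pandas as pd


def swap_row_col(df, rows, cols_from, cols_to):
    """Hypothetical helper: swap the values of two column groups on the given rows."""
    out = df.copy()
    # Take plain arrays first so that pandas does not re-align on column labels.
    left = out.loc[rows, cols_from].to_numpy()
    right = out.loc[rows, cols_to].to_numpy()
    out.loc[rows, cols_from] = right
    out.loc[rows, cols_to] = left
    return out


df = pd.DataFrame({
    'c': {0: '22:58:00', 1: '23:03:00', 2: '0', 3: '10'},
    'a': {0: '0', 1: '10', 2: '22:58:00', 3: '23:03:00'},
    'd': {0: '23:27:00', 1: '23:39:00', 2: '10', 3: '17'},
    'b': {0: '10', 1: '17', 2: '23:27:00', 3: '23:39:00'},
})

# Swap a<->c and b<->d from row label 2 onwards.
fixed = swap_row_col(df, slice(2, None), ['a', 'b'], ['c', 'd'])
print(fixed[['a', 'b', 'c', 'd']])
```

This scales to any number of column pairs, but it still requires knowing which rows are wrong; the answers below try to detect that from the values themselves.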

Method 1: using np.sort with the pd.DataFrame constructor worked for me:

df = pd.DataFrame(np.sort(df.astype(str)), columns=df.columns)

    a   b         c         d
0   0  10  22:58:00  23:27:00
1  10  17  23:03:00  23:39:00
2   0  10  22:58:00  23:27:00
3  10  17  23:03:00  23:39:00
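One caveat worth keeping in mind: after astype(str), np.sort compares the values as strings, so the ordering is lexicographic. It happens to work for the sample data, but a small sketch with made-up values (the '9' is an assumption, it does not appear in the question) shows how a single-digit number can end up after the times:

```python
import numpy as np

# One row of string values; '9' is a hypothetical value not in the question's data.
row = np.array([['22:58:00', '9', '23:27:00', '10']], dtype=object)

# np.sort sorts along the last axis, i.e. within each row.
sorted_row = np.sort(row)
print(sorted_row.tolist())
# [['10', '22:58:00', '23:27:00', '9']]
```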

Method 2: more generally, check which values match the time pattern and which do not, then use bfill or ffill to swap the values:

match_pattern = df.apply(lambda x: x.str.match(r'\d{2}:\d{2}:\d{2}'))

numeric = df.where(~match_pattern).bfill(axis=1).dropna(how='any', axis=1)
dates = df.where(match_pattern).ffill(axis=1).dropna(how='any', axis=1)

df = pd.concat([numeric, dates], axis=1)

    a   b         c         d
0   0  10  22:58:00  23:27:00
1  10  17  23:03:00  23:39:00
2   0   0  23:27:00  23:27:00
3  10  10  23:39:00  23:39:00
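In the output above, rows 2 and 3 contain duplicated values (0 0 and 23:27:00 23:27:00), because bfill and ffill copy a neighbouring value instead of moving each value to its own slot. A possible variant of the same pattern-matching idea, sketched here under the assumption that every value can be converted to a string, reorders each row with a stable argsort on the match mask:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': ['0', '10', '22:58:00', '23:03:00'],
    'b': ['10', '17', '23:27:00', '23:39:00'],
    'c': ['22:58:00', '23:03:00', '0', '10'],
    'd': ['23:27:00', '23:39:00', '10', '17'],
})

as_str = df.astype(str)
values = as_str.to_numpy()

# True where the value looks like HH:MM:SS.
is_time = as_str.apply(lambda col: col.str.match(r'\d{2}:\d{2}:\d{2}')).to_numpy(dtype=bool)

# A stable sort keeps the original left-to-right order within each group;
# False (non-time) sorts before True (time).
order = np.argsort(is_time, axis=1, kind='stable')

fixed = pd.DataFrame(
    np.take_along_axis(values, order, axis=1),
    columns=df.columns,
    index=df.index,
)
print(fixed)
```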

Maybe we can try the following. Note that in my solution, if the original order is 100, 0, my output is still 100, 0:

df=pd.DataFrame(df.apply(lambda x : sorted(x,key= lambda s: ':' in s),1).tolist(),columns=df.columns)
Out[119]: 
     c   a         d         b
0    0  10  22:58:00  23:27:00
1   10  17  23:03:00  23:39:00
2  100  10  22:58:00  23:27:00
3   10  17  23:03:00  23:39:00
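The key `':' in s` assumes every value is a string; one of the comments reports that it fails on datetime.time objects. A minimal sketch of the same stable-sort idea with a type check as the key, assuming the real data holds integers and datetime.time values (the sample frame below is made up to match that assumption):

```python
import datetime

import pandas as pd

# Hypothetical data with real datetime.time objects instead of strings.
df = pd.DataFrame({
    'a': [0, 10, datetime.time(22, 58), datetime.time(23, 3)],
    'b': [10, 17, datetime.time(23, 27), datetime.time(23, 39)],
    'c': [datetime.time(22, 58), datetime.time(23, 3), 100, 10],
    'd': [datetime.time(23, 27), datetime.time(23, 39), 0, 17],
})


def is_time(value):
    """Sort key: False for numbers, True for datetime.time values."""
    return isinstance(value, datetime.time)


# sorted() is stable, so the original order inside each group is preserved.
rows = [sorted(row, key=is_time) for row in df.to_numpy()]
fixed = pd.DataFrame(rows, columns=df.columns, index=df.index)
print(fixed)
```

Because the key only separates the two types and never compares an integer with a time, the "unorderable types" error does not arise here.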

To swap and separate datetime.time and string values in the example, you can use applymap, np.argsort, and numpy indexing (note: the numbers in the example are stored as strings, so I check for type str).

If you get AttributeError: 'DataFrame' object has no attribute 'to_numpy', replace to_numpy() with .values
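If the same code has to run on both older and newer pandas versions, one option is to pick whichever accessor is available. This is only a sketch; the tiny frame is a placeholder.

```python
import pandas as pd

df = pd.DataFrame({'a': ['0', '10'], 'c': ['22:58:00', '23:03:00']})

# to_numpy() exists from pandas 0.24 on; fall back to .values on older versions.
arr = df.to_numpy() if hasattr(df, 'to_numpy') else df.values
print(arr.shape)
```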
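Comments:

Sorting problem: TypeError: unorderable types: int()

Ahh, mixed-type columns. That is hard to solve unless you convert them to strings. See the edit to my answer.

Added a new method, can you check it? @ksooklall

TypeError: ufunc 'invert' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''. match_pattern is all NaN.

You have to use df.astype(str) first, otherwise comparing dates and ints is hard, don't you agree?

Your solution has a problem with datetime objects: TypeError: ("argument of type 'datetime.time' is not iterable", 'occurred at index 0')

@ksooklall I created the dataframe with your code, same problem -_-

@YOandBEN_W @Erfan Hmm, it seems the sample does not capture the real problem.

+1, safer and more general than mine. I wish pandas had a sort method where we could specify a key, as in the sorted function.

Ah yes, I had to replace to_numpy with .values

Oh, I guess your pandas version is < 0.24. to_numpy was implemented in pandas 0.24+ :)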
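# For each row, the column positions ordered so that str values come first
# (False sorts before True) and everything else follows.
arr = np.argsort(df.applymap(type).ne(str), 1).to_numpy()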

Out[985]:
array([[0, 1, 2, 3],
       [0, 1, 2, 3],
       [2, 3, 0, 1],
       [2, 3, 0, 1]], dtype=int32)

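# Reorder every row by its own column order; np.arange supplies the row positions
# (df.index[:, None] is no longer supported in recent pandas versions).
df_out = pd.DataFrame(df.to_numpy()[np.arange(len(df))[:, None], arr], columns=df.columns)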

Out[989]:
    a   b         c         d
0   0  10  22:58:00  23:27:00
1  10  17  23:03:00  23:39:00
2   0  10  22:58:00  23:27:00
3  10  17  23:03:00  23:39:00