Python dataframe: move a record to another column if it contains a specific value

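I have the following data:

For example, in row 2, I want to move every "3:xxx" value to column 3 and every "4:xxx" value to column 4. How can I do this?

By the way, I tried this, but it doesn't work: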

df[3] = np.where((df[2].str.contains('3:')))
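Loading the dataset:

import pandas as pd

url = 'https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/iris.scale'
# sep=r'\s+' splits on any run of whitespace (delim_whitespace=True is deprecated in recent pandas)
df = pd.read_csv(url, header=None, sep=r'\s+')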
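
A note on the attempt above, offered as a hedged sketch rather than a full fix: when np.where is called with only a condition, it returns a tuple of index arrays rather than a column of values, which is likely why the assignment fails. The three-argument form chooses between two columns element by element. The two-row frame below is made up for illustration and is not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the problem: row 1 has its "3:" value in column 2.
df = pd.DataFrame({
    2: ["2:0.25", "3:-0.89"],
    3: ["3:-0.86", "4:-0.93"],
})

# na=False treats missing cells as "no match"; regex=False does a plain substring test.
mask = df[2].str.contains("3:", na=False, regex=False)

# One argument: positions where the mask is True, not a column of values.
print(np.where(mask)[0].tolist())  # [1]

# Three arguments: take df[2] where the mask holds, otherwise keep df[3].
moved = np.where(mask, df[2], df[3])
print(moved.tolist())  # ['3:-0.86', '3:-0.89']
```

This only builds a corrected column 3. The "4:" value that was sitting in column 3 still has to be moved on to column 4, which is what the answers address.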

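I think the simplest approach is to clean the dataset before reading it into a DataFrame. Looking at the data source, some rows appear to be missing a field, for example: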

# (Missing the 3's field)
'1 1:-0.611111 2:0.166667508 4:-0.916667'
So I would clean up the file before reading it. For this row, you could add an extra space between 2:0.166667508 and 4:-0.916667 to represent the empty third column:

'1 1:-0.611111 2:0.166667508 4:-0.916667 '.split(' ')
# ['1', '1:-0.611111', '2:0.166667508', '4:-0.916667', '']
'1 1:-0.611111 2:0.166667508  4:-0.916667 '.split(' ')
# ['1', '1:-0.611111', '2:0.166667508', '', '4:-0.916667', '']
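
Instead of editing the file by hand, the same cleanup could be done while parsing. The following is a minimal sketch, assuming every field after the label has the form "index:value". The helper name parse_sparse_line and the first sample line are made up for illustration.

```python
import io
import pandas as pd

def parse_sparse_line(line):
    """Turn 'label idx:value idx:value ...' into a dict keyed by column number."""
    label, *pairs = line.split()
    row = {0: label}
    for pair in pairs:
        idx = pair.partition(":")[0]  # text before the first colon
        row[int(idx)] = pair          # keep the original 'idx:value' text
    return row

# Two sample lines; the second is the row quoted above with no "3:" field.
sample = io.StringIO(
    "1 1:-0.57 2:0.26 3:-0.87 4:-0.92\n"
    "1 1:-0.611111 2:0.166667508 4:-0.916667\n"
)

rows = [parse_sparse_line(line) for line in sample if line.strip()]
# reindex guarantees columns 0..4 exist even if no row supplies one of them.
df = pd.DataFrame(rows).reindex(columns=range(5))
print(df)
```

Because each value is stored under its own index, a missing field simply becomes NaN in that column and nothing needs to be shifted afterwards. For a real file, io.StringIO would be replaced by an open file handle.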

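I agree with Greg's suggestion to clean the dataset before reading it into a DataFrame, but if you would rather shift the values that ended up in the wrong column, you can try the approach below.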

input.csv

1,1:-0.55,2:0.25,3:-0.86,4:-91
1,1:-0.57,2:0.26,3:-0.87,4:-0.92
1,1:-0.57,3:-0.89,4:-0.93,NaN
1,1:-0.58,2:0.25,3:-0.88,4:-0.99
Code that shifts the values at the mismatched positions:

import pandas as pd

# Assumption: the file has no header row (as in the sample above),
# so the columns are labelled '0'..'4' explicitly.
df = pd.read_csv('files/60009536-input.csv', header=None, names=list('01234'))
print(df)

for col_num in df.columns:
    if col_num > '0':  # Assuming there is no problem at index column 0
        for row_val in df[col_num]:
            if pd.notna(row_val):  # skip empty cells
                if col_num != row_val[:1]:  # Comparing column number with sliced value
                    row = df[df[col_num] == row_val].index.values  # on true get row index as we already know column #
                    print("Found at column {0} and row {1}".format(col_num, row))
                    r_value = df.loc[row, str(row_val[:1])].values  # capturing value on target location
                    print("target location value", r_value)
                    if pd.notna(r_value[0]):  # only move the displaced value if there is one
                        df.loc[row, str(r_value[0][:1])] = r_value  # shifting target location's value to its correct loc
                    df.loc[row, str(row_val[:1])] = row_val  # Shift to appropriate column
                    df.loc[row, col_num] = float('nan')  # finally update that cell to NaN

print(df)
Output:

   0        1        2        3        4
0  1  1:-0.55   2:0.25  3:-0.86    4:-91
1  1  1:-0.57   2:0.26  3:-0.87  4:-0.92
2  1  1:-0.57  3:-0.89  4:-0.93      NaN
3  1  1:-0.58   2:0.25  3:-0.88  4:-0.99
Found at column 2 and row [2]
target location value ['4:-0.93']
   0        1       2        3        4
0  1  1:-0.55  2:0.25  3:-0.86    4:-91
1  1  1:-0.57  2:0.26  3:-0.87  4:-0.92
2  1  1:-0.57     NaN  3:-0.89  4:-0.93
3  1  1:-0.58  2:0.25  3:-0.88  4:-0.99

Process finished with exit code 0
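
The cell-by-cell loop above can also be written as a row-wise function. This is a hedged sketch, not the answerer's method: it assumes string column labels '0' to '4' and cells of the form "k:value". The helper name realign and the two-row frame are made up for illustration.

```python
import pandas as pd

# Hypothetical frame in the same shape as the example above.
df = pd.DataFrame(
    [
        [1, "1:-0.55", "2:0.25", "3:-0.86", "4:-91"],
        [1, "1:-0.57", "3:-0.89", "4:-0.93", None],
    ],
    columns=list("01234"),
)

def realign(row):
    """Place each 'k:value' cell under the column labelled k."""
    out = pd.Series(float("nan"), index=row.index, dtype=object)
    out["0"] = row["0"]  # the label column stays where it is
    for cell in row.iloc[1:].dropna():
        out[cell.split(":", 1)[0]] = cell  # prefix before ':' names the target column
    return out

fixed = df.apply(realign, axis=1)
print(fixed)
```

Each row is rebuilt from scratch, so a value never overwrites a neighbour that has not been moved yet. A prefix that matches no existing column would create a new column, so the input is assumed to use only the labels already present.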

How is the DataFrame populated? Is it read from a file? If so, could you include a few lines from that file?

I have edited my question to include further information!