Python dataframe: if a record contains a specific value, move it to another column
I have the following data: for example, in row 2, I want to move all the "3:xxx" values to column 3 and all the "4:xxx" values to column 4. How can I do this? By the way, I tried this, but it doesn't work:
df[3] = np.where((df[2].str.contains('3:')))
Loading the dataset:
url = 'https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/iris.scale'
df = pd.read_csv(url,header=None,delim_whitespace=True)
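I think the simplest approach is to clean the dataset before reading it into a dataframe. Looking at the data source, some rows appear to be missing fields, for example: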
# (Missing the 3's field)
'1 1:-0.611111 2:0.166667508 4:-0.916667'
So I would clean the file before reading it. For this line, you can add an extra space between 2:0.166667508 and 4:-0.916667 to represent the empty third column:
'1 1:-0.611111 2:0.166667508 4:-0.916667 '.split(' ')
# ['1', '1:-0.611111', '2:0.166667508', '4:-0.916667', '']
'1 1:-0.611111 2:0.166667508  4:-0.916667 '.split(' ')
# ['1', '1:-0.611111', '2:0.166667508', '', '4:-0.916667', '']
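The same cleanup could also be done in code instead of editing the file by hand. The following is only a minimal sketch, assuming every feature token has the form index:value and that there are four feature columns; the helper name parse_libsvm_line, the n_features parameter, and the first sample line are illustrative and not part of the original answer.

```python
import pandas as pd


def parse_libsvm_line(line, n_features=4):
    """Split one 'label idx:value ...' line into a fixed-width row (hypothetical helper)."""
    tokens = line.split()
    # Column 0 holds the label; every feature column starts out empty.
    row = {0: tokens[0]}
    for i in range(1, n_features + 1):
        row[i] = None
    for token in tokens[1:]:
        idx, _, _ = token.partition(':')
        row[int(idx)] = token  # keep the original 'idx:value' text in its own column
    return row


# Illustrative lines; the second one is missing its "3:" field.
sample_lines = [
    '1 1:-0.55 2:0.25 3:-0.86 4:-0.91',
    '1 1:-0.611111 2:0.166667508 4:-0.916667',
]
df = pd.DataFrame([parse_libsvm_line(line) for line in sample_lines])
print(df)
```

To apply this to the real file, the lines would be read from the downloaded file instead of sample_lines. Because each token is placed by its own index, a missing field simply leaves that column empty rather than shifting the later values to the left.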
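I agree with Greg's suggestion to clean the dataset before reading it into a dataframe, but if you would rather shift misplaced values into their matching columns after loading, you can try the approach below. The code compares column labels as strings, so it appears to assume the file also starts with a header row 0,1,2,3,4, which is not shown here.
input.csv: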
1,1:-0.55,2:0.25,3:-0.86,4:-91
1,1:-0.57,2:0.26,3:-0.87,4:-0.92
1,1:-0.57,3:-0.89,4:-0.93,NaN
1,1:-0.58,2:0.25,3:-0.88,4:-0.99
Code that shifts values at specific indices:
import pandas as pd

df = pd.read_csv('files/60009536-input.csv')
print(df)

for col_num in df.columns:
    if col_num > '0':  # Assuming there is no problem at index column 0
        for row_val in df[col_num]:
            if pd.notna(row_val):  # missing cells are read as float NaN, not the string 'nan'
                if col_num != row_val[:1]:  # Compare the column label with the value's leading digit
                    row = df[df[col_num] == row_val].index.values  # row index of the misplaced value
                    print("Found at column {0} and row {1}".format(col_num, row))
                    r_value = df.loc[row, str(row_val[:1])].values  # value currently at the target location
                    print("target location value", r_value)
                    df.loc[row, str(r_value[0][:1])] = r_value  # move the displaced value to its own column
                    df.loc[row, str(row_val[:1])] = row_val  # move the misplaced value to its column
                    df.loc[row, col_num] = float('nan')  # finally mark the vacated cell as missing
print(df)
Output:
   0        1        2        3        4
0  1  1:-0.55   2:0.25  3:-0.86    4:-91
1  1  1:-0.57   2:0.26  3:-0.87  4:-0.92
2  1  1:-0.57  3:-0.89  4:-0.93      NaN
3  1  1:-0.58   2:0.25  3:-0.88  4:-0.99
Found at column 2 and row [2]
target location value ['4:-0.93']
   0        1       2        3        4
0  1  1:-0.55  2:0.25  3:-0.86    4:-91
1  1  1:-0.57  2:0.26  3:-0.87  4:-0.92
2  1  1:-0.57     NaN  3:-0.89  4:-0.93
3  1  1:-0.58  2:0.25  3:-0.88  4:-0.99
Process finished with exit code 0
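As an alternative to the cell-by-cell loop, the realignment can be sketched with melt and pivot. This is not the answerer's method, just one possible vectorized variant under stated assumptions: column labels are the strings '0' to '4', every non-missing cell looks like index:value, and no row contains the same index twice (pivot raises a ValueError on duplicates). The small inline frame is illustrative data modeled on input.csv.

```python
import pandas as pd

# Illustrative frame: the second row is missing its "2:" field,
# so its remaining values sit one column too far left.
df = pd.DataFrame(
    [
        ['1', '1:-0.55', '2:0.25', '3:-0.86', '4:-91'],
        ['1', '1:-0.57', '3:-0.89', '4:-0.93', None],
    ],
    columns=['0', '1', '2', '3', '4'],
)

# Long format: one record per (row, cell), with missing cells dropped.
long = (
    df.reset_index()
      .melt(id_vars=['index', '0'], value_name='cell')
      .dropna(subset=['cell'])
)

# The text before ':' names the column the cell belongs in.
long = long.assign(target=long['cell'].str.split(':').str[0])

# Back to wide format, this time keyed by the target column.
fixed = long.pivot(index='index', columns='target', values='cell')
fixed = fixed.reindex(columns=['1', '2', '3', '4'])
fixed.insert(0, '0', df['0'])  # restore the label column, aligned on the row index
print(fixed)
```

Splitting on ':' rather than slicing the first character means feature indices with more than one digit would also be handled, provided they are added to the reindex list.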
How is the dataframe populated? Is it read from a file? If so, could you include a few lines from that file?
I have edited my question to include further information!