Python: delete a column from a DataFrame

python pandas dataframe

When deleting a column in a DataFrame, I use:

del df['column_name']

This works fine. Why can't I use the following?

del df.column_name

Since a column/Series can be accessed as df.column_name, I expected this to work.

It is good practice to always use the [] notation. One reason is that attribute notation (df.column_name) does not work for numbered indexes:

In [1]: df = DataFrame([[1, 2, 3], [4, 5, 6]])

In [2]: df[1]
Out[2]:
0    2
1    5
Name: 1

In [3]: df.1
  File "<ipython-input-3-e4803c0d1066>", line 1
    df.1
       ^
SyntaxError: invalid syntax

As you guessed, the right syntax is

del df['column_name']

It is difficult to make del df.column_name work, simply because of syntactic limitations in Python. Under the covers, Python translates del df[name] into df.__delitem__(name).
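To make the translation concrete, here is a minimal sketch in plain Python. The Table class is a hypothetical toy container, not pandas code; it only records which special method each form of del ends up calling.

```python
class Table:
    """Hypothetical toy container (not pandas) that logs which hook `del` triggers."""

    def __init__(self, **columns):
        self._columns = dict(columns)  # column name -> values
        self.calls = []                # log of special methods that were invoked

    def __delitem__(self, key):
        # Invoked by: del table[key]
        self.calls.append(("__delitem__", key))
        del self._columns[key]

    def __delattr__(self, name):
        # Invoked by: del table.name
        self.calls.append(("__delattr__", name))
        if name in self._columns:
            del self._columns[name]
        else:
            object.__delattr__(self, name)


t = Table(a=[1, 2], b=[3, 4])
del t["a"]  # Python calls t.__delitem__("a")
del t.b     # Python calls t.__delattr__("b")
print(t.calls)  # [('__delitem__', 'a'), ('__delattr__', 'b')]
```

The two statements reach different hooks, so a library has to implement each one separately and decide what deleting an attribute should mean.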

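The best way to do this in pandas is to use drop:

df = df.drop('column_name', axis=1)

where 1 is the axis number (0 for rows and 1 for columns).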

To delete the column without having to reassign df, you can do:

df.drop('column_name', axis=1, inplace=True)

Finally, to drop by column number instead of by column label, try this to delete, e.g., the 1st, 2nd and 4th columns:

df = df.drop(df.columns[[0, 1, 3]], axis=1)  # df.columns is zero-based pd.Index

df.drop(df.columns[[0,1,3]], axis=1, inplace=True)

This also works with the "text" syntax for the columns:

df.drop(['column_nameA', 'column_nameB'], axis=1, inplace=True)

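Note: since the pandas release of October 27, 2017, the drop() method accepts index/columns keywords as an alternative to specifying the axis.

So we can now just do:

df.drop(columns=['B', 'C'])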
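As a quick check that the spellings above are interchangeable, here is a minimal sketch. The frame and its column names A to D are made up for illustration.

```python
import pandas as pd

# Hypothetical four-column frame used only for this illustration.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]})

by_axis = df.drop(["B", "C"], axis=1)        # label list plus axis number
by_keyword = df.drop(columns=["B", "C"])     # columns keyword, no axis needed
by_position = df.drop(columns=df.columns[[0, 1, 3]])  # 1st, 2nd and 4th columns

print(by_axis.equals(by_keyword))   # True
print(list(by_position.columns))    # ['C']
print(list(df.columns))             # ['A', 'B', 'C', 'D'] (df itself is untouched)
```

None of these calls modifies df, because drop returns a new DataFrame unless inplace=True is passed.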

Use:

columns = ['Col1', 'Col2', ...]
df.drop(columns, inplace=True, axis=1)

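This will delete one or more columns in place. Note that inplace=True was added in pandas v0.13 and won't work on older versions. In that case, you have to assign the result back: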

df = df.drop(columns, axis=1)
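The difference between the two forms can be seen in a small sketch. The frame and the names Col1 to Col3 are assumptions made for the example.

```python
import pandas as pd

# Hypothetical frame; only the column names matter here.
df = pd.DataFrame({"Col1": [1, 2], "Col2": [3, 4], "Col3": [5, 6]})
columns = ["Col1", "Col2"]

# Without inplace: drop returns a new frame and leaves df alone.
trimmed = df.drop(columns, axis=1)
print(list(trimmed.columns))  # ['Col3']
print(list(df.columns))       # ['Col1', 'Col2', 'Col3']

# With inplace=True: df itself changes and the call returns None.
result = df.drop(columns, axis=1, inplace=True)
print(result)                 # None
print(list(df.columns))       # ['Col3']
```

A common mistake is writing df = df.drop(..., inplace=True), which rebinds df to None.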

Drop by index: delete the first, second and fourth columns:

df = df.drop(df.columns[[0, 1, 3]], axis=1)  # df.columns is zero-based pd.Index

df.drop(df.columns[[0,1,3]], axis=1, inplace=True)

Delete the first column:

df.drop(df.columns[[0]], axis=1, inplace=True)

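There is an optional parameter, inplace, so that the original data can be modified in place instead of a modified copy being returned.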

Pop

Delete the column column-name:

df.pop('column-name')

Examples:

print(df):

   one  two  three
A    1    2      3
B    4    5      6
C    7    8      9

df.drop(df.columns[[0]], axis=1, inplace=True)

print(df):

   two  three
A    2      3
B    5      6
C    8      9

three = df.pop('three')

print(df):

   two
A    2
B    5
C    8
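The walkthrough above does not show how df was built. The following sketch reproduces it end to end, assuming the frame is constructed directly with pd.DataFrame (that construction is a guess, not taken from the original answer).

```python
import pandas as pd

# Assumed construction: the values and labels match the printed tables.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["A", "B", "C"],
    columns=["one", "two", "three"],
)

# Drop the first column by position, in place.
df.drop(df.columns[[0]], axis=1, inplace=True)
print(list(df.columns))  # ['two', 'three']

# pop removes the column from df and returns it as a Series.
three = df.pop("three")
print(three.tolist())    # [3, 6, 9]
print(list(df.columns))  # ['two']
```

Unlike drop, pop takes a single label and always modifies the frame it is called on.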

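A nice addition is the ability to drop columns only if they exist. This covers more use cases: only the columns that actually exist among the labels passed in are dropped.
Simply add errors='ignore', for example: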

df.drop(['col_name_1', 'col_name_2', ..., 'col_name_N'], inplace=True, axis=1, errors='ignore')

This is new as of pandas 0.16.1.

In pandas 0.16.1+, you can drop columns only if they exist. Before that version, you can achieve the same result with a conditional list comprehension:

df.drop([col for col in ['col_name_1','col_name_2',...,'col_name_N'] if col in df], axis=1, inplace=True)
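A minimal sketch comparing the two approaches, using a made-up frame in which only one of the requested labels exists:

```python
import pandas as pd

# Hypothetical frame: 'col_name_2' is deliberately missing.
df = pd.DataFrame({"col_name_1": [1], "keep": [2]})
wanted = ["col_name_1", "col_name_2"]

# errors='ignore' skips labels that are not present.
ignored = df.drop(wanted, axis=1, errors="ignore")

# Equivalent filter for older versions: keep only labels that are columns of df.
filtered = df.drop([col for col in wanted if col in df], axis=1)

# The default (errors='raise') fails on the missing label.
try:
    df.drop(wanted, axis=1)
    raised = False
except KeyError:
    raised = True

print(list(ignored.columns), list(filtered.columns), raised)
# ['keep'] ['keep'] True
```

The test "col in df" checks membership among the column labels, which is why the comprehension works as a filter.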

The dot syntax works in JavaScript, but not in Python.

Python:
del df['column_name']

JavaScript:
delete df['column_name']
or
delete df.column_name

From version 0.16.1, you can do:

df.drop(['column_name'], axis = 1, inplace = True, errors = 'ignore')

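The actual question, missed by most answers here, is:
Why can't I use del df.column_name?

First we need to understand the problem, which requires us to dive into Python's magic methods.
As pointed out in an earlier answer, del df['column'] maps to the Python magic method df.__delitem__('column'), which pandas implements to drop the column.
However, as a reference on magic methods points out:

In fact, __del__ should almost never be used because of the precarious circumstances under which it is called; use it with caution!

You could argue that del df['column_name'] should not be used or encouraged, and therefore del df.column_name should not even be considered.
However, in theory, del df.column_name could be made to work in pandas via __delattr__. This does introduce certain problems, problems that the del df['column_name'] implementation already has, but to a lesser degree.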
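Example problem
What if I define a column in a DataFrame called "dtypes" or "columns"?
Then assume I want to delete these columns.

del df.dtypes
would leave the __delattr__ method confused about whether it should delete the "dtypes" attribute or the "dtypes" column.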
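The name clash is easy to reproduce. A minimal sketch, using a made-up frame with a column literally named "dtypes":

```python
import pandas as pd

# Hypothetical frame whose first column shadows a DataFrame attribute name.
df = pd.DataFrame({"dtypes": [1, 2], "value": [3, 4]})

attr = df.dtypes     # the DataFrame attribute: one dtype per column
col = df["dtypes"]   # the column: the actual data

print(list(attr.index))  # ['dtypes', 'value']
print(col.tolist())      # [1, 2]

# Item deletion is unambiguous: it can only mean the column.
del df["dtypes"]
print(list(df.columns))  # ['value']
```

With brackets there is only one possible meaning, whereas an attribute-style delete would have to choose between the property and the column.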
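Architectural questions behind this problem
Is a DataFrame a collection of columns?

Is a DataFrame a collection of rows?

Is a column an attribute of a DataFrame?
Answers:
Yes, in all ways.

No, but if you want it to be, you can use the .ix, .loc or .iloc methods.

Maybe. Do you want to read data? Then yes, unless the name of the attribute is already taken by another attribute belonging to the DataFrame. Do you want to modify data? Then no.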
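One concrete case where attribute access and item access diverge is creating a column. The sketch below uses made-up column names; the warning filter is there only because pandas warns about this pattern.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

# Reading an existing column through an attribute works.
print(df.a.tolist())  # [1, 2]

# Assigning to a new attribute name does NOT create a column;
# it just sets a plain Python attribute on the object.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df.b = [3, 4]

# Item assignment does create a column.
df["c"] = [5, 6]

print(list(df.columns))  # ['a', 'c']
```

So the attribute form is at best a read shortcut, which is the asymmetry the answer is describing.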
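TL;DR: You cannot do del df.column_name, because pandas has a quite wildly grown architecture that would need to be reconsidered for this kind of cognitive dissonance not to occur to its users.
Pro tip: Don't use df.column_name. It may be pretty, but it causes cognitive dissonance.

Zen of Python quotes that fit in here:
There are multiple ways of deleting a column.
    There should be one-- and preferably only one --obvious way to do it.
Columns are sometimes attributes but sometimes not.
    Special cases aren't special enough to break the rules.
Does del df.dtypes delete the dtypes attribute or the dtypes column?
    In the face of ambiguity, refuse the temptation to guess.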
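TL;DR: A lot of effort went into finding a marginally more efficient solution, at the expense of the simplicity of df.drop(dlst, 1, errors='ignore').

Preamble
Deleting a column is semantically the same as selecting the other columns. I will show a few additional methods to consider.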
array(['A', 'B', 'C', 'D', 'E', 'F', 'G'], dtype=object)

Index(['A', 'B', 'C', 'D', 'E', 'F', 'G'], dtype='object')

# does not preserve order ['E', 'D', 'B', 'F', 'G', 'A', 'C']

['A', 'B', 'C', 'D', 'E', 'F', 'G']

cols = [x for x in df.columns.values.tolist() if x not in dlst]

   A  B  C  D  E  F  G
0  1  2  3  4  5  6  7
1  1  2  3  4  5  6  7
2  1  2  3  4  5  6  7

bools = [x not in dlst for x in df.columns.values.tolist()]

   A  B  C  D  E  F  G
0  1  2  3  4  5  6  7
1  1  2  3  4  5  6  7
2  1  2  3  4  5  6  7
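The fragments above appear to be the results of first building the set of columns to keep and then selecting them. A minimal sketch of that idea follows, assuming a three-row frame with columns A to G and a hypothetical delete list dlst that includes one label ("X") not present in the frame.

```python
import numpy as np
import pandas as pd

# Assumed setup: column 'A' holds 1, 'B' holds 2, and so on, across three rows.
df = pd.DataFrame(dict(zip("ABCDEFG", range(1, 8))), index=range(3))
dlst = ["B", "D", "X"]  # hypothetical; 'X' does not exist in df

# Label selection: build the labels to keep.
keep_setdiff = np.setdiff1d(df.columns.values, dlst)    # sorted NumPy array
keep_diff = df.columns.difference(dlst)                 # sorted Index
keep_drop = df.columns.drop(dlst, errors="ignore")      # Index in original order
keep_comp = [x for x in df.columns if x not in dlst]    # list in original order

# Boolean selection: one True/False flag per column.
mask = ~df.columns.isin(dlst)

by_labels = df[keep_comp]
by_reindex = df.reindex(columns=keep_comp)
by_mask = df.loc[:, mask]

print(keep_comp)  # ['A', 'C', 'E', 'F', 'G']
print(by_labels.equals(df.drop(dlst, axis=1, errors="ignore")))  # True
```

A plain set difference would also work but does not preserve column order, which is why the order-preserving variants are shown here.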

setdiff1d = lambda df, dlst: np.setdiff1d(df.columns.values, dlst)
difference = lambda df, dlst: df.columns.difference(dlst)
columndrop = lambda df, dlst: df.columns.drop(dlst, errors='ignore')
setdifflst = lambda df, dlst: list(set(df.columns.values.tolist()).difference(dlst))
comprehension = lambda df, dlst: [x for x in df.columns.values.tolist() if x not in dlst]

loc = lambda df, cols: df.loc[:, cols]
slc = lambda df, cols: df[cols]
ridx = lambda df, cols: df.reindex(columns=cols)
ridxa = lambda df, cols: df.reindex_axis(cols, 1)

isin = lambda df, dlst: ~df.columns.isin(dlst)
in1d = lambda df, dlst: ~np.in1d(df.columns.values, dlst)
comp = lambda df, dlst: [x not in dlst for x in df.columns.values.tolist()]
brod = lambda df, dlst: (df.columns.values[:, None] != dlst).all(1)

res1 = pd.DataFrame(
    index=pd.MultiIndex.from_product([
        'loc slc ridx ridxa'.split(),
        'setdiff1d difference columndrop setdifflst comprehension'.split(),
    ], names=['Select', 'Label']),
    columns=[10, 30, 100, 300, 1000],
    dtype=float
)

res2 = pd.DataFrame(
    index=pd.MultiIndex.from_product([
        'loc'.split(),
        'isin in1d comp brod'.split(),
    ], names=['Select', 'Label']),
    columns=[10, 30, 100, 300, 1000],
    dtype=float
)

res = res1.append(res2).sort_index()

dres = pd.Series(index=res.columns, name='drop')

for j in res.columns:
    dlst = list(range(j))
    cols = list(range(j // 2, j + j // 2))
    d = pd.DataFrame(1, range(10), cols)
    dres.at[j] = timeit('d.drop(dlst, 1, errors="ignore")', 'from __main__ import d, dlst', number=100)
    for s, l in res.index:
        stmt = '{}(d, {}(d, dlst))'.format(s, l)
        setp = 'from __main__ import d, dlst, {}, {}'.format(s, l)
        res.at[(s, l), j] = timeit(stmt, setp, number=100)

rs = res / dres

rs

                            10        30       100       300       1000
Select Label
loc    brod           0.747373  0.861979  0.891144  1.284235   3.872157
       columndrop     1.193983  1.292843  1.396841  1.484429   1.335733
       comp           0.802036  0.732326  1.149397  3.473283  25.565922
       comprehension  1.463503  1.568395  1.866441  4.421639  26.552276
       difference     1.413010  1.460863  1.587594  1.568571   1.569735
       in1d           0.818502  0.844374  0.994093  1.042360   1.076255
       isin           1.008874  0.879706  1.021712  1.001119   0.964327
       setdiff1d      1.352828  1.274061  1.483380  1.459986   1.466575
       setdifflst     1.233332  1.444521  1.714199  1.797241   1.876425
ridx   columndrop     0.903013  0.832814  0.949234  0.976366   0.982888
       comprehension  0.777445  0.827151  1.108028  3.473164  25.528879
       difference     1.086859  1.081396  1.293132  1.173044   1.237613
       setdiff1d      0.946009  0.873169  0.900185  0.908194   1.036124
       setdifflst     0.732964  0.823218  0.819748  0.990315   1.050910
ridxa  columndrop     0.835254  0.774701  0.907105  0.908006   0.932754
       comprehension  0.697749  0.762556  1.215225  3.510226  25.041832
       difference     1.055099  1.010208  1.122005  1.119575   1.383065
       setdiff1d      0.760716  0.725386  0.849949  0.879425   0.946460
       setdifflst     0.710008  0.668108  0.778060  0.871766   0.939537
slc    columndrop     1.268191  1.521264  2.646687  1.919423   1.981091
       comprehension  0.856893  0.870365  1.290730  3.564219  26.208937
       difference     1.470095  1.747211  2.886581  2.254690   2.050536
       setdiff1d      1.098427  1.133476  1.466029  2.045965   3.123452
       setdifflst     0.833700  0.846652  1.013061  1.110352   1.287831

fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharey=True)
for i, (n, g) in enumerate([(n, g.xs(n)) for n, g in rs.groupby('Select')]):
    ax = axes[i // 2, i % 2]
    g.plot.bar(ax=ax, title=n)
    ax.legend_.remove()
fig.tight_layout()

rs.idxmin().pipe(
    lambda x: pd.DataFrame(
        dict(idx=x.values, val=rs.lookup(x.values, x.index)),
        x.index
    )
)

                      idx       val
10     (ridx, setdifflst)  0.653431
30    (ridxa, setdifflst)  0.746143
100   (ridxa, setdifflst)  0.816207
300    (ridx, setdifflst)  0.780157
1000  (ridxa, setdifflst)  0.861622

df.drop(columns=['column_a', 'column_c'])

my_dict = { 'name' : ['a','b','c','d'], 'age' : [10,20,25,22], 'designation' : ['CEO', 'VP', 'MD', 'CEO']} df = pd.DataFrame(my_dict)

newdf = pd.DataFrame(df, columns=['name', 'age'])

new_df = df[['spam', 'sausage']]

df = df.drop(column0, axis=1)

df = df.drop([col1, col2, . . . , coln], axis=1)

df.drop('columnname', axis =1, inplace = True)

del df['colname']

df.drop(df.iloc[:,1:3], axis = 1, inplace = True)

df.drop(['col1','col2',..'coln'], axis = 1, inplace = True)

df = df.iloc[:,1:] # Removing an unnamed index column
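The snippets above mix three ideas: dropping by label, dropping by position, and keeping a chosen subset. A minimal sketch that puts them side by side follows; the frame and the extra "idx" column are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with a leftover index-like first column.
df = pd.DataFrame({
    "idx": [0, 1],
    "name": ["a", "b"],
    "age": [10, 20],
    "designation": ["CEO", "VP"],
})

# Remove the first column by position.
without_first = df.iloc[:, 1:]

# Remove the 2nd and 3rd columns by position, via their labels.
without_middle = df.drop(columns=df.columns[1:3])

# Keep only a chosen subset, which deletes everything else.
kept = df[["name", "age"]]

print(list(without_first.columns))   # ['name', 'age', 'designation']
print(list(without_middle.columns))  # ['idx', 'designation']
print(list(kept.columns))            # ['name', 'age']
```

All three return new frames and leave df unchanged, so the result has to be assigned to a name to be kept.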