Python pandas: filter by the values in a row?
I'm trying to filter a pandas DataFrame by the values in a row. Basically I have a df in which one row contains building types, e.g. Education, K-12; Office; Church; and so on. I want to build a new DataFrame filtered on those values: I want to "extract" the columns whose cell in that row equals "Education, K-12". How do I do that? I've searched everywhere, but most filtering examples seem to be based on column values. This should not be based on column values. Thanks.
SAN ANTONIO, TX SAN ANTONIO, TX.1 SAN ANTONIO, TX.2 SAN ANTONIO, TX.3 \
0 Commercial Commercial Commercial Commercial
1 Fossil Fuel Fossil Fuel Fossil Fuel Fossil Fuel
2 Education, K-12 Education, K-12 Education, K-12 Education, K-12
.. ... ... ... ...
SAN ANTONIO, TX.429 SAN ANTONIO, TX.430 SAN ANTONIO, TX.431
0 Commercial Commercial Commercial
1 Electric Electric Electric
2 Office, Large Office, Large Office, Large
.. ... ... ...
[745 rows x 432 columns]>
My first idea was to transpose the DataFrame
transposed = df.T
so that the 'Education, K-12' values end up in a column
0 1 2
SAN ANTONIO, TX Commercial Fossil Fuel Education, K-12
SAN ANTONIO, TX.1 Commercial Fossil Fuel Education, K-12
SAN ANTONIO, TX.2 Commercial Fossil Fuel Office, Large
SAN ANTONIO, TX.3 Commercial Fossil Fuel Education, K-12
and then filter by rows
transposed[ transposed[2] == 'Education, K-12' ].index
Minimal working example
I use io.StringIO only to simulate a file in memory; you should pass a normal filename instead
text = '''SAN ANTONIO, TX;SAN ANTONIO, TX.1;SAN ANTONIO, TX.2;SAN ANTONIO, TX.3
Commercial;Commercial;Commercial;Commercial
Fossil Fuel;Fossil Fuel;Fossil Fuel;Fossil Fuel
Education, K-12;Education, K-12;Office, Large;Education, K-12'''
import io
import pandas as pd
df = pd.read_csv(io.StringIO(text), sep=';')
print('\n--- df ---\n')
print(df)
transposed = df.T
print('\n--- transposed ---\n')
print(transposed)
print('\n--- names ---\n')
cols = transposed[ transposed[2] == 'Education, K-12' ].index
print(cols)
print('\n--- columns ---\n')
print(df[ cols ])
Result
--- df ---
SAN ANTONIO, TX SAN ANTONIO, TX.1 SAN ANTONIO, TX.2 SAN ANTONIO, TX.3
0 Commercial Commercial Commercial Commercial
1 Fossil Fuel Fossil Fuel Fossil Fuel Fossil Fuel
2 Education, K-12 Education, K-12 Office, Large Education, K-12
--- transposed ---
0 1 2
SAN ANTONIO, TX Commercial Fossil Fuel Education, K-12
SAN ANTONIO, TX.1 Commercial Fossil Fuel Education, K-12
SAN ANTONIO, TX.2 Commercial Fossil Fuel Office, Large
SAN ANTONIO, TX.3 Commercial Fossil Fuel Education, K-12
--- names ---
Index(['SAN ANTONIO, TX', 'SAN ANTONIO, TX.1', 'SAN ANTONIO, TX.3'], dtype='object')
--- columns ---
SAN ANTONIO, TX SAN ANTONIO, TX.1 SAN ANTONIO, TX.3
0 Commercial Commercial Commercial
1 Fossil Fuel Fossil Fuel Fossil Fuel
2 Education, K-12 Education, K-12 Education, K-12
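The transpose-and-filter steps above can be wrapped in a small helper. This is only a sketch: the function name columns_matching_row_value and its parameters are hypothetical, not part of pandas, and it assumes the row labels are the default integers produced by read_csv.

```python
import io
import pandas as pd


def columns_matching_row_value(df, row_label, value):
    """Return the columns of df whose cell in row `row_label` equals `value`.

    Hypothetical helper: it transposes the frame so that the row of
    interest becomes a column, then filters rows in the usual way.
    """
    transposed = df.T
    matching = transposed[transposed[row_label] == value].index
    return df[matching]


text = '''SAN ANTONIO, TX;SAN ANTONIO, TX.1;SAN ANTONIO, TX.2;SAN ANTONIO, TX.3
Commercial;Commercial;Commercial;Commercial
Fossil Fuel;Fossil Fuel;Fossil Fuel;Fossil Fuel
Education, K-12;Education, K-12;Office, Large;Education, K-12'''

df = pd.read_csv(io.StringIO(text), sep=';')
result = columns_matching_row_value(df, 2, 'Education, K-12')
print(list(result.columns))
```

Note that df.T builds a full transposed copy, so on a large frame this may cost more memory than comparing a single row directly.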
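I have never seen this use case before. I can't think of an elegant way to do it, but you can transpose the DataFrame, select the rows you want, and then transpose back. In the example below, row 7 is the row that holds the values to filter on. Suppose you want to drop the columns that have 'c' in row 7, so in effect we need to remove col2.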
>>> col1=['a','a','a','a','a','b','b','b']
>>> col2=['a','a','a','a','a','b','b','c']
>>> cols=['col1','col2']
>>> values=list(zip(col1,col2))
>>> import pandas as pd
>>> df=pd.DataFrame(data=values,columns=cols)
>>> df
col1 col2
0 a a
1 a a
2 a a
3 a a
4 a a
5 b b
6 b b
7 b c
>>> dft=df.T
>>> dft
0 1 2 3 4 5 6 7
col1 a a a a a b b b
col2 a a a a a b b c
>>> dff=dft[dft[7]!='c']
>>> dff
0 1 2 3 4 5 6 7
col1 a a a a a b b b
>>> dfo=dff.T
>>> dfo
col1
0 a
1 a
2 a
3 a
4 a
5 b
6 b
7 b
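For comparison, the same column can probably be dropped without the two transposes by building a boolean mask from row 7 and passing it to df.loc on the column axis. This is a sketch on the same toy data; the names keep and dfo are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b'],
    'col2': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'c'],
})

# df.loc[7] is a Series indexed by column name, so the comparison
# yields one True/False flag per column.
keep = df.loc[7] != 'c'

# A boolean Series in the column position of .loc selects columns.
dfo = df.loc[:, keep]
print(dfo.columns.tolist())
```

This keeps each column's dtype intact, whereas a round trip through .T on mixed-type data can turn every column into object.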
After testing these ideas, I came up with this
cols = df.columns[ df.iloc[2] == 'Education, K-12' ]
df[ cols ]
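df.iloc[2] selects a single row, which gives me a Series. Comparing that Series with 'Education, K-12' yields a True/False value for each item in the row, and I can use that mask to filter the columns.
Minimal working example
I use io.StringIO only to simulate a file in memory; you should pass a normal filename instead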
text = '''SAN ANTONIO, TX;SAN ANTONIO, TX.1;SAN ANTONIO, TX.2;SAN ANTONIO, TX.3
Commercial;Commercial;Commercial;Commercial
Fossil Fuel;Fossil Fuel;Fossil Fuel;Fossil Fuel
Education, K-12;Education, K-12;Office, Large;Education, K-12'''
import io
import pandas as pd
df = pd.read_csv(io.StringIO(text), sep=';')
print('\n--- df ---\n')
print(df)
print('\n--- Series ---\n')
print( df.iloc[2] )
print('\n--- mask ---\n')
print( df.iloc[2] == 'Education, K-12' )
print('\n--- names ---\n')
cols = df.columns[ df.iloc[2] == 'Education, K-12' ]
print(cols)
print('\n--- columns ---\n')
print(df[ cols ])
Result:
--- df ---
SAN ANTONIO, TX SAN ANTONIO, TX.1 SAN ANTONIO, TX.2 SAN ANTONIO, TX.3
0 Commercial Commercial Commercial Commercial
1 Fossil Fuel Fossil Fuel Fossil Fuel Fossil Fuel
2 Education, K-12 Education, K-12 Office, Large Education, K-12
--- Series ---
SAN ANTONIO, TX Education, K-12
SAN ANTONIO, TX.1 Education, K-12
SAN ANTONIO, TX.2 Office, Large
SAN ANTONIO, TX.3 Education, K-12
Name: 2, dtype: object
--- mask ---
SAN ANTONIO, TX True
SAN ANTONIO, TX.1 True
SAN ANTONIO, TX.2 False
SAN ANTONIO, TX.3 True
Name: 2, dtype: bool
--- names ---
Index(['SAN ANTONIO, TX', 'SAN ANTONIO, TX.1', 'SAN ANTONIO, TX.3'], dtype='object')
--- columns ---
SAN ANTONIO, TX SAN ANTONIO, TX.1 SAN ANTONIO, TX.3
0 Commercial Commercial Commercial
1 Fossil Fuel Fossil Fuel Fossil Fuel
2 Education, K-12 Education, K-12 Education, K-12
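The same mask idea should extend to several building types at once, or to partial matches. The sketch below assumes the sample data from above; the list wanted and the substring 'Education' are illustrative choices, not something from the question.

```python
import io
import pandas as pd

text = '''SAN ANTONIO, TX;SAN ANTONIO, TX.1;SAN ANTONIO, TX.2;SAN ANTONIO, TX.3
Commercial;Commercial;Commercial;Commercial
Fossil Fuel;Fossil Fuel;Fossil Fuel;Fossil Fuel
Education, K-12;Education, K-12;Office, Large;Education, K-12'''

df = pd.read_csv(io.StringIO(text), sep=';')
row = df.iloc[2]

# Exact match against any of several values (illustrative list).
wanted = ['Education, K-12', 'Office, Large']
selected = df.loc[:, row.isin(wanted)]

# Plain substring match; na=False treats missing cells as non-matching.
edu_mask = row.str.contains('Education', regex=False, na=False)
edu = df.loc[:, edu_mask]

print(list(selected.columns))
print(list(edu.columns))
```

Using df.loc[:, mask] instead of df[df.columns[mask]] is a matter of taste here; both select columns from a boolean mask.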
First idea: transpose it so that you search rows instead of columns.
This is exactly what I was looking for, and it makes a lot of sense. Thank you.