Python: Getting specific rows and specific columns from a CSV with Pandas

python pandas csv

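How do I find specific values in a column and then select a subset of the column values in the matching rows?

I have a CSV with one column containing the names of US states and further columns holding attributes of each state, but I only want specific values for the states I am looking up.

For example, there are 50 rows (one per state) and 20 columns of various data about each state. I want to select Colorado and Florida, and only 5 of the column values for those states.

Here is the code I am trying to modify: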

import glob
import pandas as pd
import os
import csv
 
myList = []
 
path = "/path/to/source/files/*.csv"
 
for fname in glob.glob(path):
    df = pd.read_csv(fname)
    # .copy() so that adding the 'Date' column below does not raise SettingWithCopyWarning
    row = df.loc[df['Province_State'] == 'Pennsylvania'].copy()
 
    # Put the date in, derived from the CSV name
    dateFromFilename = os.path.basename(fname).replace('.csv','')
    row['Date'] = dateFromFilename
 
    myList.append(row)
    print(row)
 
concatList = pd.concat(myList, sort=True)
 
concatList.to_csv('/path/to/output.csv', index=False, header=True)

You can simply use isin and pass the list of columns to loc:

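import glob
import os
import pandas as pd

myList = []

path = "/path/to/source/files/*.csv"

# 'col1', 'col2', 'col3' are placeholders for the columns you want;
# 'Province_State' must be included because the pivot below groups on it
col_list = ['Province_State', 'col1', 'col2', 'col3']

for fname in glob.glob(path):
    df = pd.read_csv(fname)

    # changes here: isin takes a list-like, and the second argument to loc selects columns
    row = df.loc[df['Province_State'].isin(['Florida', 'Colorado']),
                 col_list]

    # pivot: one block of columns per state
    row = (row.assign(idx=row.groupby('Province_State').cumcount())
              .pivot(index='idx', columns='Province_State')
          )
    # rename: flatten the two-level columns; x is the original column name, y the state
    # (this yields col1_Florida; use f'{y}_{x}' for names like Florida_col1)
    row.columns = [f'{x}_{y}' for x, y in row.columns]

    # Put the date in, derived from the CSV name
    dateFromFilename = os.path.basename(fname).replace('.csv', '')
    row['Date'] = dateFromFilename

    myList.append(row)
    print(row)

concatList = pd.concat(myList, sort=True)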
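
The selection step on its own can be tried on a small in-memory frame instead of the CSV files. This is only a sketch: the columns Confirmed, Deaths and Recovered and all the values are made up for illustration and are not from the original data.

```python
import pandas as pd

# Hypothetical data standing in for one CSV file
df = pd.DataFrame({
    'Province_State': ['Colorado', 'Florida', 'Pennsylvania'],
    'Confirmed': [10, 20, 30],
    'Deaths': [1, 2, 3],
    'Recovered': [5, 6, 7],
})

states = ['Florida', 'Colorado']                      # rows to keep
col_list = ['Province_State', 'Confirmed', 'Deaths']  # columns to keep

# The boolean mask from isin picks the rows, the list picks the columns
subset = df.loc[df['Province_State'].isin(states), col_list]
print(subset)
```

Rows come back in the order they appear in the frame, not in the order of the list passed to isin.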

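When I grab these columns, how can I rename them? With one per state, I will get two col1s. I need to end up with Florida_col1, Colorado_col1, and so on. I think there was a syntax error here, which I have fixed:

row = (row.assign(idx=row.groupby('Province_State').cumcount()).pivot(index='idx', columns='Province_State'))

but I am still getting one here that I don't understand:

row.columns = [f'{x}_{y}' for x, y in row.columns]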

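Before that line, row has two-level columns. For details, see question/answer 10/11 in the linked post.

So I need to replace x, y with the actual column names?

No, that's what the for loop is for: it unpacks each two-level column label into x and y.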
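
To see why x and y need no manual replacement, here is a small sketch of the pivot and the flattening step. The column names Confirmed and Deaths and the values are assumptions for illustration. After the pivot, each column label is a tuple of (original column name, state), and the comprehension unpacks every tuple in turn.

```python
import pandas as pd

# Hypothetical rows already filtered down to two states
subset = pd.DataFrame({
    'Province_State': ['Colorado', 'Florida'],
    'Confirmed': [10, 20],
    'Deaths': [1, 2],
})

# cumcount numbers the rows within each state, so each state's rows share idx 0, 1, ...
wide = (subset.assign(idx=subset.groupby('Province_State').cumcount())
              .pivot(index='idx', columns='Province_State'))

# Two-level labels: tuples of (original column name, state)
print(list(wide.columns))

# Flatten to single-level names, state first, e.g. Florida_Confirmed
wide.columns = [f'{state}_{col}' for col, state in wide.columns]
print(wide)
```

Swapping the two names inside the f-string changes the order of the parts in the resulting column names.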