How do I correctly create a DataFrame from a CSV file in Python?

I'm new to programming and I'm having some trouble importing a CSV file correctly.

To import it, I use the following code:

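import csv
import pandas as pd

# Read the raw file with the csv module; every field comes back as a string
with open(path_fundamentals, newline='') as data_fundamentals:
    reader_fundamentals = csv.reader(data_fundamentals)
    header_fundamentals = next(reader_fundamentals)
    fundamentals = [row for row in reader_fundamentals]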
Then I convert it into a DataFrame:

df_fundamentals = pd.DataFrame(fundamentals, columns= header_fundamentals)
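For comparison, the two steps above (csv.reader, then pd.DataFrame) can be collapsed into a single pd.read_csv call, which also infers numeric dtypes instead of leaving every value as a string. This is a minimal sketch: the inline CSV text and its three columns are hypothetical stand-ins for the real "fundamentals" file, read through io.StringIO so the example runs without a file on disk.

```python
import csv
import io

import pandas as pd

# Hypothetical sample data standing in for the real "fundamentals" file
csv_text = "fyear,ni,sale\n2019,4680,3235300\n2020,5120,3300100\n"

# Approach from the question: csv.reader, then pd.DataFrame (values stay strings)
reader = csv.reader(io.StringIO(csv_text))
header = next(reader)
df_manual = pd.DataFrame([row for row in reader], columns=header)

# One-step alternative: read_csv parses the header and infers numeric dtypes
df_direct = pd.read_csv(io.StringIO(csv_text))

print(df_manual.dtypes)
print(df_direct.dtypes)
```

With a real file, the path (path_fundamentals in the question) is passed to pd.read_csv directly instead of the StringIO object.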

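Here is my first problem: from the CSV file "fundamentals" I only need certain columns for my DataFrame. At first I inserted them all by hand, which of course is not very efficient. Is there a simpler way?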

df_kennzahlen.insert(1, 'Fiscal Year' , df_fundamentals['fyear'])
df_kennzahlen.insert(2, 'Current Assets' , df_fundamentals['act'])
df_kennzahlen.insert(3, 'Net Income/Loss' , df_fundamentals['ni'])
df_kennzahlen.insert(4, 'Total Liabilities' , df_fundamentals['lt'])
df_kennzahlen.insert(5, 'Long-Term Debt' , df_fundamentals['dltp'])
df_kennzahlen.insert(6, 'Cash' , df_fundamentals['ch'])
df_kennzahlen.insert(7, 'Total Assets' , df_fundamentals['at'])
df_kennzahlen.insert(8, 'Trade Payables' , df_fundamentals['ap'])
df_kennzahlen.insert(9, 'R&D-Expenses' , df_fundamentals['xrd'])
df_kennzahlen.insert(10, 'Sales' , df_fundamentals['sale'])

The values in the DataFrame are numbers, but they have a string data type. To convert them, I use the following code:

df_kennzahlen['Net Income/Loss'] = pd.to_numeric(df_kennzahlen['Net Income/Loss'], downcast='integer')
df_kennzahlen['Total Liabilities'] = pd.to_numeric(df_kennzahlen['Total Liabilities'], downcast='integer')
df_kennzahlen['Long-Term Debt'] = pd.to_numeric(df_kennzahlen['Long-Term Debt'], downcast='integer')
df_kennzahlen['Cash'] = pd.to_numeric(df_kennzahlen['Cash'], downcast='integer')
df_kennzahlen['Total Assets'] = pd.to_numeric(df_kennzahlen['Total Assets'], downcast='integer')
df_kennzahlen['Trade Payables'] = pd.to_numeric(df_kennzahlen['Trade Payables'], downcast='integer')
df_kennzahlen['R&D-Expenses'] = pd.to_numeric(df_kennzahlen['R&D-Expenses'], downcast='integer')
df_kennzahlen['Sales'] = pd.to_numeric(df_kennzahlen['Sales'], downcast='integer') 
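The eight pd.to_numeric statements above differ only in the column name, so they can be written as one loop over a list of column names. A minimal sketch on a hypothetical two-row frame (the column values are made up; the real list would contain all eight column names). Note that this only shortens the code; it does not by itself change how the strings are parsed.

```python
import pandas as pd

# Hypothetical stand-in for df_kennzahlen: numeric values stored as strings
df_kennzahlen = pd.DataFrame({
    "Fiscal Year": ["2019", "2020"],
    "Net Income/Loss": ["4680", "-120"],
    "Sales": ["3235300", "3300100"],
})

# Columns to convert (the question's full list of eight would go here)
numeric_cols = ["Net Income/Loss", "Sales"]

# One loop replaces one to_numeric statement per column
for col in numeric_cols:
    df_kennzahlen[col] = pd.to_numeric(df_kennzahlen[col], downcast="integer")

print(df_kennzahlen.dtypes)
```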

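Here I have the same problem: it's not very efficient, and the values in the DataFrame are not converted correctly. For example, 4680 is shown as 0.4680 and 3235300 as 323.530. Do you know how I can make the code more efficient and get the correct values in the DataFrame?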

You can pass the columns you need as a list via the usecols parameter:

import pandas as pd
df = pd.read_csv(filename, header=0, usecols=['a', 'b'], converters={'a': str, 'b': str})
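Applied to the question, usecols can be combined with a dictionary that maps the original column codes to the readable names, replacing the ten df_kennzahlen.insert calls. This is a sketch under assumptions: the inline CSV (including the extra gvkey column) is hypothetical, and only five of the question's ten code-to-name pairs are shown. Because read_csv keeps the file's column order rather than the order given in usecols, the result is reindexed explicitly at the end.

```python
import io

import pandas as pd

# Hypothetical excerpt of the fundamentals file; only some columns are needed
csv_text = (
    "gvkey,fyear,act,ni,lt,sale\n"
    "001004,2019,1200,4680,3000,3235300\n"
)

# Map the original column codes to the desired names (pairs taken from the question)
column_names = {
    "fyear": "Fiscal Year",
    "act": "Current Assets",
    "ni": "Net Income/Loss",
    "lt": "Total Liabilities",
    "sale": "Sales",
}

# usecols keeps only the listed columns; rename applies the readable names
df_kennzahlen = pd.read_csv(
    io.StringIO(csv_text), usecols=list(column_names)
).rename(columns=column_names)

# usecols does not reorder columns, so put them in the mapping's order explicitly
df_kennzahlen = df_kennzahlen[list(column_names.values())]
print(df_kennzahlen)
```

Extending the dictionary with the remaining pairs (dltp, ch, at, ap, xrd) covers the full column list without any further code.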

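With the pd.read_csv function you can specify exactly how the CSV file is read. In particular, you can select columns (usecols parameter), parse date columns (parse_dates parameter), change the default separator (e.g. sep=";"), and change the decimal and thousands separators (e.g. decimal=",", thousands="."). The last two are especially useful when working with non-default CSV formats.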
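A minimal sketch of the last two options, assuming (this is not confirmed in the question) that the file uses German-style formatting: ";" between fields, "." as the thousands separator and "," as the decimal mark. The inline data and the margin column are hypothetical. With sep=";" alone, a field such as 4.680 would be read as the float 4.68, so declaring both separators matters for files in this format.

```python
import io

import pandas as pd

# Hypothetical German-style CSV: ';' separates fields, '.' groups thousands,
# ',' marks decimals
csv_text = "fyear;ni;sale;margin\n2019;4.680;3.235.300;0,145\n"

# Tell the parser about all three conventions so numbers come out as numbers
df = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",", thousands=".")
print(df)
print(df.dtypes)
```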


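For the full list of parameters, see the documentation.

import pandas; df = pandas.read_csv("csv_file.csv")

OK, so how do I get only the columns I need?

Great, thanks. Do you also have any suggestions for converting the values?

You can use the dtype or converters options of pandas' read_csv function.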
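A short sketch of both options from the last comment, on hypothetical inline data. dtype fixes a column's type at parse time; here the nullable integer type "Int64" is used so that an empty field becomes <NA> instead of forcing the column to float. converters instead runs a function on each raw string of a column; the converter shown, which strips "." thousands separators, is an illustrative assumption about the file's format, not something stated in the thread.

```python
import io

import pandas as pd

# Hypothetical data: one missing ni value, sales written with '.' separators
csv_text = "fyear,ni,sale\n2019,4680,3.235.300\n2020,,3.300.100\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    # dtype: nullable integers, so the empty ni field becomes <NA>
    dtype={"fyear": "Int64", "ni": "Int64"},
    # converters: receive the raw string and return the parsed value
    converters={"sale": lambda s: int(s.replace(".", ""))},
)
print(df)
print(df.dtypes)
```

dtype is the simpler choice when the text is already in a format pandas can parse; converters are useful when each value needs custom cleanup first.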