Processing multiple CSV files in Python

python python-3.x pandas csv

I have multiple CSV files like the one below. All of the files have the same format.

|    | items   |   per_unit_amount |   number of units |
|---:|:--------|------------------:|------------------:|
|  0 | book    |                25 |                 5 |
|  1 | pencil  |                 3 |                10 |

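First, I want to calculate the total amount of the bill in Python. Once that works for one file, I need to calculate the bill totals for all of the CSV files at the same time, i.e. in a multithreaded way.

I need to use multithreading to achieve this.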

You can use the pandas library for this. Install pandas with `pip install pandas`.

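The workflow should look like this:

1. Iterate over the file names with `glob`, load each file with pandas, and keep the resulting DataFrames in a list
2. Concatenate the list of DataFrames into one big DataFrame
3. Perform the required calculation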

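from glob import glob
import pandas as pd

# get a list of all csv file paths
filenames = glob('./*.csv')
# list of DataFrames, one per file
dfs = [pd.read_csv(filename) for filename in filenames]
# concatenate all DataFrames into a single one
big_df = pd.concat(dfs, ignore_index=True)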

`big_df` should look like this. Here I used two CSV files with two rows of input each, so the concatenated DataFrame has 4 rows in total:

|    | items   |   per_unit_amount |   number of units |
|---:|:--------|------------------:|------------------:|
|  0 | book    |                25 |                 5 |
|  1 | pencil  |                 3 |                10 |
|  2 | book    |                25 |                 5 |
|  3 | pencil  |                 3 |                10 |

Now multiply `per_unit_amount` by `number of units` to get a `unit_total` for each row:

big_df['unit_total'] = big_df['per_unit_amount'] * big_df['number of units']

The DataFrame now has one additional column:

|    | items   |   per_unit_amount |   number of units |   unit_total |
|---:|:--------|------------------:|------------------:|-------------:|
|  0 | book    |                25 |                 5 |          125 |
|  1 | pencil  |                 3 |                10 |           30 |
|  2 | book    |                25 |                 5 |          125 |
|  3 | pencil  |                 3 |                10 |           30 |

You can compute the total amount by summing all entries in the `unit_total` column:

total_amount = big_df['unit_total'].sum()
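
The question also asks for the files to be processed concurrently, which the single-DataFrame approach above does not cover. Below is a minimal sketch of one way to do that with the standard library's `concurrent.futures.ThreadPoolExecutor`. It assumes the column names from the question's table (`per_unit_amount`, `number of units`). The helper names `bill_total` and `all_bill_totals` and the default of 4 workers are illustrative choices, not part of the original answer.

```python
from concurrent.futures import ThreadPoolExecutor
from glob import glob

import pandas as pd


def bill_total(path):
    """Return the bill total for a single CSV file."""
    df = pd.read_csv(path)
    # column names are assumed to match the table in the question
    return (df['per_unit_amount'] * df['number of units']).sum()


def all_bill_totals(pattern='./*.csv', max_workers=4):
    """Compute the bill total of every matching file in a thread pool."""
    paths = sorted(glob(pattern))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `paths`
        totals = list(pool.map(bill_total, paths))
    return dict(zip(paths, totals))
```

Calling `all_bill_totals()` returns a dictionary mapping each file path to its total, and `sum(all_bill_totals().values())` gives the grand total. Threads mainly help while a file is waiting on disk or network I/O. If parsing turns out to be the bottleneck, `ProcessPoolExecutor` from the same module may be worth trying instead.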

This is my approach: first merge all the CSV files, then sum up each item:

import glob
import os
import pandas as pd

# the path to your csv file directory
mycsvdir = 'C:\\your csv location\\your csv location'

# select all CSV files (you can apply some kind of filter here too)
csvfiles = glob.glob(os.path.join(mycsvdir, '*.csv'))

# loop through the files and read them in with pandas
dataframes = []  # a list to hold all the individual pandas DataFrames
for csvfile in csvfiles:
    df = pd.read_csv(csvfile)
    dataframes.append(df)

# concatenate them all together
result = pd.concat(dataframes, ignore_index=True)

# write out to a new csv file; index=False keeps the row index out of the file
result.to_csv('all.csv', index=False)

You now have all.csv, which is the merge of all your CSV files. We can now sum up any item with the following code:

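# adjust this path to wherever all.csv was written
dff = pd.read_csv('C:\\output folder\\output folder\\all.csv')

# sum the remaining numeric columns for each (item, price) pair;
# without aggfunc, pivot_table would return the mean instead of the sum
table = pd.pivot_table(dff, index=['items', 'per_unit_amount'], aggfunc='sum')
print(table)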
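
If what you need is the amount spent per item rather than a pivot of the raw columns, one option is to compute a total for each row and then group by item. The following is a minimal sketch, assuming the column names from the question. The small inline DataFrame stands in for the merged all.csv, and `line_total` is a hypothetical column name introduced for the example.

```python
import pandas as pd

# stand-in for the merged data (two files with two rows each)
dff = pd.DataFrame({
    'items': ['book', 'pencil', 'book', 'pencil'],
    'per_unit_amount': [25, 3, 25, 3],
    'number of units': [5, 10, 5, 10],
})

# amount for each row, then the sum of those amounts per item
dff['line_total'] = dff['per_unit_amount'] * dff['number of units']
per_item = dff.groupby('items')['line_total'].sum()

print(per_item)
print('total:', per_item.sum())
```

With this sample data the per-item amounts are 250 for book and 60 for pencil, for a bill total of 310.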

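Do all of your files have the same items? Also, please share what you have done so far and where you are stuck. Provide more information in your question!

Yes, all files have the same items, with some changes. @Jai, hope you get it.

Since you haven't shared any code, I'll use my own approach: merge all the files first, then summarize them. I'm posting an answer.

I want to do this with multithreading. Is that possible? @Mohsen

What if all my CSV files have different formats and I want to process all of them at the same time?

In that case you have to manually select the columns you want to sum, and then format them.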
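
For the last case in the comments, where the files have different formats, the columns to multiply have to be chosen for each file. Here is a sketch under stated assumptions: the file names and the alternate column names `price` and `qty` in `COLUMN_MAP` are hypothetical, as are the helper names `bill_total_from` and `totals_for`.

```python
import pandas as pd

# hypothetical mapping: file path -> (price column, quantity column)
COLUMN_MAP = {
    'shop_a.csv': ('per_unit_amount', 'number of units'),
    'shop_b.csv': ('price', 'qty'),
}


def bill_total_from(path, price_col, qty_col):
    """Read only the two needed columns and return the bill total."""
    df = pd.read_csv(path, usecols=[price_col, qty_col])
    return (df[price_col] * df[qty_col]).sum()


def totals_for(column_map):
    """Return {path: total} for every file listed in the mapping."""
    return {
        path: bill_total_from(path, price_col, qty_col)
        for path, (price_col, qty_col) in column_map.items()
    }
```

`totals_for(COLUMN_MAP)` would then give one total per file. Keeping the per-file column choice in a single mapping makes it easy to add another format without touching the calculation itself.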
