Python: dynamically adding a computed column to a dataframe while iterating over the rows of a CSV file?


python pandas


I have a large space-delimited input file, input.csv, that cannot be held in memory:

## Header
# More header here
A   B
1   2
3   4

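If the iterator=True argument of pandas.read_csv is used, it returns a TextFileReader/TextParser object. This allows filtering the file on the fly and selecting only the rows where column A is greater than 2.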

But how do I add a third column to the dataframe on the fly, without having to loop over all the data again?

Specifically, I want column C to equal column A multiplied by the value in dictionary d whose key is the value in column B, i.e. C=A*d[B].

Currently I have the following code:

import pandas
d = {2: 2, 4: 3}
TextParser = pandas.read_csv('input.csv', sep=' ', iterator=True, comment='#')
df = pandas.concat([chunk[chunk['A'] > 2] for chunk in TextParser])
print(df)

This prints the following output:

   A  B
1  3  4

How can I make it print the output with an additional column C, where C=A*d[B]?

You can use a generator to process one chunk at a time:

Code:

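import pandas as pd

def on_the_fly(the_csv):
    d = {2: 2, 4: 3}
    # sep=r'\s+' splits on any run of whitespace; comment='#' skips the header lines
    chunked_csv = pd.read_csv(
        the_csv, sep=r'\s+', iterator=True, comment='#')

    for chunk in chunked_csv:
        # Boolean mask of the rows to keep
        rows_idx = chunk['A'] > 2
        # Compute C = A * d[B] for the kept rows only
        chunk.loc[rows_idx, 'C'] = chunk[rows_idx].apply(
            lambda x: x.A * d[x.B], axis=1)
        yield chunk[rows_idx]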
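The row-wise apply in the generator can probably be replaced by a column-wise lookup. The following is only a sketch of that idea, not part of the original answer: the name on_the_fly_vectorized, passing d as an argument, and chunksize=2 are all assumptions made for illustration.

```python
from io import StringIO

import pandas as pd


def on_the_fly_vectorized(the_csv, d, chunksize=2):
    """Yield filtered chunks with C = A * d[B], computed column-wise.

    Hypothetical variant of on_the_fly; chunksize=2 is only for the demo.
    """
    reader = pd.read_csv(the_csv, sep=r'\s+', comment='#', chunksize=chunksize)
    for chunk in reader:
        # Filter first; .copy() so the new column is not written into a view.
        kept = chunk[chunk['A'] > 2].copy()
        # Series.map looks up each B value in d; keys missing from d give NaN.
        kept['C'] = kept['A'] * kept['B'].map(d)
        yield kept


# In-memory stand-in for input.csv
data = StringIO("""#
A   B
1   2
3   4
4   4
""")

df = pd.concat(on_the_fly_vectorized(data, {2: 2, 4: 3}))
print(df)
```

Because the filter is applied before C is assigned, no NaN placeholders are written for the dropped rows, so C should stay an integer column as long as every B value is a key of d. A B value missing from d would give NaN here instead of the KeyError raised by the apply version.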


Test code:


from io import StringIO
import pandas as pd

# In-memory stand-in for input.csv
data = StringIO("""#
    A   B
    1   2
    3   4
    4   4
""")

df = pd.concat(on_the_fly(data))
print(df)


Results:


   A  B     C
1  3  4   9.0
2  4  4  12.0
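One caveat for a file that really does not fit in memory: as far as I can tell, looping over a reader created with only iterator=True calls get_chunk() with no size, which reads whatever is left of the file in a single step. Passing chunksize is what bounds the number of rows parsed per iteration. This is worth checking against your pandas version. A minimal sketch, where the sample data and chunksize=2 are illustrative assumptions:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample standing in for input.csv
data = StringIO("""## Header
# More header here
A   B
1   2
3   4
4   4
5   2
""")

# chunksize=2 is only for the demo; a real file would use a much larger value.
reader = pd.read_csv(data, sep=r'\s+', comment='#', chunksize=2)

# Record how many rows each iteration actually hands back.
chunk_lengths = [len(chunk) for chunk in reader]
print(chunk_lengths)
```

With four data rows and chunksize=2 the loop runs twice, with two rows each time. Only the rows that pass the filter need to be kept for the final concat, so memory use depends on the chunk size and the filtered result, not on the size of the whole file.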