Python: Formatting text file output from a pandas DataFrame


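I have a DataFrame relating to a store and its customers' purchasing behaviour.

I would like to output the data in the DataFrame in a certain format. The DataFrame consists of the following columns: Customer ID, # of products, List of Products, Class of Product.

An example of some entries in the DataFrame is as follows: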

df = [{Customer ID: 00001, 00002, 00003}, 
{# of products: 3, 2, 5},
{List of Products: (Milk, Cheese, Bread), (Butter, Steak), (Bread, Apple, Steak, Pasta, Bananas)}, 
{Class of Product: {[1,2,'D'], [3,3,'G']}, {[1,1,'D'], [2,2,'M']}, {[1,1,'G'], [2,2,'F'],[3,3,'M'], [4,4,'G'], [5,5,'F']}

I would like the text file to be output as follows:

00001 # Customer ID
3 # Number of Products
Milk Cheese Bread # List of Products separated using single spacing
D D G # Class corresponding to the products, where D = dairy, G = Gluten, also separated using single spacing
# New line

00002 # Next customer number (Next row of data frame)
2 # number of products
Butter Steak # List of products they purchased separated using single spacing
D M # Class corresponding to the products, where D = Dairy and M = Meat, also separated using single spacing
# New Line

00003 # Next customer number (Next row of data frame)
5 # number of products
Bread Apple Steak Pasta Bananas # List of products separated using single spacing, 
G F M G F # Corresponding to the products where F = Fruit, also separated using single spacing
# New Line

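And so on for the entire DataFrame.

I am unsure how to specify a particular format for the text file, and how to make sure the class of each product is printed correctly. For example, for customer 00001 with [1,2,'D'], [3,3,'G'], the classes should be printed in the correct order, single-spaced, as D D G.

Update: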


    Customer_ID Num_Items   List_of_Products          Classes   
    00001        3         Milk Cheese Bread        [[1,2,'D'],[3,3,'G']]   
    00002        2         Butter Steak         [[1,1,'D'],[2,2,'M']]   
    00003        5         Bread Apple Steak Pasta Bananas  [[1,1,'G'], [2,2,'F'], [3,3, 'M'], [4,4,'G'], [5,5,'F']

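Could you provide a valid df definition? As written, it raises a TypeError, and without knowing exactly what data type each column holds it is hard to help you.

Assuming this block creates a DataFrame formatted the same way as yours: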

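import pandas as pd

df = pd.DataFrame({'Customer ID': ['00001', '00002', '00003'],
                   '# of products': [3, 2, 5],
                   'List of Products': [['Milk', 'Cheese', 'Bread'], ['Butter', 'Steak'], ['Bread', 'Apple', 'Steak', 'Pasta', 'Bananas']],
                   'Class of Product': [[[1, 2, 'D'], [3, 3, 'G']], [[1, 1, 'D'], [2, 2, 'M']], [[1, 1, 'G'], [2, 2, 'F'], [3, 3, 'M'], [4, 4, 'G'], [5, 5, 'F']]]
                   })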

then the following code should do the trick:

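# The with-statement closes the file even if an error occurs part-way through.
with open('outputfile.txt', 'a') as file:
    for idx, row in df.iterrows():
        block = str(row['Customer ID']) + '\n'
        block += str(row['# of products']) + '\n'
        for product in row['List of Products']:
            block += str(product) + ' '
        block += '\n'
        # current tracks the product position the next range must start at.
        current = 1
        for classP in row['Class of Product']:
            if len(classP) == 3 and classP[0] == current:
                # Repeat the class letter once per product covered by the range.
                block += (1 + classP[1] - classP[0]) * (str(classP[2]) + ' ')
                current = classP[1] + 1
            else:
                print("Class of Product should be a list of two numbers and a letter, but I got: " + str(classP))
        block += '\n\n'
        print(block)
        file.write(block)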

Of course, since you did not provide the code block that generates df, I cannot be sure this will work for you.
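The range-expansion step from the loop above can also be isolated into a small helper, which makes it easier to test on its own. This is only a sketch: the name `expand_classes` is hypothetical, and it assumes each entry is a `[start, end, label]` triple with 1-based, contiguous positions, as in the example data.

```python
def expand_classes(ranges):
    """Expand [start, end, label] triples into one label per product."""
    labels = []
    expected_start = 1  # position the next range is expected to begin at
    for start, end, label in ranges:
        if start != expected_start or end < start:
            raise ValueError(f"unexpected range: {[start, end, label]!r}")
        # One label for every product position covered by this range.
        labels.extend([label] * (end - start + 1))
        expected_start = end + 1
    return labels


print(' '.join(expand_classes([[1, 2, 'D'], [3, 3, 'G']])))  # D D G
```

Joining the returned list with a single space gives the class line for one customer without a trailing space.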

Assuming your class column is a list of lists of lists:

df = pd.DataFrame({'Customer ID': ['00001', '00002', '00003'], 
      '# of products': [3, 2, 5],
      'List of Products': ['Milk Cheese Bread', 'Butter Steak','Bread Apple Steak Pasta Bananas'], 
      'Class of Product': [[[1,2,'D'], [3,3,'G']], [[1,1,'D'], [2,2,'M']], [[1,1,'G'], [2,2,'F'],[3,3,'M'], [4,4,'G'], [5,5,'F']]]
     })

>>>df

  Customer ID  # of products                 List of Products                                   Class of Product
0       00001              3                Milk Cheese Bread                             [[1, 2, D], [3, 3, G]]
1       00002              2                     Butter Steak                             [[1, 1, D], [2, 2, M]]
2       00003              5  Bread Apple Steak Pasta Bananas  [[1, 1, G], [2, 2, F], [3, 3, M], [4, 4, G], [...

Now use iterrows to loop over each row and save it to a text file:

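with open('/path_to_file/file_name.txt', 'a') as fp:
    for _, row in df.iterrows():
        for i, value in enumerate(row):
            if i == 3:
                # Column 3 holds the class ranges: repeat each label once per product.
                extract = ''
                for item in value:
                    if item:
                        extract += (item[1] - item[0] + 1) * item[2]
                # Joining a string puts a space between its characters,
                # which works here because every label is a single letter.
                value = ' '.join(extract)
            else:
                if not isinstance(value, str):
                    value = str(value)

            fp.write(value + '\n')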

Output

00001
3
Milk Cheese Bread
D D G
00002
2
Butter Steak
D M
00003
5
Bread Apple Steak Pasta Bananas
G F M G F
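The output above has no blank line between customers, whereas the question asks for one. A possible variant is sketched below. The helpers `format_record` and `write_records` are hypothetical names, and the sketch assumes each row is available as a plain dict keyed by the column names used above (for a DataFrame, `df.to_dict('records')` produces such a list). It writes to an in-memory buffer so it can be run without touching the file system.

```python
import io


def format_record(record):
    """Return the four-line text block for one customer record (a dict)."""
    products = record['List of Products']
    if not isinstance(products, str):  # accept a list or tuple of names too
        products = ' '.join(products)
    classes = []
    for start, end, label in record['Class of Product']:
        classes.extend([label] * (end - start + 1))
    return '\n'.join([
        str(record['Customer ID']),
        str(record['# of products']),
        products,
        ' '.join(classes),
    ])


def write_records(records, fp):
    """Write every record to fp, followed by a blank line."""
    for record in records:
        fp.write(format_record(record) + '\n\n')


records = [
    {'Customer ID': '00001', '# of products': 3,
     'List of Products': 'Milk Cheese Bread',
     'Class of Product': [[1, 2, 'D'], [3, 3, 'G']]},
    {'Customer ID': '00002', '# of products': 2,
     'List of Products': ['Butter', 'Steak'],
     'Class of Product': [[1, 1, 'D'], [2, 2, 'M']]},
]

buffer = io.StringIO()
write_records(records, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file object opened with `open(path, 'a')` in place of the buffer.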

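I'm still not clear what the data type of the product list is (a list or a tuple?). In any case, have you tried simply looping over each row and writing a short function that formats each column's data the way you want?

The product list is a list. What is "classP" in this code? I get an error saying it is not defined.

Sorry, I made a mistake when copying the code. I have just edited my answer.

When I try this, it just goes into the else statement and prints every character of each row's string. For example: "Class of Product should be a list of two numbers and a letter, but I got: [", then "Class of Product should be a list of two numbers and a letter, but I got: 1", and so on.

When I do this, I get a "string index out of range" error on the line extract += ((item[1]-item[0]+1) * item[2]).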
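Both symptoms reported in these comments are consistent with the class column holding strings such as "[[1,2,'D'],[3,3,'G']]" rather than real nested lists, since iterating over a string yields single characters. That is only a guess, because the thread never shows the column's actual type. If it is the case, one possible fix is to parse each cell with `ast.literal_eval` before formatting. The helper name `parse_classes` is hypothetical.

```python
import ast


def parse_classes(value):
    """Return the class ranges as nested lists, parsing strings if needed."""
    if isinstance(value, str):
        # literal_eval only evaluates Python literals; it raises an error
        # on malformed input such as a missing closing bracket.
        return ast.literal_eval(value)
    return value


raw = "[[1,2,'D'],[3,3,'G']]"
print(parse_classes(raw))  # [[1, 2, 'D'], [3, 3, 'G']]
```

On a DataFrame this could be applied as `df['Class of Product'] = df['Class of Product'].apply(parse_classes)`, assuming that is the column's name.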