Pandas: slicing a DataFrame each time a value is encountered

pandas dataframe

I have the following weather data time series:

    2016
    Jan  high avg low sum
    1   27  21  14  0
    2   27  20  14  0
    3   26  20  14  0
    4   26  21  15  0
    5   26  21  17  0
    6   26  21  17  0
    7   26  20  14  0
    8   27  20  14  0
    9   25  22  19  0
    10  22  19  17  0
    11  25  19  13  0
    12  24  19  13  0
    13  24  19  13  0
    14  25  19  14  0
    15  26  20  14  0
    16  26  20  14  0
    17  27  20  13  0
    18  26  19  13  0
    19  25  19  14  0
    20  23  20  17  3.05
    21  22  19  16  0
    22  20  17  14  0
    23  21  17  13  0
    24  22  17  11  0
    25  23  17  11  0
    26  22  16  10  0
    27  25  18  11  0
    28  18  17  14  0
    29  25  19  14  0
    30  24  19  13  0
    31  26  21  16  0
    2016
    Feb  high avg low sum
    1   28  23  18  0

The data runs from January 1, 2016 to January 1, 2018.

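I would like to build a tidy time series dataset from it. My idea is to slice the dataframe every time I encounter a year (2016, 2017, 2018), create a separate dataframe for each year and month combination, and then append them together.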

I am new to Python and would really appreciate your guidance. Thanks.

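Edit: the data comes in as a CSV file.

This code works for your question. It is fairly long, but I think it will help you get more practice with Python and pandas.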

            import pandas as pd

            # Data collection -> raw data as displayed in your question.
            # There are no commas, so each line lands as one string in column 0.
            data = pd.read_csv("data_slice.csv", header=None, dtype=str)
            lines = data[0].values

            # Positions of the month header lines (the ones containing "high")
            positions = [i for i, line in enumerate(lines) if "high" in line]

            # Per-month dataframes, concatenated once at the end
            # (DataFrame.append was removed in pandas 2.0)
            frames = []

            for index, pos in enumerate(positions):
                # The year is on the line just above the month header
                year = lines[pos - 1].strip()
                # The month is the first whitespace-separated token of the header
                month = lines[pos].split()[0]

                # Rows of this month: stop before the next block's year line,
                # or run to the end of the file for the last month
                if index + 1 < len(positions):
                    rows = lines[pos + 1:positions[index + 1] - 1]
                else:
                    rows = lines[pos + 1:]

                # Split each row on any run of whitespace into the key measures
                temp = pd.DataFrame([row.split() for row in rows],
                                    columns=["day", "hi", "avr", "low", "sum"])
                temp.insert(1, "year", year)
                temp.insert(2, "month", month)
                frames.append(temp)

            final_df = pd.concat(frames, ignore_index=True)
            print(final_df)
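If you want to avoid the explicit loop and end up with a real date column, the same idea can be sketched with a forward fill. This is only an illustration, not the answer's method: the `RAW` sample, the `tidy_weather` helper and the output column names are hypothetical, and it assumes three-letter English month abbreviations (`%b` depends on the locale) and that every month header is directly preceded by its year line.

```python
import io
import pandas as pd

# Hypothetical in-memory sample in the same layout as the question's file.
RAW = """\
2016
Jan  high avg low sum
1   27  21  14  0
2   27  20  14  0
2016
Feb  high avg low sum
1   28  23  18  0
"""


def tidy_weather(text):
    """Parse year / month-header / day-row blocks into one tidy frame."""
    # One stripped, non-empty string per line
    lines = pd.Series([ln.strip() for ln in io.StringIO(text) if ln.strip()])

    is_header = lines.str.contains("high")            # month header lines
    is_year = is_header.shift(-1, fill_value=False)   # line just above a header

    # Carry the year and the month name down to every following row
    year = lines.where(is_year).ffill()
    month = lines.where(is_header).str.split().str[0].ffill()

    # Everything that is neither a year line nor a header is a day row
    rows = ~(is_header | is_year)
    out = lines[rows].str.split(expand=True)
    out.columns = ["day", "high", "avg", "low", "sum"]
    out = out.astype({"day": int, "high": float, "avg": float,
                      "low": float, "sum": float})

    # Build a real date from year, month abbreviation and day
    date_text = year[rows] + "-" + month[rows] + "-" + out["day"].astype(str)
    out.insert(0, "date", pd.to_datetime(date_text, format="%Y-%b-%d"))
    return out.drop(columns="day").reset_index(drop=True)


tidy = tidy_weather(RAW)
print(tidy)
```

With a `date` column in place, `tidy.set_index("date")` gives a frame that works with pandas time series tools such as `resample`.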

Hi, what format is your input data in? CSV? JSON?

The data is in CSV format.