Python pandas: slice and vstack an interleaved dataframe


The following code reads a CSV file in which the data is laid out as [A B C D E F G H D E F G H ...] and converts it into rows of [A B C D E F G H], stacked in the same order.

Here is the data source:

http://web.mta.info/developers/data/nyct/turnstile/turnstile_110507.txt
Below is an example result for a single line:

input_line = """A002,R051,02-00-00,05-21-11,00:00:00,REGULAR,003169391,001097585,05-21-11,04:00:00,REGULAR,003169415,001097588,05-21-11,08:00:00,REGULAR,003169431,001097607,05-21-11,12:00:00,REGULAR,003169506,001097686,05-21-11,16:00:00,REGULAR,003169693,001097734,05-21-11,20:00:00,REGULAR,003169998,001097769,05-22-11,00:00:00,REGULAR,003170119,001097792,05-22-11,04:00:00,REGULAR,003170146,001097801"""

output_lines = """
A002,R051,02-00-00,05-21-11,00:00:00,REGULAR,003169391,001097585
A002,R051,02-00-00,05-21-11,04:00:00,REGULAR,003169415,001097588
A002,R051,02-00-00,05-21-11,08:00:00,REGULAR,003169431,001097607
A002,R051,02-00-00,05-21-11,12:00:00,REGULAR,003169506,001097686
A002,R051,02-00-00,05-21-11,16:00:00,REGULAR,003169693,001097734
A002,R051,02-00-00,05-21-11,20:00:00,REGULAR,003169998,001097769
A002,R051,02-00-00,05-22-11,00:00:00,REGULAR,003170119,001097792
A002,R051,02-00-00,05-22-11,04:00:00,REGULAR,003170146,001097801
"""




import csv

for name in filenames:
    # text mode with newline="" is what the csv module expects in Python 3
    with open(name, newline="") as f, open("updated_" + name, "w", newline="") as fw:
        reader = csv.reader(f)
        writer = csv.writer(fw)
        for row in reader:
            header = row[0:3]                                       # the shared A B C prefix
            readings = [row[x:x+5] for x in range(3, len(row), 5)]  # the D E F G H blocks
            for elem in readings:
                writer.writerow(header + elem)


Is there a way to do this with pandas and dataframe slicing?
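One possible pandas/numpy sketch of the transformation asked about here: split each line, reshape the trailing 5-field readings, tile the 3-field header beside them, and vstack everything into one dataframe. The helper name `explode` is mine, and the sample is the question's line truncated to two readings for brevity; real input would be the lines of the downloaded turnstile file.

```python
import numpy as np
import pandas as pd

def explode(line):
    """Split one interleaved line into stacked 8-field rows."""
    fields = line.strip().split(",")
    head = fields[:3]                           # e.g. A002, R051, 02-00-00
    rest = np.array(fields[3:]).reshape(-1, 5)  # one row per 5-field reading
    # repeat the 3-field head next to every reading, then glue side by side
    return np.hstack([np.tile(head, (len(rest), 1)), rest])

# sample line from the question, truncated to two readings for brevity
sample = ("A002,R051,02-00-00,"
          "05-21-11,00:00:00,REGULAR,003169391,001097585,"
          "05-21-11,04:00:00,REGULAR,003169415,001097588")

df = pd.DataFrame(np.vstack([explode(l) for l in [sample]]))
```

With the full file, the list comprehension would run over every line instead of the single sample.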

I can't download the full dataset. Is it for MTA-internal use only?

Within one file, are the first, second, and third columns always the same? That is the assumption of the solution below:

If every line contains entries sharing the same columns 1 through 3, only a small modification is needed: essentially, build a dataframe for each line with the method below and then put them all together.

If a single line can contain more than one distinct A B C D E F G H group, a better approach would be needed.

In [68]:

import numpy as np
import pandas as pd

df = input_line.split(',')
df_1stpt = df[:8]                                       # the leading row
df_2ndpt = np.array(df[8:]).reshape((-1, 5))            # reshape the remaining fields into rows
df_1stpt = pd.DataFrame([df_1stpt])                     # a dataframe holding the leading row
df_2ndpt = pd.DataFrame(df_2ndpt, columns=range(3, 8))  # the rest, with matching column labels
df_rst = pd.concat([df_1stpt, df_2ndpt], ignore_index=True)  # put them together
df_rst.loc[:, [0, 1, 2]] = df_rst.loc[0, [0, 1, 2]].values   # fill the NaN's
In [69]:

print(df_rst)
      0     1         2         3         4        5          6          7
0  A002  R051  02-00-00  05-21-11  00:00:00  REGULAR  003169391  001097585
1  A002  R051  02-00-00  05-21-11  04:00:00  REGULAR  003169415  001097588
2  A002  R051  02-00-00  05-21-11  08:00:00  REGULAR  003169431  001097607
3  A002  R051  02-00-00  05-21-11  12:00:00  REGULAR  003169506  001097686
4  A002  R051  02-00-00  05-21-11  16:00:00  REGULAR  003169693  001097734
5  A002  R051  02-00-00  05-21-11  20:00:00  REGULAR  003169998  001097769
6  A002  R051  02-00-00  05-22-11  00:00:00  REGULAR  003170119  001097792
7  A002  R051  02-00-00  05-22-11  04:00:00  REGULAR  003170146  001097801

[8 rows x 8 columns]
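The modification suggested above, one dataframe per line concatenated into a single result, might be sketched like this. The helper name `line_to_df` is mine, and the sample is the question's line truncated to two readings; the list would normally hold every line of the downloaded file.

```python
import numpy as np
import pandas as pd

def line_to_df(line):
    """Leading row plus reshaped remainder, with the NaN's filled in."""
    f = line.strip().split(',')
    first = pd.DataFrame([f[:8]])                                    # the leading row
    rest = pd.DataFrame(np.array(f[8:]).reshape(-1, 5),
                        columns=range(3, 8))                         # remaining readings
    df = pd.concat([first, rest], ignore_index=True)
    df.loc[:, [0, 1, 2]] = df.loc[0, [0, 1, 2]].values               # fill cols 1-3
    return df

# one sample line, truncated to two readings for brevity
sample = ("A002,R051,02-00-00,"
          "05-21-11,00:00:00,REGULAR,003169391,001097585,"
          "05-21-11,04:00:00,REGULAR,003169415,001097588")

df_all = pd.concat([line_to_df(l) for l in [sample]], ignore_index=True)
```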

Can you give a concrete example of the input and output? As it stands, I don't see how the code converts
[A B C D E F G H D E F G H D E F G H ...]
into
[A B C D E F G H]
This is a publicly available dataset; just download it from the link here.