Pandas: reading space-delimited IRS txt data


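I have recently been working with tax file data from the IRS. It is space-delimited txt data, as shown below (the full data is):

There are some patterns in how the data is stored, but the format is not a standard one and it is not easy to read in. How can I get a DataFrame like the following from the txt data above?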

+------------+-------------+--------------------------+-----+-----+-----+------+
| fips_state | fips_county |           name           | c1  | c2  | c3  |  c4  |
+------------+-------------+--------------------------+-----+-----+-----+------+
|         02 |         013 | Aleutians East Borough T | 145 | 280 | 416 | 1002 |
|         02 |         016 | Aleutians West Total Mig | 304 | 535 | 991 | 2185 |
|        ... |         ... | ...                      | ... | ... | ... |  ... |
+------------+-------------+--------------------------+-----+-----+-----+------+

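This gets the data into two separate DataFrames. You can split the lines into columns either in pandas or before building the lists. Once both are parsed, merge the two DataFrames.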

import urllib.request  # standard-library module that handles the URL request

import pandas as pd

target_url = 'https://raw.githubusercontent.com/shuai-zhou/DataRepo/master/data/C9091aki.txt'
list_a = []  # lines that do not start with two spaces
list_b = []  # lines that start with five spaces
for raw_line in urllib.request.urlopen(target_url):
    line = raw_line.decode('utf-8')  # decode each line once
    if line[0:2] != '  ':
        print(line.strip())
        list_a.append(line.strip())
    if line[0:5] == '     ':
        print(line.strip())
        list_b.append(line.strip())

# Each DataFrame holds one unparsed line per row, in a single column
dfa = pd.DataFrame(list_a)
dfb = pd.DataFrame(list_b)
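
The code above stops at two single-column DataFrames of raw lines. Below is a minimal sketch of the column-splitting step, not a definitive parser for this file. It assumes each line has the two FIPS codes first, then a name that may contain spaces, then exactly four numeric fields. The sample lines are hypothetical, reconstructed from the rows in the question's table, and `parse_line` is an illustrative helper name; the real file layout may differ.

```python
import pandas as pd

# Hypothetical sample lines built from the rows in the question's table;
# the actual layout of the IRS file may differ.
sample_lines = [
    "02 013 Aleutians East Borough T     145     280     416    1002",
    "02 016 Aleutians West Total Mig     304     535     991    2185",
]


def parse_line(line, n_values=4):
    """Split one line into state code, county code, name and trailing values."""
    tokens = line.split()
    # The name may contain spaces, so take the fixed fields from both ends
    # and join whatever is left in the middle.
    name = " ".join(tokens[2:-n_values])
    return [tokens[0], tokens[1], name] + tokens[-n_values:]


columns = ["fips_state", "fips_county", "name", "c1", "c2", "c3", "c4"]
df = pd.DataFrame([parse_line(line) for line in sample_lines], columns=columns)

# Convert the value columns to integers; the FIPS codes stay as strings
# so that leading zeros are preserved.
value_cols = ["c1", "c2", "c3", "c4"]
df[value_cols] = df[value_cols].astype(int)
print(df)
```

Taking fields from both ends of the token list avoids having to know how many words the name contains. If the real lines use fixed character positions instead, `pandas.read_fwf` may be a better fit.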