Python: reshaping a DataFrame to start a new row every 76 entries
I'm new to Python and pandas and am working through the UCI heart disease dataset. There are 303 people with 76 attributes each, so I would like to end up with one row per person and 76 columns. I'm having trouble arranging this into a DataFrame because the data seems to come in lines of 9. I have tried importing the dataset into a pandas DataFrame with either a space or a newline as the separator, but I still can't stop the data from being split after every 8 values:
df = pd.read_table('https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/hungarian.data', sep=' ')
df
The result looks like this:
1254 0 40 1 1.1 0.1 0.2
-9.0 2 140.0 0.0 289 -9.0 -9.0 -9.0
0.0 -9 -9.0 0.0 12 16.0 84.0 0.0
0.0 0 0.0 0.0 150 18.0 -9.0 7.0
172.0 86 200.0 110.0 140 86.0 0.0 0.0
0.0 -9 26.0 20.0 -9 -9.0 -9.0 -9.0
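If you have any suggestions on how to split these values and start a new row after every 76th value, I would be grateful. Every 76th value is the string "name", which marks the end of one person's data. Thank you!

Since preprocessing the data is easier than working with an "incorrectly built" DF:
Output: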
In [20]: df
Out[20]:
0 1 2 3 4 5 6 7 8 9 ... 66 67 68 69 70 71 72 73 74 75
0 1254 0 40 1 1 0 0 -9 2 140 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
1 1255 0 49 0 1 0 0 -9 3 160 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
2 1256 0 37 1 1 0 0 -9 2 130 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
3 1257 0 48 0 1 1 1 -9 4 138 ... 2 -9 1 1 1 1 1 -9.0 -9.0 name
4 1258 0 54 1 1 0 1 -9 3 150 ... 1 -9 1 1 1 1 1 -9.0 -9.0 name
5 1259 0 39 1 1 0 1 -9 3 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
6 1260 0 45 0 0 1 0 -9 2 130 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
7 1261 0 54 1 1 0 0 -9 2 110 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
8 1262 0 37 1 1 1 1 -9 4 140 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
9 1263 0 48 0 1 0 0 -9 2 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
10 1264 0 37 0 1 0 1 -9 3 130 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
11 1265 0 58 1 1 0 0 -9 2 136 ... -9 2 1 1 1 7 1 -9.0 -9.0 name
12 1266 0 39 1 1 0 0 -9 2 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
13 1267 0 49 1 1 1 1 -9 4 140 ... 2 -9 1 1 1 1 1 -9.0 -9.0 name
14 1268 0 42 0 1 0 1 -9 3 115 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
15 1269 0 54 0 1 1 0 -9 2 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
16 1270 0 38 1 1 1 1 -9 4 110 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
17 1271 0 43 0 1 0 0 -9 2 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
18 1272 0 60 1 1 1 1 -9 4 100 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
19 1273 0 36 1 1 0 0 -9 2 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
20 1274 0 43 0 0 0 0 -9 1 100 ... -9 -9 1 1 1 1 2 -9.0 -9.0 name
21 1275 0 44 1 1 0 0 -9 2 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
22 1276 0 49 0 1 0 0 -9 2 124 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
23 1277 0 44 1 1 0 0 -9 2 150 ... 2 -9 1 1 1 1 1 67.0 -9.0 name
24 1278 0 40 1 1 0 1 -9 3 130 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
.. ... .. .. .. .. .. .. .. .. ... ... .. .. .. .. .. .. .. ... ... ...
269 1032 0 54 1 1 1 0 -9 4 130 ... -9 2 1 1 1 7 1 66.0 -9.0 name
270 1033 0 47 0 1 0 0 -9 3 130 ... -9 -9 1 1 1 1 1 68.0 -9.0 name
271 1034 0 45 1 1 1 1 -9 4 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
272 1035 0 32 0 1 0 0 -9 2 105 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
273 1036 0 55 1 1 1 1 -9 4 140 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
274 1037 0 55 1 1 0 0 -9 3 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
275 1038 0 45 0 0 0 0 -9 2 180 ... -9 -9 1 1 1 1 1 70.0 -9.0 name
276 1039 0 59 1 1 0 1 -9 3 180 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
277 1041 0 51 1 1 0 0 -9 3 135 ... 2 -9 1 1 3 8 2 -9.0 -9.0 name
278 1042 0 52 1 1 1 1 -9 4 170 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
279 1043 0 57 0 1 1 1 -9 4 180 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
280 1044 0 54 0 1 0 0 -9 2 130 ... -9 -9 1 1 1 1 3 -9.0 -9.0 name
281 1045 0 60 1 1 0 0 -9 3 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
282 1046 0 49 1 1 1 1 -9 4 150 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
283 1047 0 51 0 1 0 1 -9 3 130 ... -9 -9 1 1 1 1 1 61.0 -9.0 name
284 1048 0 55 0 0 0 0 -9 2 110 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
285 1049 0 42 1 1 1 1 -9 4 140 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
286 1050 0 51 0 1 0 1 -9 3 110 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
287 1051 0 59 1 1 1 1 -9 4 140 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
288 1052 0 53 1 1 0 0 -9 2 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
289 1053 0 48 0 0 0 0 -9 2 -9 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
290 1054 0 36 1 1 0 0 -9 2 120 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
291 5001 0 48 1 0 0 0 -9 3 110 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
292 5000 0 47 0 0 0 0 -9 2 140 ... -9 -9 1 1 1 1 1 -9.0 -9.0 name
293 5002 0 53 1 1 1 1 -9 4 130 ... 1 1 1 1 1 1 1 -9.0 -9.0 name
[294 rows x 76 columns]
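The preprocessing idea can be sketched roughly as follows. This is only an illustration under assumptions, not necessarily the code that produced the output above: it assumes the file is purely whitespace-separated and that every record has exactly 76 tokens, and `tokens_to_frame` is a made-up helper name. A tiny stand-in string is used instead of the real file so the example runs offline.

```python
import pandas as pd


def tokens_to_frame(text: str, width: int = 76) -> pd.DataFrame:
    """Split whitespace-separated text into rows of `width` tokens each.

    Hypothetical helper: line breaks in the input are ignored entirely,
    only the running token count decides where a new row starts.
    """
    tokens = text.split()  # splits on spaces and newlines alike
    if len(tokens) % width != 0:
        raise ValueError(f"{len(tokens)} tokens is not a multiple of {width}")

    # Cut the flat token list into consecutive chunks of `width` tokens.
    rows = [tokens[i:i + width] for i in range(0, len(tokens), width)]
    df = pd.DataFrame(rows)

    # Convert columns that are entirely numeric; columns such as the
    # trailing "name" marker stay as strings.
    for col in df.columns:
        try:
            df[col] = pd.to_numeric(df[col])
        except (ValueError, TypeError):
            pass
    return df


# Stand-in data: 2 records of 4 tokens each, wrapped across several lines.
sample = "1 2 3\nname\n4 5\n6 name\n"
small = tokens_to_frame(sample, width=4)
print(small.shape)
```

For the real dataset, the file contents would first have to be read into a string and then passed in with the default `width=76`. The length check is there so that a malformed record raises an error instead of silently shifting every later row.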
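This is doable, but it means painful DataFrame wrangling. Since the input file is not that large, I would process the input string instead, replacing \n and name so that the rows line up and can be fed to read_table.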
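A minimal sketch of that string-replacement approach, under a few assumptions: the text "name" occurs only as the end-of-record marker, fields are separated by whitespace, and `realign` is a hypothetical helper name. It uses a small stand-in string rather than downloading the real file.

```python
import io

import pandas as pd


def realign(text: str, marker: str = "name") -> str:
    """Put each record on one line, ending at `marker` (hypothetical helper)."""
    # 1. Remove the original line breaks so the data is one long line.
    flat = text.replace("\n", " ")
    # 2. Re-insert a line break after every end-of-record marker.
    broken = flat.replace(marker, marker + "\n")
    # 3. Trim stray spaces and drop the empty remainder after the last marker.
    lines = [line.strip() for line in broken.splitlines()]
    return "\n".join(line for line in lines if line)


# Stand-in data: 2 records of 4 fields each, wrapped across several lines.
sample = "1 2 3\nname\n4 5\n6 name\n"
aligned = realign(sample)

# Each line of `aligned` is now one complete record.
df_small = pd.read_table(io.StringIO(aligned), sep=r"\s+", header=None)
print(df_small.shape)
```

Because the realigned text goes through the pandas parser, numeric columns get numeric dtypes automatically and only the marker column stays as text.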