Python: replacing extra whitespace to format text as CSV

I have a large amount of data in a .txt file, formatted like this:

WOODY, Harlan Fred                 S2c        USN
WOOD, Earl A.                      PVT        USAR
WOOD, Frank                        S2c        USN
WOOD, Harold Baker                 BM2c       USN
WOOD, Horace Van                   S1c        USN
WOOD, Roy Eugene                   F1c        USN
WOOLF, Norman Bragg                CWTP       USN
WORKMAN, Creighton Hale            F1c        USN

I want to convert it to CSV format, like this:

WOODY,Harlan Fred,S2c,USN

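In Python I could use a regex and/or split, but I need to keep the single spaces inside the names (for example between Harlan and Fred). As you can see, the number of spaces between fields varies from entry to entry, and there are occasionally tabs as well (I think).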

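Here is one way to do it: first split on the comma, then split the remainder on runs of multiple spaces, which avoids breaking up single-spaced names. Then join all the items with commas using str.join.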

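import re

with open(textfile) as f, open(csvfile, 'w') as fc:
    for line in f:
        # split off the surname at the first comma only
        first, others = line.split(',', 1)
        # split the rest on a tab or on 2+ whitespace characters, so single-spaced names survive
        row = [first.strip()] + [i.strip() for i in re.split(r'\s{2,}|\t', others.strip()) if i]
        fc.write(','.join(row) + '\n')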

Output (the row built for the first sample line):

['WOODY', 'Harlan Fred', 'S2c', 'USN']
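
The same idea can be wrapped in a small function and tried on a few of the sample lines. This is only a minimal sketch, not the answerer's code: the names `FIELD_SEP` and `to_csv_row` are hypothetical, and it assumes that fields are separated by a tab or by at least two whitespace characters, while names contain only single spaces.

```python
import re

# Assumption: a field boundary is a tab or a run of two or more whitespace
# characters; a single space is part of a name.
FIELD_SEP = re.compile(r"\s{2,}|\t")


def to_csv_row(line):
    """Turn 'SURNAME, Given Names   RANK   BRANCH' into a list of fields."""
    # Split only at the first comma, which ends the surname.
    surname, rest = line.split(",", 1)
    fields = [part.strip() for part in FIELD_SEP.split(rest.strip())]
    return [surname.strip()] + [part for part in fields if part]


sample = [
    "WOODY, Harlan Fred                 S2c        USN",
    "WOOD, Earl A.                      PVT        USAR",
    "WOOD, Frank\tS2c\tUSN",  # tab-separated variant
]
for line in sample:
    print(",".join(to_csv_row(line)))
```

A line without any comma would raise a ValueError at the unpacking step, so real data may need a guard for blank or malformed lines.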

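Use pandas read_csv with a regular-expression delimiter. Pandas will likely be faster than a solution written in pure Python. Note that \s+ also splits on the single spaces inside names, so the given names land in separate columns and shorter names shift the remaining fields left, as the output below shows for row 2.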

import pandas as pd
pd.read_csv('./s.dat', header=None, delimiter=r"\s+")
          0          1       2     3     4
0    WOODY,     Harlan    Fred   S2c   USN
1     WOOD,       Earl      A.   PVT  USAR
2     WOOD,      Frank     S2c   USN   NaN
3     WOOD,     Harold   Baker  BM2c   USN
4     WOOD,     Horace     Van   S1c   USN
5     WOOD,        Roy  Eugene   F1c   USN
6    WOOLF,     Norman   Bragg  CWTP   USN
7  WORKMAN,  Creighton    Hale   F1c   USN

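Another approach is to replace every double space with a comma, split on commas, strip the non-empty values, and finally join them with commas. Apply the following to each line in the text file: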

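# treat a tab like a double space; drop pieces that are empty after stripping
','.join(x.strip() for x in line.replace('\t', '  ').replace('  ', ',').split(',') if x.strip())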
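
To apply this to a whole file, the one-liner can be combined with the standard library's csv.writer, which also takes care of quoting should a field ever contain a quote character. The following is a rough sketch under stated assumptions: the helper names `line_to_fields` and `convert` are hypothetical, tabs are treated like double spaces, and in-memory `io.StringIO` objects stand in for the real input and output files.

```python
import csv
import io


def line_to_fields(line):
    """Double-space replacement approach: return the non-empty fields."""
    # Treat a tab like a double space, then turn every double space into a comma.
    marked = line.replace("\t", "  ").replace("  ", ",")
    return [x.strip() for x in marked.split(",") if x.strip()]


def convert(src, dst):
    """Read whitespace-aligned lines from src and write CSV rows to dst."""
    writer = csv.writer(dst, lineterminator="\n")
    for line in src:
        fields = line_to_fields(line)
        if fields:  # skip blank lines
            writer.writerow(fields)


src = io.StringIO(
    "WOODY, Harlan Fred                 S2c        USN\n"
    "WOOD, Frank                        S2c        USN\n"
)
dst = io.StringIO()
convert(src, dst)
print(dst.getvalue(), end="")
```

With real files, `src` and `dst` would be the objects returned by `open(...)`; opening the output with `newline=''` is the usual recommendation when using the csv module.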

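Are these the names of actual service members? If they are real names, you probably shouldn't post them.

@PseudoAj Note that the solution on that page would remove the whitespace from Harlan Fred.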