Python regular expressions on a CSV file

I am trying to work out the best way to solve the following problem. My CSV file looks like this:

,02/12/2013,03/12/2013,04/12/2013,05/12/2013,06/12/2013,07/12/2013,08/12/2013,
06:00,"06:00 World Sport","06:00 World Sport","06:00 World Sport","06:00 World Sport","06:00 World Sport","06:00 World Sport","06:00 World Sport",06:00
,,,,,,,,
06:15,,,,,,,,06:15
,,,,,,,,
06:30,"06:30 Inside Africa: November 29, 2013","06:30 African Voices: Agatha Achindu","06:30 Inside the Louvre","06:30 Talk Asia: Franz Harary","06:30 Blueprint","06:30 Inside the Middle East","06:30 CNNGo",06:30

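What I need to do is collect the dates from the header row, from the first one to however many there are, and then put the matching date, followed by a comma, in front of each entry in every row, like this: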

02/12/2013, "06:00 World Sport", 03/12/2013, "06:00 World Sport", 04/12/2013, "06:00 World Sport"...
02/12/2013, "06:30 Inside Africa: November 29, 2013", 03/12/2013, "06:30 African Voices.."

Do you have a better idea for this problem? My starting code is this:

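import fileinput
import re

# Assumes `fnames` (input file paths) and `output` (an open, writable file)
# are defined elsewhere; the original post does not show them.
for line in fileinput.input(fnames):
    # Header row: take everything from the first date onward, one field per line.
    if re.search(r'\d{2}/\d{2}/\d{4}.*', line):
        line_date = re.findall(r'\d{2}/\d{2}/\d{4}.*', line)[0]
        for field in re.split(r',', line_date):
            output.write(field + '\n')

    # Show rows: take everything from the first quoted title onward.
    # Note that splitting on "," also cuts titles that contain a comma.
    if re.search(r'".+?".*', line):
        line_sadrzaj = re.findall(r'".+?".*', line)[0]
        for field in re.split(r',', line_sadrzaj):
            output.write(field + '\n')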

You don't need regular expressions at all; just use the csv module to read the CSV file, then transform the result into the desired output.

Example:

import csv

# newline='' lets the csv module handle line endings itself
with open('csv.csv', newline='') as text:
    table = list(csv.reader(text))

# get all dates (skipping first and last column)
dates = table[0][1:-1]

# get all shows (skipping first and last column and empty rows)
shows = filter(''.join, (t[1:-1] for t in table[1:]))

# join dates and shows back together and do some formatting
for line in [zip(dates, s) for s in shows]:
    print(', '.join('{}, "{}"'.format(*t) for t in line))

Result:

02/12/2013, "06:00 World Sport", 03/12/2013, "06:00 World Sport", 04/12/2013, "06:00 World Sport", 05/12/2013, "06:00 World Sport", 06/12/2013, "06:00 World Sport", 07/12/2013, "06:00 World Sport", 08/12/2013, "06:00 World Sport"
02/12/2013, "06:30 Inside Africa: November 29, 2013", 03/12/2013, "06:30 African Voices: Agatha Achindu", 04/12/2013, "06:30 Inside the Louvre", 05/12/2013, "06:30 Talk Asia: Franz Harary", 06/12/2013, "06:30 Blueprint", 07/12/2013, "06:30 Inside the Middle East", 08/12/2013, "06:30 CNNGo"
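If some cells in a row are empty, or you want to test the logic without a file on disk, the same approach can be wrapped in a small generator. This is only a sketch: the function name `pair_dates_with_shows` and the shortened two-date sample are made up for illustration, and it assumes the layout from the question (dates in the header row, a time column at both ends).

```python
import csv
import io

def pair_dates_with_shows(text):
    """Yield one list of (date, show) tuples per non-empty schedule row."""
    table = list(csv.reader(text))
    dates = table[0][1:-1]          # header row without first and last column
    for row in table[1:]:
        shows = row[1:-1]           # drop the time column at both ends
        if any(shows):              # skip rows that contain no shows
            # keep only the cells that actually hold a show
            yield [(d, s) for d, s in zip(dates, shows) if s]

# Shortened sample in the same layout as the question's file (assumption).
sample = io.StringIO(
    ',02/12/2013,03/12/2013,\n'
    '06:00,"06:00 World Sport","06:00 World Sport",06:00\n'
    ',,,\n'
    '06:30,"06:30 Inside Africa: November 29, 2013",,06:30\n'
)

rows = list(pair_dates_with_shows(sample))
for pairs in rows:
    print(', '.join('{}, "{}"'.format(d, s) for d, s in pairs))
```

Passing an open file instead of the `io.StringIO` object should work the same way, since `csv.reader` accepts any iterable of lines.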


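How about "split by comma"?

Yes, I have written the code now and will update my post. The problem is how to merge the first date and the first event into one line. :)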
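On the "split by comma" idea: a plain split also cuts through commas inside quoted titles, which is the case the csv module handles. A minimal sketch, using one row shortened from the question's data:

```python
import csv
import io

# One row from the question's sample data, shortened to two shows.
row = '06:30,"06:30 Inside Africa: November 29, 2013","06:30 Blueprint",06:30\n'

# A plain split breaks the quoted title at its embedded comma.
naive = row.strip().split(',')

# csv.reader honours the quotes and keeps the title in one field.
parsed = next(csv.reader(io.StringIO(row)))

print(len(naive))   # 5 fields: the title was cut in two
print(len(parsed))  # 4 fields
print(parsed[1])    # 06:30 Inside Africa: November 29, 2013
```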