Python: How to extract values from a CSV split in the correct place (no imports)?

python csv

How can I read a CSV file and turn it into a list of lists without using any external imports (such as csv or pandas)? Here is the code I have worked out so far:

m = []
for line in myfile:
    m.append(line.split(','))

This for loop works fine, but if one of the fields in the CSV contains a ",", the line is wrongly split at that point.

For example, if one line in the CSV is:

12,"This is a single entry, even if there's a coma",0.23

the corresponding element of the list looks like this:

['12', '"This is a single entry', 'even if there is a coma"','0.23\n']

whereas I would like to get:

['12', '"This is a single entry, even if there is a coma"','0.23']

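I would avoid trying to use a regular expression. Instead, you need to process the text one character at a time to work out where the quote characters are. Also, the quote characters are normally not included as part of the field.

Here is a quick example approach: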

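def split_row(row, quote_char='"', delim=','):
    in_quote = False  # True while we are inside a quoted field
    fields = []       # completed fields
    field = []        # characters of the field currently being built

    for c in row:
        if c == quote_char:
            # Toggle the quote state; the quote character itself is dropped
            in_quote = not in_quote
        elif c == delim:
            if in_quote:
                # A delimiter inside quotes is part of the field
                field.append(c)
            else:
                # A delimiter outside quotes ends the current field
                fields.append(''.join(field))
                field = []
        else:
            field.append(c)

    # Flush the last field (note: a trailing empty field is skipped)
    if field:
        fields.append(''.join(field))

    return fields


fields = split_row('''12,"This is a single entry, even if there's a coma",0.23''')
print(len(fields), fields)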

This would display:

3 ['12', "This is a single entry, even if there's a coma", '0.23']

The CSV library does a far better job of this. This script will not handle any special cases beyond your test string.
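To illustrate a few of those special cases, here is a minimal sketch, still without imports, that extends the same character-by-character idea. It assumes that a doubled quote (`""`) inside a quoted field stands for one literal quote, strips the line ending, and keeps a trailing empty field. The names `split_row_escaped` and `parse_lines` are hypothetical, and quoted fields that span several lines are not handled.

```python
def split_row_escaped(row, quote_char='"', delim=','):
    """Split one CSV line into fields (illustrative sketch, not a full parser)."""
    row = row.rstrip('\r\n')  # drop the line ending left by file iteration
    fields = []
    field = []
    in_quote = False
    i = 0
    while i < len(row):
        c = row[i]
        if c == quote_char:
            if in_quote and row[i + 1:i + 2] == quote_char:
                # Assumption: a doubled quote inside quotes is a literal quote
                field.append(quote_char)
                i += 1  # skip the second quote of the pair
            else:
                in_quote = not in_quote
        elif c == delim and not in_quote:
            # A delimiter outside quotes ends the current field
            fields.append(''.join(field))
            field = []
        else:
            field.append(c)
        i += 1
    fields.append(''.join(field))  # always emit the last field, even if empty
    return fields


def parse_lines(lines):
    """Build a list of lists from an iterable of lines, skipping blank lines."""
    return [split_row_escaped(line) for line in lines if line.strip()]


sample = [
    '12,"This is a single entry, even if there\'s a coma",0.23\n',
    '7,"She said ""hi""",\n',
]
rows = parse_lines(sample)
print(rows)
```

Since `parse_lines` accepts any iterable of strings, an open file object such as `myfile` from the question could be passed in place of `sample`.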

Here is my attempt:

line ='12, "This is a single entry, more bits in here ,even if there is a coma",0.23 , 12, "This is a single entry, even if there is a coma", 0.23\n'

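# Naive split: quoted fields that contain commas get broken into pieces
line_split = line.replace('\n', '').split(',')

# Indices of the pieces containing a quote, reversed so that merging
# from the end does not shift the indices still to be processed
quote_loc = [idx for idx, l in enumerate(line_split) if '"' in l]
quote_loc.reverse()

assert len(quote_loc) % 2 == 0, "value was odd, should be even"

# m is the closing piece and n the opening piece of each quoted field
for m, n in zip(quote_loc[::2], quote_loc[1::2]):
    # Rejoin the pieces of the quoted field, then drop the leftovers
    line_split[n] = ','.join(line_split[n:m+1])
    del line_split[n+1:m+1]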


print(line_split)

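That is why you need to use a library: it knows how to parse quoted fields and escape sequences. Don't try to do it yourself, it's too hard.

Doesn't using a regular expression violate your no-imports restriction?

@Robb1 have a look at this answer for a regex approach.

For anything in production, use the csv library.

Regular expressions are quite complicated, so for those of us who are really bad at them, that is not an option.

I don't think it handles escaped quotes.

I think you should edit your question to clarify which cases need to be handled, and whether imports are allowed and, if so, which kinds.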
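For comparison, and only where imports are permitted, the standard-library `csv` module that the comments recommend can be sketched as follows. `csv.reader` accepts any iterable of strings, so a single-element list is used here in place of a file.

```python
import csv

# The sample line from the question, including its trailing newline
line = '12,"This is a single entry, even if there\'s a coma",0.23\n'

# csv.reader yields one list of fields per input line
rows = list(csv.reader([line]))
print(rows)
```

The quoted field comes back as a single element with the surrounding quotes removed.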