How do I understand this code that splits a line into an array in Python?
I'm a bit confused by one line of Python. We use Python and a custom function to split a line: we want whatever sits between quotes to be a single entry in the array. For example, the line is:
"La Jolla Bank, FSB",La Jolla,CA,32423,19-Feb-10,24-Feb-10
So "La Jolla Bank, FSB" should be a single entry in the array.
I'm not sure I understand how this code works:
1. The first character is a quote ("), so the variable quote is set to its inverse, so it is set to TRUE.
2. Then we check for the comma, and if quote is set to its inverse, then quote ...
3. We cut it with current = "", and this is where I don't understand: we are still between the quotes, so normally we should not cut it now!
Edit: so "and not quote" means "quote is false", not "the opposite of quote". Thanks!
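That line initializes current to the empty string, clearing whatever it may have been set to before.
As long as you are not inside quotes (i.e., quote is False), seeing a comma means you have reached the end of a field. Whatever you have accumulated in current is the content of that field, so append it to retval and reset current to the empty string, ready for the next field.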
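That said, this looks like it is processing .csv input. Python's standard library has a csv module that can handle this for you.

You are reading it as if both the test if char == '"' and the test elif char == ',' and not quote were run. However, the if/elif statement explicitly makes only one of them run: either quote gets inverted, or the current value gets cut.
When the current character is a double quote, the logic that inverts the quote flag runs, but the logic that cuts the string does not.
When the current character is a comma, the logic that inverts the flag does not run, but the logic that cuts the string does run, provided the quote flag is not set. current is reset to empty because, when you hit a comma and are not inside quotes, what you have accumulated so far is a complete token.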
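This is definitely not Pythonic, the for char in string loop scares me, and whoever wrote this code should have used a regex. What you are looking at is a condensed version of what most language parsers do.
Let me see if I can annotate it: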
def mysplit(string):
    # We start out at the beginning of the string NOT in between quotes
    quote = False
    # Hold each element that we split out
    retval = []
    # This variable holds whatever the current item we're interested in is
    # e.g: If we're in a quote, then it's everything (including commas)
    # otherwise it's everything UP UNTIL the next comma
    current = ""
    # Scan the string character by character
    for char in string:
        # We hit a quote, so turn on QUOTE SCANNING MODE!!!
        # If we're in quote scanning mode, turn it off
        if char == '"':
            quote = not quote
        # We hit a comma, and we're not in quote scanning mode
        elif char == ',' and not quote:
            # We got what we want, let's put it in the return value
            # and then reset our current item to nothing so we can prepare for the next item.
            retval.append(current)
            current = ""
        else:
            # Nothing special, let's just keep building up our current item
            current += char
    # We're done with all the characters, let's put together whatever we were working on when we ran out of characters
    retval.append(current)
    # Return it!
    return retval
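As a quick check, the function can be run on the line from the question. The function body is repeated here without comments only so that the snippet runs on its own; the variable names line and fields are introduced for this example.

```python
def mysplit(string):
    quote = False
    retval = []
    current = ""
    for char in string:
        if char == '"':
            quote = not quote
        elif char == ',' and not quote:
            retval.append(current)
            current = ""
        else:
            current += char
    retval.append(current)
    return retval


line = '"La Jolla Bank, FSB",La Jolla,CA,32423,19-Feb-10,24-Feb-10'
fields = mysplit(line)
print(fields)
# ['La Jolla Bank, FSB', 'La Jolla', 'CA', '32423', '19-Feb-10', '24-Feb-10']
```

Note that the quote characters themselves never reach current, because the first branch only toggles the flag, so the first field comes back without its surrounding quotes.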
This is not the best code for splitting, but it is very simple.
def mysplit(string):
    quote = False
    retval = []
    # First set current to the empty string. The loop below walks through
    # the string to be split and pulls characters out of it one by one,
    # setting char to the next character each time.
    current = ""
    for char in string:
        # A quote character flips the flag that tracks whether we are
        # currently inside quotes.
        if char == '"':
            quote = not quote
        elif char == ',' and not quote:
            # We see a comma outside quotes: append whatever has accumulated
            # in current to the return value...
            retval.append(current)
            # ...then reset current so the next field can accumulate.
            current = ""  # why do we cut current here?
        else:
            # Otherwise add the current character to current.
            current += char
    # After the last character is seen, current still holds leftover
    # characters, which we simply add to the final result.
    retval.append(current)
    return retval
Here is an example run.
Let string be 'a,bbb,ccc'.
Step  char  current  retval
1     a     a        []
2     ,              ['a']         ### current is reset
3     b     b        ['a']
4     b     bb       ['a']
5     b     bbb      ['a']
6     ,              ['a', 'bbb']  ### current is reset
and so on.
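The same table can be produced by instrumenting the loop. This is only a sketch: trace_split is a hypothetical helper, not part of the original code, and it yields a snapshot of the state after each character is processed.

```python
def trace_split(string):
    """Yield (step, char, current, retval) after each character is processed."""
    quote = False
    retval = []
    current = ""
    for step, char in enumerate(string, start=1):
        if char == '"':
            quote = not quote
        elif char == ',' and not quote:
            retval.append(current)
            current = ""
        else:
            current += char
        # Copy retval so later appends do not alter earlier snapshots.
        yield step, char, current, list(retval)


for step, char, current, retval in trace_split('a,bbb,ccc'):
    print(f"{step:<6}{char:<6}{current:<9}{retval}")
```

Running it on a quoted input such as '"a,b",c' shows the comma at step 3 being appended to current instead of triggering a cut.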
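OK, you don't quite have it.
1. The first character is a quote ("), so the variable quote is set to its inverse, so it is set to TRUE.
Good! quote is set to the opposite of whatever it was before. At the start of the program it is False, so when a double quote is seen it becomes True. Conversely, if it is True and a quote is seen, it becomes False. In other words, this line of the program changes quote from whatever it was before the line; this is called "toggling".
"not quote" means "only if quote is False". It has nothing to do with whether it is "set to its inverse". No variable can be equal to its own inverse! That would be like saying X = True and X = False at once, which is clearly nonsense. quote is always either True or False, and nothing else.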
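A small sketch of the distinction between evaluating not quote and assigning it back. The names value_of_not, after_first and after_second are introduced here purely for illustration.

```python
quote = False

# Evaluating "not quote" produces a new value; quote itself is untouched.
value_of_not = not quote
print(value_of_not)  # True
print(quote)         # False

# Only the assignment changes the variable. This is the toggle.
quote = not quote
after_first = quote
print(after_first)   # True

quote = not quote
after_second = quote
print(after_second)  # False
```

So in elif char == ',' and not quote, nothing is being set: the condition simply asks whether quote is currently False.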
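3. We cut it with current = "", and this is where I don't understand: we are still between the quotes, so we should not cut it now.
So hopefully you can now see that if you reach this line, you are not between quotes. The not quote check ensures you never cut inside quotes, because not quote really means just that: not in quotes!
For the record, the csv module does exactly this. In real code, I strongly recommend using it instead of a custom function.

Thanks Bryan. What you said is exactly what I expected, and that is what confused me. The comment says "we hit a comma, and we're not in quote scanning mode", so "and not quote" means quote is set to False. But I read it as the inverse of the original quote, so that we would be in quote scanning mode, because the inverse of quote = False is quote = True! So "and not quote" does not mean "the inverse of", it just means "False", right? If that is correct, I understand; otherwise I'm still confused!

The test elif char == ',' and not quote means "only while quote is False", whereas if char == '"': quote = not quote means "whenever there is a quote character, flip the value to its opposite". So at the first quote in the string it is set to True. Then, when it hits the second quote, it is set back to False. Because the next character is a comma and quote is now False, it appends what it has been tracking to the return list.

Yes, that is what confused me a bit, but I get it now. This is the point I wanted to read: "and not quote" is not about "the opposite of", it really means "quote equals False". Thanks for the quick answer! Cheers.

Thanks John. I didn't really understand what "not quote" means here: elif char == ',' and not quote:. If "not quote" means quote equals False, that makes sense to me. But I had read the logic as "and the inverse of quote": since quote is set to False at the start, the inverse should be True... and that is not how it works. Do you see what I mean?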
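For comparison with the hand-written loop, here is a minimal sketch of the ready-made route recommended in the answers, assuming the parser they refer to is Python's standard-library csv module. csv.reader accepts any iterable of strings, so a one-element list is enough to parse a single line.

```python
import csv

line = '"La Jolla Bank, FSB",La Jolla,CA,32423,19-Feb-10,24-Feb-10'

# csv.reader yields one list of fields per input line.
row = next(csv.reader([line]))
print(row)
# ['La Jolla Bank, FSB', 'La Jolla', 'CA', '32423', '19-Feb-10', '24-Feb-10']
```

The quoted comma stays inside the first field, just as in the custom function, and the module also copes with cases the simple loop ignores, such as doubled quotes inside a quoted field.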