Python: cleaning a list of non-uniform phrases
I have a list that looks like this:
["['brill building pop",'quiet storm','ballad','easy listening',"motown'"," 'disco",'soul jazz',
'smooth jazz','soul','jazz','soft rock',"uk garage'"," 'chill-out",'german pop','salsa','r&b',
'chanson','rock',"pop'"," 'blues-rock",'vocal jazz','funk','oldies','pop rock',"downtempo'",
" 'hip hop",'classic rock','united states','germany',"adult contemporary'"," 'folk rock",'vocal',
'soundtrack','blues','female vocalist',"electronic'"," 'new wave",'urban','reggae','singer-songwriter',
'swing','60s',"female'"," 'american",'80s','90s',"ambient']"]
It should look like this:
['brill building pop','quiet storm','ballad','easy listening','motown','disco','soul jazz',
'smooth jazz','soul','jazz','soft rock','uk garage','chill-out','german pop','salsa','r&b',
'chanson','rock','pop','blues-rock','vocal jazz','funk','oldies','pop rock','downtempo',
'hip hop','classic rock','united states','germany','adult contemporary','folk rock','vocal',
'soundtrack','blues','female vocalist','electronic','new wave','urban','reggae','singer-songwriter',
'swing','60s','female','american','80s','90s','ambient']
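As you can see, there are stray apostrophes, incomplete square brackets, whitespace, and so on. The elements are phrases, so I don't want to remove whitespace in the middle of them, but I do want to remove it when it appears at the start or end. Is there a simple way to do this?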
Given this structure, it is already a valid list, just with a lot of extra characters in some elements, so you can use replace() and strip(), like this (assuming the list is stored in z):
zmod = [zz.replace("'", '').replace('[', '').replace(']', '').strip() for zz in z]
zmod
This gives:
['brill building pop','quiet storm','ballad','easy listening','motown','disco','soul jazz',
'smooth jazz','soul','jazz','soft rock','uk garage','chill-out','german pop','salsa','r&b',
'chanson','rock','pop','blues-rock','vocal jazz','funk','oldies','pop rock','downtempo',
'hip hop','classic rock','united states','germany','adult contemporary','folk rock','vocal',
'soundtrack','blues','female vocalist','electronic','new wave','urban','reggae','singer-songwriter',
'swing','60s','female','american','80s','90s','ambient']
There is of course a shorter regex approach, but I think this is the most readable.
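For comparison, here is a minimal sketch of what a regex version might look like, together with a variant that only trims the ends of each string. The names raw, DEBRIS, clean_regex and clean_edges are made up for this illustration and are not from the original answer, and the list is shortened to a few representative elements.

```python
import re

# A shortened sample of the list from the question (illustrative only).
raw = ["['brill building pop", 'quiet storm', "motown'", " 'disco", 'r&b', "ambient']"]

# Characters treated as debris: square brackets and apostrophes.
DEBRIS = re.compile(r"[\[\]']")

def clean_regex(items):
    """Delete every bracket and apostrophe, then trim outer whitespace."""
    return [DEBRIS.sub("", item).strip() for item in items]

def clean_edges(items):
    """Trim spaces, apostrophes and brackets from the ends only."""
    return [item.strip(" '[]") for item in items]

print(clean_regex(raw))
print(clean_edges(raw))
```

Both functions give the same result on the sample above. They differ when a phrase contains an apostrophe in the middle: clean_regex removes it, as the replace() chain does, while clean_edges leaves it alone because str.strip() only touches the start and end of the string.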