Python: removing stop words also removes a word that is not in the stop-word list


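I am interested in mining the scientific literature, PubMed in particular. I want to identify the word modifiers to the left and right of a keyword of my choosing. My plan was to (1) query my database on hearing and hearing aids for the word "AID". (2) Then I removed punctuation, double spaces, etc. from the field containing title + abstract; for historical reasons these are all upper case. (3) Next I split the text at spaces, and (4) removed the stopwords using a list obtained from MySQL. In retrospect, that list could probably live in a class somewhere. (5) I look for the keyword "AID" and collect the keys before and after it. The code is pieced together from many sources on StackOverflow and other sites, because I am new to Python and SQLite. The problem area of the code is shown below.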

my_stopwords = '''['A','ABLE','ABOUT','ABOVE','ACCORDING','ACCORDINGLY','ACROSS','ACTUALLY','AFTER','AFTERWARDS','AGAIN','AGAINST','ALL','ALLOW','ALLOWS','ALMOST','ALONE','ALONG','ALREADY','ALSO','ALTHOUGH','ALWAYS','AM','AMONG','AMONGST','AN','ANOTHER',
                        'ANY','ANYBODY','ANYHOW','ANYONE','ANYTHING','ANYWAY','ANYWAYS','ANYWHERE','APART','APPEAR','APPRECIATE','APPROPRIATE','ARE',
                        'AROUND','AS','ASIDE','ASK','ASKING','ASSOCIATED','AT','AVAILABLE','AWAY','AWFULLY','BE','BECAME','BECAUSE','BECOME','BECOMES',
                        'BECOMING','BEEN','BEFORE','BEFOREHAND','BEHIND','BEING','BELIEVE','BELOW','BESIDE','BESIDES','BEST','BETTER','BETWEEN','BEYOND',
                        'BOTH','BRIEF','BUT','BY','CAME','CAN','CANNOT','CANT','CAUSE','CAUSES','CERTAIN','CERTAINLY','CHANGES','CLEARLY','CO','COM','COME',
                        'COMES','CONCERNING','CONSEQUENTLY','CONSIDER','CONSIDERING','CONTAIN','CONTAINING','CONTAINS','CORRESPONDING','COULD','COURSE',
                        'CURRENTLY','DEFINITELY','DESCRIBED','DESPITE','DETERMINE','DETERMINED','DID','DIFFERENT','DO','DOES','DOING','DONE','DOWN','DOWNWARDS','DURING','EACH','EDU',
                        'EFFECT','EFFECTS','EG','EIGHT','EITHER','ELSE','ELSEWHERE','ENOUGH','ENTIRELY','ESPECIALLY','ET','ETC','EVEN','EVER','EVERY','EVERYBODY','EVERYONE',
                        'EVERYTHING','EVERYWHERE','EX','EXACTLY','EXAMPLE','EXCEPT','FAR','FEW','FIFTH','FIRST','FIVE','FOLLOWED','FOLLOWING','FOLLOWS',
                        'FOR','FORMER','FORMERLY','FORTH','FOUR','FROM','FURTHER','FURTHERMORE','GET','GETS','GETTING','GIVEN','GIVES','GO','GOES','GOING',
                        'GONE','GOT','GOTTEN','GREETINGS','HAD','HAPPENS','HARDLY','HAS','HAVE','HAVING','HE','HELLO','HELP','HENCE','HER','HERE','HEREAFTER',
                        'HEREBY','HEREIN','HEREUPON','HERS','HERSELF','HI','HIM','HIMSELF','HIS','HITHER','HOPEFULLY','HOW','HOWBEIT','HOWEVER','IE','IF',
                        'IGNORED','IMMEDIATE','IN','INASMUCH','INC','INDEED','INDICATE','INDICATED','INDICATES','INNER','INSOFAR','INSTEAD','INTO','INWARD',
                        'IS','IT','ITS','ITSELF','JUST','KEEP','KEEPS','KEPT','KNOW','KNOWN','KNOWS','LAST','LATELY','LATER','LATTER','LATTERLY','LEAST','LESS',
                        'LEST','LET','LIKED','LIKELY','LITTLE','LOOK','LOOKING','LOOKS','LTD','MAINLY','MANY','MAY','MAYBE','ME','MEAN','MEANWHILE','MERELY',
                        'MIGHT','MORE','MOREOVER','MOST','MOSTLY','MUCH','MUST','MY','MYSELF','NAME','NAMELY','ND','NEAR','NEARLY','NECESSARY','NEED','NEEDS',
                        'NEITHER','NEVER','NEVERTHELESS','NEW','NEXT','NINE','NO','NOBODY','NON','NONE','NOONE','NOR','NORMALLY','NOT','NOTHING','NOVEL','NOW',
                        'NOWHERE','OBVIOUSLY','OF','OFF','OFTEN','OH','OK','OKAY','OLD','ON','ONCE','ONE','ONES','ONLY','ONTO','OTHER','OTHERS','OTHERWISE',
                        'OUGHT','OUR','OURS','OURSELVES','OUT','OUTSIDE','OVER','OVERALL','OWN','PARTICULAR','PARTICULARLY','PER','PERHAPS','PLACED','PLEASE',
                        'PLUS','POSSIBLE','PRESUMABLY','PROBABLY','PROVIDES','QUE','QUITE','QV','RATHER','RD','RE','REALLY','REASONABLY','REGARDING',
                        'REGARDLESS','REGARDS','RELATIVELY','RESPECTIVELY','RIGHT','SAID','SAME','SAW','SAY','SAYING','SAYS','SECOND','SECONDLY','SEE','SEEING',
                        'SEEM','SEEMED','SEEMING','SEEMS','SEEN','SELF','SELVES','SENSIBLE','SENT','SERIOUS','SERIOUSLY','SEVEN','SEVERAL','SHALL','SHE','SHOULD',
                        'SHOWED','SHOWS','SINCE','SIGNIFICANTLY','SIX','SO','SOME','SOMEBODY','SOMEHOW','SOMEONE','SOMETHING','SOMETIME','SOMETIMES','SOMEWHAT','SOMEWHERE','SOON','SORRY',
                        'SPECIFIED','SPECIFY','SPECIFYING','STILL','STUDY','SUB','SUCH','SUP','SURE','TAKE','TAKEN','TELL','TENDS','TH','THAN','THANK','THANKS',
                        'THANX','THAT','THATS','THE','THEIR','THEIRS','THEM','THEMSELVES','THEN','THENCE','THERE','THEREAFTER','THEREBY','THEREFORE',
                        'THEREIN','THERES','THEREUPON','THESE','THEY','THINK','THIRD','THIS','THOROUGH','THOROUGHLY','THOSE','THOUGH','THREE','THROUGH',
                        'THROUGHOUT','THRU','THUS','TO','TOGETHER','TOO','TOOK','TOWARD','TOWARDS','TRIED','TRIES','TRULY','TRY','TRYING','TWICE','TWO',
                        'UN','UNDER','UNFORTUNATELY','UNLESS','UNLIKELY','UNTIL','UNTO','UP','UPON','US','USE','USED','USEFUL','USES','USING','USUALLY',
                        'VALUE','VARIOUS','VERY','VIA','VIZ','VS','WANT','WANTS','WAS','WAY','WE','WELCOME','WELL','WENT','WERE','WHAT','WHATEVER','WHEN',
                        'WHENCE','WHENEVER','WHERE','WHEREAFTER','WHEREAS','WHEREBY','WHEREIN','WHEREUPON','WHEREVER','WHETHER','WHICH','WHILE','WHITHER',
                        'WHO','WHOEVER','WHOLE','WHOM','WHOSE','WHY','WILL','WILLING','WISH','WITH','WITHIN','WITHOUT','WONDER','WOULD','YES','YET','YOU',
                        'YOUR','YOURS','YOURSELF','YOURSELVES, 'zzz', 'ZZZ', zzSTOPzz']'''

str_split = string.split(' ')
keys = [word for word in str_split if word.upper() not in my_stopwords]
print("Split Input: ", keys)
num_wds = len(keys)
print("Number of words = ", num_wds, "\n")
For the most part this works, but the keyword "AID" has me stumped. Here is some sample output.

After the initial query (code not shown), I get the following result:

Input Abstract:  PMID21839526zzz BONE-ANCHORED HEARING **AID** (BAHA) IN PATIENTS WITH TREACHER COLLINS SYNDROME:  ....
After I strip the punctuation etc., I get the following:

Cleaned Input:  PMID21839526zzz BONE-ANCHORED HEARING **AID** BAHA IN PATIENTS WITH TREACHER COLLINS SYNDROME....
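After I run the code above to split at spaces and remove the stopwords (using a list that does not contain the word AID), I get the following. Note that the word "AID" has been removed from the list, which defeats my purpose.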

Split Input:  ['PMID21839526zzz', 'BONE-ANCHORED', 'HEARING', 'BAHA', 'PATIENTS', 'TREACHER', 'COLLINS', 'SYNDROME',....

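This code works properly with other keywords, including "AIDS", "MAGNETIC", etc. The problem only shows up with the three-letter keyword "AID". I would appreciate an explanation of why this happens. I hope this is clear enough. Thanks for your help.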

I don't quite follow your algorithm, but your stop-word list has to be a list (preferably a set), not a string:

my_stopwords = set(['A','ABLE','ABOUT','ABOVE','ACCORDING',])
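Otherwise you are only doing substring matching against one long string, not exact string matching against the items of a list. That is why "AID" gets dropped: it occurs inside the entry 'SAID'.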
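A minimal sketch of the fix, assuming the stop words are stored as a set. The set below is only a short excerpt of the full list, and the sample text is the cleaned abstract from the question with the bold markers left out:

```python
# Short excerpt of the stop-word list, stored as a set of exact words
# (illustrative only; the real list is much longer).
my_stopwords = {'A', 'ABOUT', 'IN', 'SAID', 'SAME', 'WITH'}

# Cleaned sample text from the question.
text = ("PMID21839526zzz BONE-ANCHORED HEARING AID BAHA IN PATIENTS "
        "WITH TREACHER COLLINS SYNDROME")

# split() with no argument also collapses runs of whitespace.
# "not in" on a set tests for an exact, whole-word match.
keys = [word for word in text.split() if word.upper() not in my_stopwords]

print("Split Input: ", keys)
print("Number of words = ", len(keys))
```

With a set, 'IN' and 'WITH' are still removed, but 'AID' stays because it is not itself an element of the set. Set membership tests are also fast, which helps when filtering many abstracts.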


E.g., for s = "['THEY','THEM']", 'HE' in s is True. If s = ['THEY','THEM'], then 'HE' in s is not True. The former is a string whose contents happen to look like the syntax of a Python list. The latter is a Python list.
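The same effect can be reproduced for the keyword in the question. This is a small sketch using a four-word excerpt of the stop-word list; the variable names are made up for illustration:

```python
# The same excerpt stored two ways: as one long string, and as a real list.
stopwords_as_string = "['SAID','SAME','SAW','SAY']"
stopwords_as_list = ['SAID', 'SAME', 'SAW', 'SAY']

# On a string, "in" is a substring search, so 'AID' is found inside 'SAID'.
found_in_string = 'AID' in stopwords_as_string

# On a list, "in" compares against whole elements only.
found_in_list = 'AID' in stopwords_as_list

# Which entries contain the keyword as a substring?
culprits = [w for w in stopwords_as_list if 'AID' in w]

# 'AIDS' survives the string check because no run of characters spells it.
aids_in_string = 'AIDS' in stopwords_as_string

print(found_in_string, found_in_list, culprits, aids_in_string)
```

This is consistent with the observation that "AIDS" and "MAGNETIC" were kept while "AID" was removed.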

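Thanks! I had a feeling this was something I didn't understand. "set" is the solution. Now I need to go read more about sets. Thanks for pointing me in the right direction.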