Parsing and sorting keys in a Python dictionary
I created the following dictionary:
code_dictionary = {u'News; comment; negative': u'contradictory about news', u'News; comment': u'something about news'}
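Now I want to write some Python code that iterates over the dictionary's keys and separates the individual codes along with their corresponding values. So, for the dictionary above, I want to end up with: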
News: 'contradictory about news', 'something about news'
comment: 'contradictory about news', 'something about news'
negative: 'contradictory about news'
The final result can be a dictionary, a list, or tab- or comma-separated text.
You can see my attempt here:
from bs4 import BeautifulSoup as Soup
f = open('transcript.xml','r')
soup = Soup(f)
#print soup.prettify()
#searches text for all w:commentrangestart tags and makes a dictionary that matches ids with text
textdict = {}
for i in soup.find_all('w:commentrangestart'):
    # variable 'key' is assigned to the tag id
    key = i.parent.contents[1].attrs['w:id']
    key = str(key)
    # variable 'value' is assigned to the tag's text
    value = ''.join(i.nextSibling.findAll(text=True))
    # key / value pairs are added to the dictionary 'textdict'
    textdict[key] = value
print "Transcript Text = " , textdict
# makes a dictionary that matches ids with codes
codedict = {}
for i in soup.find_all('w:comment'):
    key = i.attrs['w:id']
    key = str(key)
    value = ''.join(i.findAll(text=True))
    codedict[key] = value
print "Codes = ", codedict
# makes a dictionary that matches all codes with text
output = {}
for key in set(textdict.keys()).union(codedict.keys()):
    print "key= ", key
    txt = textdict[key]
    print "txt = ", txt
    ct = codedict[key]
    print "ct= ", ct
    output[ct] = txt
    #print "output = ", output
print "All code dictionary = ", output
#codelist={}
#for key in output:
# codelist =key.split(";")
#print "codelist= " , codelist
code_negative = {}
code_news = {}
print output.keys()
for i in output:
    if 'negative' in output.keys():
        print 'yay'
        code_negative[i]=textdict[i]
        print 'text coded negative: ' , code_negative
    if 'News' in i:
        code_news[i]=textdict[i]
print 'text coded News: ' ,code_news
But for some reason, I keep getting a KeyError when running the last part:
code_negative = {}
code_news = {}
for i in output:
    if 'negative' in output.keys():
        code_negative[i]=textdict[i]
        print 'text coded negative: ' , code_negative
    if 'News' in i:
        code_news[i]=textdict[i]
print 'text coded News: ' ,code_news
Any ideas? Thanks.

If I understand the question correctly, the following code should work:
from collections import defaultdict
out = defaultdict(list)
for k, v in code_dictionary.viewitems():  # Python 2.7; use .items() on Python 3
    for item in k.split('; '):
        out[item].append(v)
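For reference, here is a minimal sketch of the same grouping on Python 3, where `viewitems()` no longer exists and `items()` is used instead. The `group_by_code` helper name and the whitespace stripping around each code are assumptions added for illustration, not part of the answer above. The final loop prints the result as tab-separated text, one of the output formats the question mentions.

```python
from collections import defaultdict

code_dictionary = {
    'News; comment; negative': 'contradictory about news',
    'News; comment': 'something about news',
}

def group_by_code(coded):
    """Map each individual code to the list of texts tagged with it.

    `group_by_code` is a hypothetical helper name used only in this sketch.
    """
    grouped = defaultdict(list)
    for key, text in coded.items():
        # Split the combined key on ';' and strip stray whitespace
        # so that 'News; comment' and 'News;comment' behave the same.
        for code in key.split(';'):
            grouped[code.strip()].append(text)
    return dict(grouped)

out = group_by_code(code_dictionary)

# One line per code: the code, then every text coded with it.
for code, texts in out.items():
    print(code + '\t' + '\t'.join(texts))
```

Splitting on `';'` and stripping each part is slightly more forgiving than splitting on `'; '`, at the cost of one extra call per code.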
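Use the split function to break each key into the parts you need: loop over the keys with a for loop and call i.split(';') on each one. That should let you iterate over exactly the pieces you need.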
output = {u'News; comment; negative': u'contradictory about news', u'News; comment': u'something about news'}
negatives = []
comments = []
news = []
for k, v in output.items():
    key_parts = k.split('; ')
    key_parts = [part.lower() for part in key_parts]
    if 'negative' in key_parts:
        negatives.append(v)
    if 'news' in key_parts:
        news.append(v)
    if 'comment' in key_parts:
        comments.append(v)
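As for the KeyError itself, judging only from the posted code, the likely cause is that `output` is keyed by code strings such as 'News; comment', while `textdict` is keyed by comment ids, so `textdict[i]` fails for any `i` taken from `output`. The check `'negative' in output.keys()` also tests for a key that is exactly 'negative' rather than looking inside each key. Below is a minimal Python 3 sketch of a corrected loop. The sample `textdict` and `codedict` contents are hypothetical stand-ins for what would be parsed from transcript.xml, and the sketch assumes both dictionaries share the same ids.

```python
# Hypothetical stand-ins for the dictionaries built from transcript.xml.
textdict = {'0': 'contradictory about news', '1': 'something about news'}
codedict = {'0': 'News; comment; negative', '1': 'News; comment'}

# Same idea as in the question: map each code string to its text.
# Assumes every id in textdict also appears in codedict.
output = {codedict[key]: textdict[key] for key in textdict}

code_negative = {}
code_news = {}
for codes, text in output.items():
    # Split the combined key into individual codes before testing.
    parts = [part.strip() for part in codes.split(';')]
    if 'negative' in parts:
        # Take the value from output; textdict is keyed by id, not by code.
        code_negative[codes] = text
    if 'News' in parts:
        code_news[codes] = text

print('text coded negative:', code_negative)
print('text coded News:', code_news)
```

Iterating over `output.items()` avoids the second lookup entirely, so there is no key left that could be missing.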