Python: using BeautifulSoup to exclude <pattern> tags inside <topic> tags from the result set


Tags: python, beautifulsoup, tags, aiml


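I've just started web scraping with Python, and I'm currently using BeautifulSoup for data extraction. I have an .aiml file (XML), and I want to extract the data from all <pattern> tags that are not contained in a <topic> tag.
I can already get all of the pattern values, but the challenge is that patterns whose parent tag is <topic> should not be included in the result set.
Here is the AIML file: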

    <?xml version = "1.0" encoding = "UTF-8"?>
    <aiml version="1.0.1" encoding="UTF-8">
        <topic name="botdog">
            <category>
                <pattern>MY DOG'S NAME IS *</pattern>
                <template>
                    That is interesting that you have a dog named <set name="dog"><star/></set>
                </template>
            </category>
            <category>
                <pattern>WHAT IS MY DOG'S NAME</pattern>
                <template>
                    Your dog's name is <get name="dog"/>.
                </template>
            </category>
        </topic>
        <topic name="botcat">
            <category>
                <pattern>MY CAT'S NAME IS *</pattern>
                <template>
                    That is interesting that you have a cat named <set name="cat"><star/></set>
                </template>
            </category>
            <category>
                <pattern>WHAT IS MY CAT'S NAME</pattern>
                <template>
                    Your cat's name is <get name="cat"/>.
                </template>
            </category>
        </topic>
        <category>
            <pattern>HELLO ALICE</pattern>
            <template>
                Hello User
            </template>
        </category>
        <category>
            <pattern>HOW ARE YOU</pattern>
            <template>
                I'm fine
            </template>
        </category>
    </aiml>
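When I print() the result, I get:
["MY DOG'S NAME IS *", "WHAT IS MY DOG'S NAME", "MY CAT'S NAME IS *", "WHAT IS MY CAT'S NAME", "HELLO ALICE", "HOW ARE YOU"]
It should look like this instead, because only these patterns have no parent <topic> tag:
["HELLO ALICE", "HOW ARE YOU"]

Try this: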

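    # Assumption: `extract` is a Flask Blueprint; the original snippet does not show where it is defined.
    from bs4 import BeautifulSoup
    from flask import Blueprint, jsonify

    extract = Blueprint('extract', __name__)  # hypothetical definition, not part of the original answer


    @extract.route('/')
    def index_page():
        folder = 'templates/topic.aiml'
        with open(folder, 'r') as myfile:
            soup = BeautifulSoup(myfile.read(), 'html.parser')
        data = []
        for cat in soup.find_all('category'):
            # Skip any <category> whose direct parent is a <topic>.
            if cat.parent.name == "topic":
                continue
            data += [cat.find("pattern").text]
        print(data)
        return jsonify({'data_set': data})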
Hope this helps! Check out more examples.
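The answer above checks only each <category>'s immediate parent. If a <category> could ever sit deeper inside a <topic> (for example, wrapped in another element), a slightly more defensive variant is to test every ancestor with BeautifulSoup's find_parent(). This is a minimal sketch, assuming bs4 is installed and working on an in-memory, trimmed copy of the question's AIML rather than the file on disk; the names AIML and patterns_outside_topics are illustrative only.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's AIML (templates shortened for brevity).
AIML = """<?xml version="1.0" encoding="UTF-8"?>
<aiml version="1.0.1" encoding="UTF-8">
  <topic name="botdog">
    <category><pattern>MY DOG'S NAME IS *</pattern><template>...</template></category>
    <category><pattern>WHAT IS MY DOG'S NAME</pattern><template>...</template></category>
  </topic>
  <topic name="botcat">
    <category><pattern>MY CAT'S NAME IS *</pattern><template>...</template></category>
  </topic>
  <category><pattern>HELLO ALICE</pattern><template>Hello User</template></category>
  <category><pattern>HOW ARE YOU</pattern><template>I'm fine</template></category>
</aiml>"""


def patterns_outside_topics(markup):
    """Return the text of every <pattern> that has no <topic> ancestor."""
    soup = BeautifulSoup(markup, "html.parser")
    return [
        pattern.get_text(strip=True)
        for pattern in soup.find_all("pattern")
        # find_parent() walks up through all ancestors, not just the direct parent.
        if pattern.find_parent("topic") is None
    ]


print(patterns_outside_topics(AIML))  # ['HELLO ALICE', 'HOW ARE YOU']
```

Starting from <pattern> rather than <category> also means a <category> without a <pattern> cannot raise an AttributeError on .text.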
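Awesome! Thanks, man! :) Before I saw your answer, what I did was `for x in soup.find_all('topic'): x.extract()`, which of course isn't appropriate, because it removes the topic tags from the soup. With your code, the tags are still there when I print the soup, and I get the expected result set. :) Thanks again!

Glad I could help, go print more soup :)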
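The extract() idea from the comment can still be used without damaging the main tree if it runs on a throwaway copy: re-parse str(soup) into a second BeautifulSoup object, strip the <topic> elements from that copy, and read the remaining patterns. This is a minimal sketch under the same assumptions (bs4 installed, trimmed in-memory sample AIML); patterns_without_topics is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's AIML (templates shortened for brevity).
AIML = """<?xml version="1.0" encoding="UTF-8"?>
<aiml version="1.0.1" encoding="UTF-8">
  <topic name="botdog">
    <category><pattern>MY DOG'S NAME IS *</pattern><template>...</template></category>
  </topic>
  <topic name="botcat">
    <category><pattern>MY CAT'S NAME IS *</pattern><template>...</template></category>
  </topic>
  <category><pattern>HELLO ALICE</pattern><template>Hello User</template></category>
  <category><pattern>HOW ARE YOU</pattern><template>I'm fine</template></category>
</aiml>"""


def patterns_without_topics(soup):
    """Collect <pattern> texts outside <topic>, leaving `soup` itself unmodified."""
    # Re-parse the serialized document so extract() only touches the copy.
    scratch = BeautifulSoup(str(soup), "html.parser")
    for topic in scratch.find_all("topic"):
        topic.extract()  # detaches the <topic> and all its descendants from `scratch`
    return [p.get_text(strip=True) for p in scratch.find_all("pattern")]


soup = BeautifulSoup(AIML, "html.parser")
print(patterns_without_topics(soup))  # ['HELLO ALICE', 'HOW ARE YOU']
print(len(soup.find_all("topic")))    # 2: the original tree still has both topics
```

Re-parsing costs a second pass over the document; for a file this small that is negligible, and the ancestor check used in the answer avoids the copy altogether.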