Parse an XML file and extract <cite> elements with Python

I have:

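This is trivial:
from bs4 import BeautifulSoup

# yoursource is a placeholder for the markup (a string or an open file object)
soup = BeautifulSoup(yoursource, 'html.parser')

# Print the text of every <cite> element
for cite in soup.find_all('cite'):
    print(cite.string)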

For the given sample file, this produces:
>>> import requests
>>> from bs4 import BeautifulSoup
>>> r = requests.get('https://raw.githubusercontent.com/MortezaLSC/Puplic/master/file.xml')
>>> soup = BeautifulSoup(r.content, 'html.parser')
>>> for cite in soup.find_all('cite'):
...     print(cite.string)
... 
taksuncontrol.com
www.royal-jelve.ir

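If I have a file instead of a URL, should I do r = open('/path/to/file.xml', 'r+') and soup = BeautifulSoup(r.content)...?

@MortezaLSC: If you have a file, just pass the file object to BeautifulSoup: with open('/path/to/file.xml') as infh: soup = BeautifulSoup(infh).

Thank you very much... :) Sorry, I have one more question. I have a large file named content.txt that contains the results of 4 Bing pages. When I run your answer, it only shows the results from the first Bing page... why? Thanks.

You can only parse one page at a time, not 4. You have to split the pages apart and feed each one to BeautifulSoup separately.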
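The two suggestions in the comments (read from a local file, and split a multi-page file before parsing) can be sketched roughly as follows. This is only an illustration under assumptions not confirmed in the thread: it assumes the pages in content.txt were simply concatenated and that each one ends with a closing </html> tag. The helper names split_pages, extract_cites, and cites_from_file are hypothetical, and 'html.parser' is chosen here only because it ships with Python.

```python
import re
from bs4 import BeautifulSoup


def split_pages(text):
    """Split concatenated HTML documents right after each closing </html> tag.

    Assumption: every page in the file ends with </html>.
    """
    parts = re.split(r'(?<=</html>)', text, flags=re.IGNORECASE)
    # Drop empty or whitespace-only leftovers (e.g. a trailing newline)
    return [part for part in parts if part.strip()]


def extract_cites(text):
    """Parse each page separately and collect the text of every <cite> tag."""
    results = []
    for page in split_pages(text):
        soup = BeautifulSoup(page, 'html.parser')
        results.extend(cite.get_text() for cite in soup.find_all('cite'))
    return results


def cites_from_file(path):
    """Read a local file (hypothetical helper) and extract its <cite> texts."""
    with open(path, encoding='utf-8') as infh:
        return extract_cites(infh.read())


# Small in-memory stand-in for a file holding two concatenated pages
sample = (
    "<html><body><cite>taksuncontrol.com</cite></body></html>\n"
    "<html><body><cite>www.royal-jelve.ir</cite></body></html>\n"
)
print(extract_cites(sample))
```

For the file from the question, the call would be cites_from_file('content.txt'). If the pages are delimited in some other way, only split_pages needs to change.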