Python 2.7: How can I read XML data like the following into a DataFrame?


I want to read the words in bold as the column names of the DataFrame, and the string after each bold word as the value in that row.

<posts>

<**row Id**="5" PostTypeId="1" **CreationDate**="2014-05-13T23:58:30.457" **Score**="7" ViewCount="315" **Body**="<p>I've always been interested in machine learning, but I can't figure out one thing about starting out with a simple "Hello World" example - how can I avoid hard-coding behavior?</p><p>For example, if I wanted to "teach" a bot how to avoid randomly placed obstacles, I couldn't just use relative motion, because the obstacles move around, but I don't want to hard code, say, distance, because that ruins the whole point of machine learning.</p><p>Obviously, randomly generating code would be impractical, so how could I do this?</p>" **OwnerUserId**="5" LastActivityDate="2014-05-14T00:36:31.077" Title="How can I do simple machine learning without hard-coding behavior?" Tags="<machine-learning>" AnswerCount="1" CommentCount="1" FavoriteCount="1" ClosedDate="2014-05-14T14:40:25.950"/>

<**row Id**="7" **PostTypeId**="1" **AcceptedAnswerId**="10" CreationDate="2014-05-14T00:11:06.457" Score="2" ViewCount="297" Body="<p>As a researcher and instructor, I'm looking for open-source books (or similar materials) that provide a relatively thorough overview of data science from an applied perspective. To be clear, I'm especially interested in a thorough overview that provides material suitable for a college-level course, not particular pieces or papers.</p>" OwnerUserId="36" LastEditorUserId="97" LastEditDate="2014-05-16T13:45:00.237" LastActivityDate="2014-05-16T13:45:00.237" Title="What open-source books (or other materials) provide a relatively thorough overview of data science?" Tags="<education><open-source>" AnswerCount="3" CommentCount="4" FavoriteCount="1" **ClosedDate**="2014-05-14T08:40:54.950"/>

</posts>
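A minimal sketch of one way to do this with the standard library's `xml.etree.ElementTree` plus pandas. Each `<row .../>` element's `attrib` is already a dict of attribute name to string value, so a list of those dicts maps directly onto DataFrame rows. Assumptions: the wanted columns are the attributes bolded above (the `COLUMNS` list and the `posts_to_dataframe` helper are hypothetical names), and the sample is trimmed. Note that in the actual Stack Exchange dump the HTML inside `Body` and `Tags` is escaped (`&lt;p&gt;`, `&quot;`); that escaping is what keeps the file valid XML, and the `**` bold markers in the question are not part of the data. Written for Python 3; it should also run on 2.7 with a pandas version of that era.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Trimmed sample in the same shape as the dump; the HTML in Body is escaped,
# as it is in the real Posts.xml, so ElementTree can parse it.
SAMPLE = """<posts>
  <row Id="5" PostTypeId="1" CreationDate="2014-05-13T23:58:30.457" Score="7"
       Body="&lt;p&gt;I've always been interested in machine learning...&lt;/p&gt;"
       OwnerUserId="5" ClosedDate="2014-05-14T14:40:25.950"/>
  <row Id="7" PostTypeId="1" AcceptedAnswerId="10" CreationDate="2014-05-14T00:11:06.457"
       Score="2" Body="&lt;p&gt;As a researcher and instructor...&lt;/p&gt;"
       OwnerUserId="36" ClosedDate="2014-05-14T08:40:54.950"/>
</posts>"""

# Hypothetical choice of columns: the attributes bolded in the question.
COLUMNS = ["Id", "PostTypeId", "AcceptedAnswerId", "CreationDate",
           "Score", "Body", "OwnerUserId", "ClosedDate"]


def posts_to_dataframe(xml_text, columns=COLUMNS):
    """Build a DataFrame with one row per <row> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # row.attrib is {attribute name: string value}; escaped HTML is unescaped here.
    records = [dict(row.attrib) for row in root.iter("row")]
    # columns= keeps only these attributes, in this order;
    # an attribute missing on a row (e.g. AcceptedAnswerId) becomes NaN.
    return pd.DataFrame(records, columns=columns)


df = posts_to_dataframe(SAMPLE)
print(df[["Id", "AcceptedAnswerId", "Score"]])
```

Every value comes back as a string, so numeric and date columns still need converting (for example with `pd.to_numeric` and `pd.to_datetime`).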


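Consider using the StackExchange API instead of screen scraping.

No, it's not screen scraping. The dataset was downloaded from the Stack Exchange website, and it contains these files in XML format. Please suggest a way to get it into a pandas DataFrame.

Also, SO is not a code-writing service. You need to show us whatever ideas and effort you have put into solving the problem. Try parsing with `etree`, then map the values into a DataFrame. Come back if you run into any problems.

Either way, the StackExchange API may be more useful to you than "file scraping".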
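Following the `etree` suggestion, a dump file such as `Posts.xml` (file name assumed) can be large, so a streaming variant may be preferable to loading the whole tree at once. A rough sketch using `ET.iterparse`: it yields each `<row>`'s attributes as it is parsed, clears the finished element to reduce memory use, and then converts a few string columns to numeric and datetime types. The helper names `iter_post_rows` and `load_posts`, and the choice of columns to convert, are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd


def iter_post_rows(source, wanted=None):
    """Yield one attribute dict per <row> element while streaming (hypothetical helper)."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            attrs = dict(elem.attrib)
            if wanted is not None:
                # Keep only the requested attribute names.
                attrs = {k: v for k, v in attrs.items() if k in wanted}
            yield attrs
            # Drop the finished element's attributes so large dumps use less memory.
            elem.clear()


def load_posts(source, wanted=None):
    """Build a DataFrame from a file path or file-like object (hypothetical helper)."""
    df = pd.DataFrame(list(iter_post_rows(source, wanted)))
    # Every attribute is read as a string; convert the columns that need it.
    for col in ("Id", "Score"):
        if col in df:
            df[col] = pd.to_numeric(df[col])
    for col in ("CreationDate", "ClosedDate"):
        if col in df:
            df[col] = pd.to_datetime(df[col])
    return df


# Usage with an in-memory sample; with the real dump you would pass "Posts.xml".
data = b"""<posts>
  <row Id="5" Score="7" CreationDate="2014-05-13T23:58:30.457"/>
  <row Id="7" Score="2" CreationDate="2014-05-14T00:11:06.457" AcceptedAnswerId="10"/>
</posts>"""
posts = load_posts(io.BytesIO(data),
                   wanted={"Id", "Score", "CreationDate", "AcceptedAnswerId"})
print(posts.dtypes)
```

Filtering attributes inside the generator keeps unwanted fields such as the long `Body` HTML out of memory entirely, which matters more than column selection after the fact when the dump is large.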