Python: concatenating values into a pandas Series

python pandas

I receive the following response from an API request:

    <movies>
    <movie>
    <rating>5</rating>
    <name>star wars</name>
    </movie>
    <movie>
    <rating>8</rating>
    <name>jurassic park</name>
    </movie>
    </movies>

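As you can see, I want to take each value found in the response and add them to a single column. For example, I want to join 5, a "-" separator, and star wars together.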

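Is this what you want? I have explained it step by step in the code. There was one part I did not know how to do, but I researched it and figured it out.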

    import pandas as pd

    df = pd.DataFrame({'Data': ['<movies>', '<movie>', '<rating>5</rating>',
                                '<name>star wars</name>', '</movie>',
                                '<rating>8</rating>', '<name>jurassic park</name>',
                                '</movie>', '</movies>']})
    # Keep only the rows that match the pattern (text between two tags).
    # Resetting the index is optional.
    df = df.loc[df['Data'].str.contains('>.*<', regex=True)].reset_index(drop=True)
    # For the rows that remain, strip the tags with some regex string manipulation.
    df['Data'] = df['Data'].str.findall('>.*<').str[0].replace(['>', '<'], '', regex=True)
    # Pair each row with the next one using join, shift and add_suffix.
    # Credit to @joelostblom:
    # https://stackoverflow.com/questions/47450259/merge-row-with-next-row-in-dataframe-pandas
    df = df.add_suffix('1').join(df.shift(-1).add_suffix('2'))
    # Keep only the rows whose first value is numeric (the ratings).
    # The copy avoids a SettingWithCopyWarning on the assignment below.
    df = df.loc[df['Data1'].str.isnumeric()].copy()
    # Combine the two columns in the desired format.
    df['Movie Rating'] = df['Data1'] + ' - ' + df['Data2']
    # Keep only the relevant column and print the DataFrame.
    df = df[['Movie Rating']]
    print(df)
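As an alternative sketch, the response could be parsed as XML instead of being filtered line by line. The example below is not the approach from the answer above. It assumes the response body is available as a string (called `response_text` here, a hypothetical name) and uses the standard library's `xml.etree.ElementTree` to reach the same single-column result.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical stand-in for the API response body shown in the question.
response_text = """
<movies>
  <movie>
    <rating>5</rating>
    <name>star wars</name>
  </movie>
  <movie>
    <rating>8</rating>
    <name>jurassic park</name>
  </movie>
</movies>
"""

# Parse the XML; strip() removes the surrounding blank lines of the literal.
root = ET.fromstring(response_text.strip())

# Build one "rating - name" string per <movie> element.
combined = [
    f"{movie.findtext('rating')} - {movie.findtext('name')}"
    for movie in root.iter('movie')
]

# Put the joined strings into a single named Series.
movie_ratings = pd.Series(combined, name='Movie Rating')
print(movie_ratings)
```

Reading `rating` and `name` from the same `<movie>` element keeps each pair together without relying on row order, which the shift-based approach depends on.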

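Does this answer your question? There is a good tutorial that covers the same thing.

I may not have explained my example very well, so I apologize. I know I can map each value to its own column (Series). What I want to do is capture two values from the response shown in my question and put both of them under the same column. The documentation I just read seems to cover creating a new column for each value, which is not what I want.

That's it! Thanks for the detailed explanation!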
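If the two values already sit in separate columns, they can still be merged into one column afterwards. This is a minimal sketch, assuming a DataFrame with hypothetical `rating` and `name` columns filled by hand with the values from the question:

```python
import pandas as pd

# Hypothetical frame with one column per value, as described in the comment.
movies = pd.DataFrame({
    'rating': ['5', '8'],
    'name': ['star wars', 'jurassic park'],
})

# Series.str.cat joins two string Series element-wise with a separator.
movies['Movie Rating'] = movies['rating'].str.cat(movies['name'], sep=' - ')

# Keep only the combined column.
result = movies[['Movie Rating']]
print(result)
```

Both columns need to hold strings for `str.cat`; a numeric rating column would first need `.astype(str)`.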