Python: find rows with distinct titles

python pandas dataframe

My dataframe:

    Name    Value
0   K       <apple WK1>
            contents
1   Y       <banana WK2>
            contents
2   B       <orange WK1>
            contents
3   Q       <grape WK31>
            contents
4   C       <apple WK12>
            contents
5   A       <apple WK22>
            contents

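It does not matter if the existing dataframe is not preserved.

However, I only want to get the title values, with each title appearing once.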

To reproduce:

import pandas as pd

df1 = pd.DataFrame(data={'Name': ['K', 'Y', 'B', 'Q', 'C', 'A'], 'Value': ['<apple WK1>', '<banana WK2>', '<orange WK1>', '<grape WK31>', '<apple WK12>', '<apple WK22>']}, columns=['Name', 'Value'])

Try str.extract followed by drop_duplicates.

Full example:

import pandas as pd

# build dataframe
df = pd.DataFrame(data={'Name': ['K', 'Y', 'B', 'Q', 'C', 'A'], 'Value': ['<apple WK1>', '<banana WK2>', '<orange WK1>', '<grape WK31>', '<apple WK12>', '<apple WK22>']}, columns=['Name', 'Value'])

print(df)
#   Name         Value
# 0    K   <apple WK1>
# 1    Y  <banana WK2>
# 2    B  <orange WK1>
# 3    Q  <grape WK31>
# 4    C  <apple WK12>
# 5    A  <apple WK22>

# Only select content
out_1 = df["Value"].str.extract(r'<([a-z]*)\s+').drop_duplicates()
print(out_1)
#         0
# 0   apple
# 1  banana
# 2  orange
# 3   grape

# Select the title and keep it wrapped in "<" and ">"
out_2 = (df["Value"].str.extract(r'(<[a-z]*)\s+') + ">").drop_duplicates()
print(out_2)
#           0
# 0   <apple>
# 1  <banana>
# 2  <orange>
# 3   <grape>
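
If the goal is to keep the whole row (including Name) for the first occurrence of each title, rather than only the title strings, the same extracted key can be used as a mask. This is a minimal sketch, assuming the title is the lowercase word right after "<"; the names title and first_rows are illustrative and not from the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["K", "Y", "B", "Q", "C", "A"],
        "Value": [
            "<apple WK1>", "<banana WK2>", "<orange WK1>",
            "<grape WK31>", "<apple WK12>", "<apple WK22>",
        ],
    }
)

# Extract the title as a Series (expand=False with a single capture group).
title = df["Value"].str.extract(r"<([a-z]+)\s", expand=False)

# duplicated() marks every repeat after the first occurrence;
# negating it keeps only the first row seen for each title.
first_rows = df[~title.duplicated()]
print(first_rows)
```

Here the original Value strings are left untouched, so the week suffix of the first matching row is still available if it is needed later.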

IIUC, you can use str.split and drop_duplicates to get the values you want, then use pd.Series and the to_frame method to return a new dataframe.
    Name    Value
0   K       <apple WK1>
            contents
1   Y       <banana WK2>
            contents
2   B       <orange WK1>
            contents
3   Q       <grape WK31>
            contents
4   C       <apple WK12>
            contents
5   A       <apple WK22>
            contents

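# Split on whitespace, keep the first piece ("<title"), drop repeated titles,
# then re-append the closing ">" and wrap the result in a new dataframe.
new_df = pd.Series(
    (df["Value"].str.split(r"\s", expand=True).drop_duplicates(subset=[0])[0] + ">"),
    name="Value",
).to_frame()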


print(new_df)
      Value
0   <apple>
1  <banana>
2  <orange>
3   <grape>
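
The steps of this approach can also be written out one at a time. The sketch below assumes, based on how the dataframe is displayed in the question, that each cell holds the tag followed by a second line of text ("contents"); that layout and the intermediate names first_piece and unique_titles are assumptions for illustration. It uses the default whitespace split with n=1 instead of an explicit pattern.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["K", "Y", "B", "Q", "C", "A"],
        # Assumption: tag on the first line, free text on the second line.
        "Value": [
            "<apple WK1>\ncontents", "<banana WK2>\ncontents",
            "<orange WK1>\ncontents", "<grape WK31>\ncontents",
            "<apple WK12>\ncontents", "<apple WK22>\ncontents",
        ],
    }
)

# Step 1: split once on whitespace and keep the first piece, e.g. "<apple".
first_piece = df["Value"].str.split(n=1).str[0]

# Step 2: drop repeated titles, keeping the first occurrence of each.
unique_titles = first_piece.drop_duplicates()

# Step 3: restore the closing ">" and turn the Series into a dataframe.
new_df = (unique_titles + ">").to_frame(name="Value")
print(new_df)
```

Because the split stops at the first whitespace, the trailing text on the second line never reaches the result.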

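What have you tried?

However I approach it I get stuck, which is why I am asking...

Can you give me code that builds the dataframe, in the form df1 = pd.DataFrame(data={'Name': ['K', 'Y', 'B', 'Q', 'C', 'A'], 'Value': ['<apple WK1>', '<banana WK2>', '<orange WK1>', '<grape WK31>', '<apple WK12>', '<apple WK22>']}, columns=['Name', 'Value'])?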