Python: Converting a Pandas Series to a DataFrame with a for loop


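Thanks in advance to anyone who can help.

I am trying to convert this pandas Series into a DataFrame using the following logic:

Whenever a row in the Series starts with "MB", it should create a new column in the DataFrame, and all the rows below it, up to the next "MB", should go under that column.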

MB104
TR15
TR16
SP16
MB301
TR16
SP11
SP16
SP26
SP67
MB302
TR15
MB504
TR15
SP16
SP67
SP109
MB652
SP109
SP110
into this:

MB104    MB301    MB302    MB504    MB652
TR15     TR16     TR15     TR15     SP109
TR16     SP11              SP16     SP110
SP16     SP16              SP67
         SP26              SP109
         SP67
This is what I have tried so far:

mbdf = pd.DataFrame()
assetlist = []
for row in mbs.itertuples():
    left2 = row.data[:2]
    if left2 == 'MB':
        if headername:
            mbdf[headername] = pd.Series(assetlist)
        headername = row.data
        assetlist = []
    else:
        assetname = row.data
        assetlist.append(assetname)

It is not clear from your question whether you want them as separate Series or combined in a single DataFrame. I will assume you want a DataFrame:

# Read the data
from collections import defaultdict

import pandas as pd

data = defaultdict(list)
col = None

with open('data.txt') as fp:
    for line in fp:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('MB'):
            # A row starting with MB opens a new column
            col = line
        else:
            # Any other row belongs to the most recent MB column
            data[col].append(line)

# Pad every column to the same length
max_len = max(len(v) for v in data.values())

for value in data.values():
    value += [None] * (max_len - len(value))

df = pd.DataFrame(data)
If you need a collection of Series instead, do the following:

series = [pd.Series(value, name=key) for key, value in data.items()]
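If the Series are built from the unpadded lists, the manual padding step may not be needed at all: pd.concat with axis=1 aligns the Series on their index and fills the shorter columns with NaN. The following is a minimal sketch under that assumption; the small in-memory dict is a hypothetical stand-in for the contents parsed from data.txt.

```python
import pandas as pd

# Hypothetical stand-in for the parsed (unpadded) file contents.
data = {
    "MB104": ["TR15", "TR16", "SP16"],
    "MB301": ["TR16", "SP11", "SP16", "SP26", "SP67"],
    "MB302": ["TR15"],
}

# One Series per MB header, named after the header.
series = [pd.Series(values, name=key) for key, values in data.items()]

# Concatenating along axis=1 aligns on the integer index;
# shorter columns are padded with NaN automatically.
df = pd.concat(series, axis=1)
print(df)
```

This keeps the column lists untouched, which can be convenient if they are reused elsewhere.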

Here is an example that uses a list. Every line is explained in the code comments:

Code:

import pandas as pd

# Example data:
example_data = ['MB104','TR15','TR16','SP16','MB301','TR16','SP11','SP16','SP26','SP67','MB302','TR15','MB504','TR15','SP16','SP67','SP109','MB652','SP109','SP110']

# Initialize empty dict of data that will be converted to a dataframe:
final_data = {}
# Name of the column currently being filled (None until the first MB row):
new_mb = None

# For every value of the list:
for i in example_data:
    # If it starts with MB, open a new column and move on:
    if i.startswith('MB'):
        new_mb = i
        final_data.setdefault(new_mb, [])
        continue
    # Otherwise append the element to the current MB column
    # (rows that appear before the first MB are skipped):
    if new_mb is not None:
        final_data[new_mb].append(i)

# Get the max length of all list values of the dictionary:
max_items = max(len(values) for values in final_data.values())

# For every list of values, pad with empty strings up to max_items:
for values in final_data.values():
    values.extend([""] * (max_items - len(values)))

# Convert to a dataframe:
df = pd.DataFrame(final_data)
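As an alternative to padding the lists by hand, the columns can be transposed into rows with itertools.zip_longest, which fills the gaps itself. This is only a sketch: the dict below is assumed to hold the unpadded columns collected by a loop like the one above.

```python
from itertools import zip_longest

import pandas as pd

# Assumed unpadded result of the grouping loop.
final_data = {
    "MB104": ["TR15", "TR16", "SP16"],
    "MB301": ["TR16", "SP11", "SP16", "SP26", "SP67"],
    "MB302": ["TR15"],
    "MB504": ["TR15", "SP16", "SP67", "SP109"],
    "MB652": ["SP109", "SP110"],
}

# zip_longest turns the columns into rows and fills short columns with "".
rows = list(zip_longest(*final_data.values(), fillvalue=""))

df = pd.DataFrame(rows, columns=list(final_data.keys()))
print(df)
```

Changing fillvalue to None would produce missing values instead of empty strings.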


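Assuming mbs is a pandas.Series. You say in the question that it is a Series, but it appears to be a DataFrame, because a Series has no .itertuples() method.

Output:

If you prefer empty strings instead of NaN values (I recommend keeping NaN for further analysis):

Output: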
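For reference, one way to do this directly on a Series, covering both the NaN and the empty-string variants, is sketched below. It is an illustration rather than this answer's own code, and it assumes that mbs holds strings, that its first element is an MB header, and that each header appears only once.

```python
import pandas as pd

# Assumed input: the data from the question as a Series of strings.
mbs = pd.Series([
    "MB104", "TR15", "TR16", "SP16",
    "MB301", "TR16", "SP11", "SP16", "SP26", "SP67",
    "MB302", "TR15",
    "MB504", "TR15", "SP16", "SP67", "SP109",
    "MB652", "SP109", "SP110",
])

# True on header rows; the running sum gives every row its group number.
is_header = mbs.str.startswith("MB")
group_id = is_header.cumsum()

# In each group the first element is the header, the rest are its values.
columns = {
    chunk.iloc[0]: chunk.iloc[1:].reset_index(drop=True)
    for _, chunk in mbs.groupby(group_id)
}

# Columns of different lengths are aligned on the index and padded with NaN.
df = pd.concat(columns, axis=1)

# Variant with empty strings instead of NaN.
df_blank = df.fillna("")

print(df)
print(df_blank)
```

Keeping NaN in df makes later steps such as count() or dropna() behave as expected, while df_blank is mainly useful for display.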




Thank you very much. To be honest, I am still amazed at what pandas can do.

@ortunoa If an answer solved your question and was helpful, please accept it. :)