Python: converting a pandas Series to a DataFrame with a for loop
Thanks in advance for any help.

I am trying to convert this pandas Series into a DataFrame using the following logic: whenever a row in the Series starts with "MB", it should create a new column in the DataFrame, and all the rows below it, up to the next "MB", should go under that column.
MB104
TR15
TR16
SP16
MB301
TR16
SP11
SP16
SP26
SP67
MB302
TR15
MB504
TR15
SP16
SP67
SP109
MB652
SP109
SP110
Into this:

    MB104  MB301  MB302  MB504  MB652
    TR15   TR16   TR15   TR15   SP109
    TR16   SP11          SP16   SP110
    SP16   SP16          SP67
           SP26          SP109
           SP67
This is what I have tried so far:

    mbdf = pd.DataFrame()
    assetlist = []
    for row in mbs.itertuples():
        left2 = row.data[:2]
        if left2 == 'MB':
            if headername:
                mbdf[headername] = pd.Series(assetlist)
            headername = row.data
            assetlist = []
        else:
            assetname = row.data
            assetlist.append(assetname)
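As far as I can tell, the attempt above fails for three reasons: `headername` is read before it is ever assigned, the last group is never written because a column is only stored when the next "MB" row arrives, and assigning Series of different lengths to an existing DataFrame aligns on the existing index, so longer columns would be cut short. The following is a minimal sketch of the same loop with those points addressed. It assumes `mbs` is a one-column DataFrame whose column is named `data` (which is what `row.data` implies); the shortened sample input and the `columns` dict are hypothetical.

```python
import pandas as pd

# Assumed input: a DataFrame with a single column named "data"
mbs = pd.DataFrame(
    {"data": ["MB104", "TR15", "TR16", "SP16", "MB301", "TR16", "SP11"]}
)

columns = {}        # header -> list of assets, in order of appearance
headername = None   # no "MB" header seen yet

for row in mbs.itertuples(index=False):
    if row.data.startswith("MB"):
        # Start a new column as soon as the header is seen
        headername = row.data
        columns[headername] = []
    elif headername is not None:
        columns[headername].append(row.data)

# Wrapping each list in a Series lets pandas pad shorter columns with NaN
mbdf = pd.DataFrame({name: pd.Series(assets) for name, assets in columns.items()})
print(mbdf)
```

Collecting plain lists first and building the DataFrame once at the end avoids growing the DataFrame column by column inside the loop.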
It is not clear from your question whether you want them as separate Series or combined in a single DataFrame. I'll assume you want a DataFrame:

    from collections import defaultdict

    import pandas as pd

    # Read the data: each "MB..." line starts a new column
    data = defaultdict(list)
    col = None
    with open('data.txt') as fp:
        for line in fp:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith('MB'):
                col = line
            elif col is not None:
                data[col].append(line)

    # Pad every column to the same length
    max_len = max(len(v) for v in data.values())
    for value in data.values():
        value += [None] * (max_len - len(value))

    df = pd.DataFrame(data)
If you need a collection of Series instead, do the following:

    series = [pd.Series(value, name=key) for key, value in data.items()]
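As a side note, once the values are held as named Series, manual padding is not strictly needed: `pd.concat` with `axis=1` aligns the Series on their index and fills the gaps with NaN. A small sketch, using a hypothetical, shortened `data` dict in place of the one read from the file:

```python
import pandas as pd

# Hypothetical grouped data, shaped like `data` before any padding
data = {
    "MB104": ["TR15", "TR16", "SP16"],
    "MB302": ["TR15"],
    "MB652": ["SP109", "SP110"],
}

# One named Series per header
series = [pd.Series(values, name=key) for key, values in data.items()]

# Side-by-side concatenation; each Series name becomes a column name
df = pd.concat(series, axis=1)
print(df)
```

Shorter columns end up with NaN in the missing positions rather than `None`.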
I'll give you an example that uses a list. Each line is explained in the code:

    import pandas as pd

    # Example data:
    example_data = ['MB104','TR15','TR16','SP16','MB301','TR16','SP11','SP16','SP26','SP67','MB302','TR15','MB504','TR15','SP16','SP67','SP109','MB652','SP109','SP110']

    # Initialize an empty dict of data that will be converted to a DataFrame:
    final_data = {}
    # Current "MB" header; None until the first one is seen:
    new_mb = None

    # For every value in the list:
    for i in example_data:
        # If it starts with "MB", remember it as the current column:
        if i.startswith('MB'):
            new_mb = i
            continue
        # Otherwise append the element to the current column's list,
        # creating the list the first time it is needed:
        final_data.setdefault(new_mb, []).append(i)

    # Get the maximum length of all the lists in the dictionary:
    max_items = max(len(values) for values in final_data.values())

    # Pad every shorter list with empty strings up to max_items:
    for values in final_data.values():
        values.extend([""] * (max_items - len(values)))

    # Convert to a DataFrame:
    df = pd.DataFrame(final_data)
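Assuming mbs is a pandas.Series. (You call it a Series in the question, but it appears to be a DataFrame, since a Series has no .itertuples() method.)

If you prefer empty strings to NaN values, they can be substituted afterwards, although I recommend keeping NaN for further analysis.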
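The code for this answer did not survive in the text above, so what follows is only a sketch of one loop-free way to do it under the same assumption (that `mbs` is a Series of strings), not necessarily the original author's method. It labels every row with the nearest "MB" header above it, groups the remaining rows by that label, and places the groups side by side. The names `is_header`, `header`, `items` and `columns` are illustrative.

```python
import pandas as pd

mbs = pd.Series([
    "MB104", "TR15", "TR16", "SP16",
    "MB301", "TR16", "SP11", "SP16", "SP26", "SP67",
    "MB302", "TR15",
    "MB504", "TR15", "SP16", "SP67", "SP109",
    "MB652", "SP109", "SP110",
])

# True for the rows that start a new column
is_header = mbs.str.startswith("MB")

# Keep only header values, then forward-fill so each row carries its header
header = mbs.where(is_header).ffill()

# Drop the header rows themselves and split the rest by header
items = mbs[~is_header]
columns = {
    name: group.reset_index(drop=True)
    for name, group in items.groupby(header[~is_header], sort=False)
}

# Side-by-side concatenation pads shorter columns with NaN
df = pd.concat(columns, axis=1)

# Optional: empty strings instead of NaN
df_blank = df.fillna("")
print(df)
```

Note that rows appearing before the first header are dropped by the groupby, and a header with no rows beneath it produces no column.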
Thank you so much. To be honest, I am still amazed at what pandas can do.

@ortunoa If an answer resolved your doubt or problem and was helpful, please accept it. :)