Python: Pivot table/group by a specific condition
I am trying to change the structure of the data in a text file (.txt). The data in the file looks like this:
:1:A
:2:B
:3:C
:1:D
:2:E
:3:F
:4:G
:1:H
:3:I
:4:J
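I want to convert it into this format (like a pivot table in Excel, where the column names are the values between the ':' characters, and each group always starts with :1:):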
Does anyone know how to do this? Thanks in advance.

First create the DataFrame with header=None, because the file has no header:
import io

import pandas as pd

temp=u""":1:A
:2:B
:3:C
:1:D
:2:E
:3:F
:4:G
:1:H
:3:I
:4:J"""
#after testing, replace 'io.StringIO(temp)' with 'filename.csv'
df = pd.read_csv(io.StringIO(temp), header=None)
print (df)
0
0 :1:A
1 :2:B
2 :3:C
3 :1:D
4 :2:E
5 :3:F
6 :4:G
7 :1:H
8 :3:I
9 :4:J
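Then extract the original column with pop, strip the surrounding : characters with str.strip, and split the values into 2 new columns with str.split. Next, create the groups by comparing column a with the string '1' using eq (i.e. ==) and taking the cumsum. Finally, build a MultiIndex with set_index and reshape with unstack: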
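# split ':1:A' into column a (key '1') and column b (value 'A')
df[['a','b']] = df.pop(0).str.strip(':').str.split(':', expand=True)
# every '1' in column a starts a new group, so cumsum numbers the groups 1, 2, 3
df1 = df.set_index([df['a'].eq('1').cumsum(), 'a'])['b'].unstack(fill_value='')
print (df1)
a  1  2  3  4
a
1  A  B  C
2  D  E  F  G
3  H     I  J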
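The same steps can be wrapped in one function. This is a minimal sketch, assuming the file holds one ':key:value' entry per line; `pivot_groups` is a hypothetical name. It swaps strip/split for `str.extract` with a regular expression so that a value which itself contains ':' stays in one piece (with `split(':', expand=True)` it would produce a third column and the assignment to `['a','b']` would fail):

```python
import io

import pandas as pd

# Sample input from the question; pass a file path instead of io.StringIO for real data.
RAW = ":1:A\n:2:B\n:3:C\n:1:D\n:2:E\n:3:F\n:4:G\n:1:H\n:3:I\n:4:J\n"


def pivot_groups(source):
    """Pivot ':key:value' lines into one row per group (hypothetical helper)."""
    # keep_default_na=False keeps literal strings such as 'NA' as text.
    df = pd.read_csv(source, header=None, names=["raw"], keep_default_na=False)
    # Text between the first two colons is the key; everything after it is the value.
    parts = df["raw"].str.extract(r"^:([^:]+):(.*)$")
    parts.columns = ["key", "value"]
    # Every ':1:' line starts a new group, so a running count numbers the groups.
    group = parts["key"].eq("1").cumsum().rename("group")
    # One row per group, one column per key; missing keys become ''.
    return parts.set_index([group, "key"])["value"].unstack(fill_value="")


result = pivot_groups(io.StringIO(RAW))
print(result)
```

For the real file, call `pivot_groups('filename.txt')`, since `read_csv` also accepts a path.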
Use:
# Reading text file (assuming it is stored in CSV format; you can also use pd.read_fwf)
df = pd.read_csv('SO.csv', header=None)
# Splitting data into two columns
ndf = df.iloc[:, 0].str.split(':', expand=True).iloc[:, 1:]
# Grouping and creating a dataframe. Later dropping NaNs
res = ndf.groupby(1)[2].apply(pd.DataFrame).apply(lambda x: pd.Series(x.dropna().values))
# Post processing (optional)
res.columns = [':' + ndf[1].unique()[i] + ':' for i in range(ndf[1].nunique())]
res.index.name = 'Group'
res.index = range(1, res.shape[0] + 1)
res
Group :1: :2: :3: :4:
1 A B C
2 D E F G
3 H I J
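Since the question asks for something like an Excel pivot table, the reshape can also be sketched with `DataFrame.pivot`. This is a swapped-in variant, not the method of any answer here: it builds the sample lines inline instead of reading SO.csv, reuses the `eq('1').cumsum()` trick to number the groups, and assumes keys never contain ':' and that each key appears at most once per group (`pivot` raises a ValueError on duplicate index/column pairs):

```python
import pandas as pd

lines = [":1:A", ":2:B", ":3:C", ":1:D", ":2:E", ":3:F", ":4:G", ":1:H", ":3:I", ":4:J"]

# ':1:A' -> ['', '1', 'A']; n=2 keeps any further ':' inside the value.
parts = pd.Series(lines).str.split(":", n=2, expand=True)
ndf = pd.DataFrame({"key": parts[1], "value": parts[2]})
# Each ':1:' line starts a new group.
ndf["Group"] = ndf["key"].eq("1").cumsum()

# Long -> wide: one row per group, one column per key, '' where a key is missing.
res = ndf.pivot(index="Group", columns="key", values="value").fillna("")
res.columns = [f":{k}:" for k in res.columns]
print(res)
```

Because every value is placed by its (Group, key) pair, the missing :2: in the third group stays empty instead of shifting other values.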
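Another approach:
import pandas as pd

#read the file
with open("t.txt") as f:
    content = f.readlines()
#Create a dictionary and read each line from the file, keeping the column names (e.g. :1:) as keys and the row values (e.g. A) as values.
my_dict = {}
for v in content:
    line = v.strip()  # remove the trailing newline
    if not line:
        continue  # skip blank lines
    key = line[0:3]  # take the value ':1:'
    value = line[3:]  # take the value 'A'
    my_dict.setdefault(key, []).append(value)
#convert the dictionary to a dataframe and transpose it
df = pd.DataFrame.from_dict(my_dict, orient='index').transpose()
df
The output will look like this:
   :1:   :2:  :3:   :4:
0    A     B    C     G
1    D     E    F     J
2    H  None    I  None
Note that this approach fills each column independently, so the groups are not kept aligned: G and J end up in rows 0 and 1 of the :4: column instead of in the rows that start with D and H.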
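To keep the groups aligned in plain Python, start a new row every time a ':1:' line appears instead of appending to one list per key. A minimal sketch; `read_groups` is a hypothetical helper, and it assumes every non-blank line has the form ':key:value':

```python
import pandas as pd


def read_groups(lines):
    """Collect ':key:value' lines into one dict per group, starting a new group at ':1:'."""
    rows = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        _, key, value = line.split(":", 2)  # ':1:A' -> ['', '1', 'A']
        # Start a new row on ':1:' (or if the data does not begin with ':1:').
        if key == "1" or not rows:
            rows.append({})
        rows[-1][f":{key}:"] = value
    return rows


lines = [":1:A", ":2:B", ":3:C", ":1:D", ":2:E", ":3:F", ":4:G", ":1:H", ":3:I", ":4:J"]
# Columns follow first appearance; keys missing from a group become ''.
df = pd.DataFrame(read_groups(lines)).fillna("")
print(df)
```

With the file, pass the open file object: `with open("t.txt") as f: df = pd.DataFrame(read_groups(f)).fillna("")`.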