Using Python to convert data from multiple columns into a single column while repeating column A
I am fairly new to Python and need help with this. The data I have is in csv format, as shown below:

Month YEAR AZ-Phoenix CA-Los Angeles CA-San Diego CA-San Francisco CO-Denver DC-Washington
January 1987 59.33 54.67 46.61 50.20
February 1987 59.65 54.89 46.87 49.96 64.77

This needs to be consolidated so that column 1 is repeated n times and the city names and values are displayed in columns 2 and 3. The output should be:

Month YEAR
January 1987 AZ-Phoenix
January 1987 CA-Los Angeles 59.33
January 1987 CA-San Diego 54.67
January 1987 CA-San Francisco 46.61
January 1987 CO-Denver 50.20

How can this be achieved with the csv reader?
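Use read_csv with a tab separator (\t), or with a regular-expression separator if the columns are separated by two or more spaces. piRSquared's solution is also used further down. Start by reading the file: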
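import pandas as pd
# 'file.csv' is a placeholder path; read_csv needs the file as its first argument
df = pd.read_csv('file.csv', sep='\t')
Then I think you need: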
df = df.set_index('YEAR').stack(dropna=False).reset_index()
df.columns = ['YEAR','A','B']
print(df)
             YEAR                 A      B
0    January 1987        AZ-Phoenix  59.33
1    January 1987    CA-Los Angeles  54.67
2    January 1987            CA-San  46.61
3    January 1987             Diego  50.20
4    January 1987  CA-San Francisco    NaN
5    January 1987         CO-Denver    NaN
6    January 1987     DC-Washington    NaN
7   February 1987        AZ-Phoenix  59.65
8   February 1987    CA-Los Angeles  54.89
9   February 1987            CA-San  46.87
10  February 1987             Diego  49.96
11  February 1987  CA-San Francisco  64.77
12  February 1987         CO-Denver    NaN
13  February 1987     DC-Washington    NaN
# if you need to remove the rows with NaN
# (with the newer stack implementation, future_stack=True, NaN rows are kept; chain .dropna() after stack() there)
df = df.set_index('YEAR').stack().reset_index()
df.columns = ['YEAR','A','B']
print(df)
            YEAR                 A      B
0   January 1987        AZ-Phoenix  59.33
1   January 1987    CA-Los Angeles  54.67
2   January 1987            CA-San  46.61
3   January 1987             Diego  50.20
4  February 1987        AZ-Phoenix  59.65
5  February 1987    CA-Los Angeles  54.89
6  February 1987            CA-San  46.87
7  February 1987             Diego  49.96
8  February 1987  CA-San Francisco  64.77
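The set_index/stack approach above can be tried on a small frame built in memory. This is a minimal sketch, assuming Month and YEAR are read as two separate columns (as raised in the comments); the sample values and the names long_df, A and B are illustrative only. An explicit dropna() is chained so the result does not depend on which stack implementation the installed pandas uses.

```python
import pandas as pd

# Hypothetical sample with Month and YEAR as separate columns
df = pd.DataFrame({
    "Month": ["January", "February"],
    "YEAR": [1987, 1987],
    "AZ-Phoenix": [59.33, 59.65],
    "CO-Denver": [None, 64.77],  # missing value for January
})

long_df = (
    df.set_index(["Month", "YEAR"])  # keep both key columns out of the reshape
      .stack()                       # city columns become an inner index level
      .dropna()                      # drop missing values regardless of pandas version
      .reset_index()
)
long_df.columns = ["Month", "YEAR", "A", "B"]
print(long_df)
```

Setting both key columns as the index is what makes them repeat on every output row.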
Another solution:
Option 1
Use pd.melt
Option 2
Use numpy tools to rebuild the frame
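import numpy as np
pd.DataFrame(dict(
    YEAR=df.YEAR.values.repeat(len(df.columns) - 1),
    # drop(columns=...) replaces the positional axis argument, which pandas 2.0 no longer accepts
    B=df.drop(columns='YEAR').values.ravel(),
    # Index.difference returns sorted labels, so this assumes the value columns are already in sorted order
    A=np.tile(df.columns.difference(['YEAR']).values, len(df)),
))[['YEAR', 'A', 'B']]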
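The numpy rebuild in Option 2 can be checked on a small frame. This is a sketch under assumptions: the sample data and the names value_cols and long_df are made up for illustration, and Index.drop is used in place of Index.difference so the column order is kept as-is rather than sorted. Because ravel() walks the values row by row, the result is ordered by row (all cities for January, then all cities for February).

```python
import numpy as np
import pandas as pd

# Hypothetical two-row sample
df = pd.DataFrame({
    "YEAR": ["January 1987", "February 1987"],
    "AZ-Phoenix": [59.33, 59.65],
    "CA-Los Angeles": [54.67, 54.89],
})

value_cols = df.columns.drop("YEAR")  # value columns, original order preserved

long_df = pd.DataFrame({
    # each key is repeated once per value column
    "YEAR": np.repeat(df["YEAR"].to_numpy(), len(value_cols)),
    # the column names are cycled once per input row
    "A": np.tile(value_cols.to_numpy(), len(df)),
    # row-major flattening lines up with the repeat/tile pattern above
    "B": df[value_cols].to_numpy().ravel(),
})
print(long_df)
```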
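Does df need an import statement? Can this be used with the csv reader?
Yes, exactly. Give me a moment.
Glad I could help, have a nice day!
I'm only on my phone right now. I think the best approach is to create a new question with sample data, the desired output, and what you have tried. Good luck.
"January" and "1987" are two different columns, and when using the first code it drops the first column, i.e. January is not shown, only 1987, AZ-Phoenix, 59.33. How do I make sure January is also taken into account?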
# Option 1: pd.melt, keeping YEAR as the identifier column
pd.melt(df, 'YEAR')
             YEAR          variable  value
0    January 1987        AZ-Phoenix  59.33
1   February 1987        AZ-Phoenix  59.65
2    January 1987    CA-Los Angeles  54.67
3   February 1987    CA-Los Angeles  54.89
4    January 1987      CA-San Diego  46.61
5   February 1987      CA-San Diego  46.87
6    January 1987  CA-San Francisco  50.20
7   February 1987  CA-San Francisco  49.96
8    January 1987         CO-Denver    NaN
9   February 1987         CO-Denver  64.77
10   January 1987     DC-Washington    NaN
11  February 1987     DC-Washington    NaN
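For the case raised in the comments, where Month and YEAR are two separate columns, one possible approach is to pass both as id_vars to pd.melt. The following is a sketch on hypothetical tab-separated sample text read from memory; the names text, long_df, A and B are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample; January has no CO-Denver value
text = (
    "Month\tYEAR\tAZ-Phoenix\tCA-Los Angeles\tCO-Denver\n"
    "January\t1987\t59.33\t54.67\t\n"
    "February\t1987\t59.65\t54.89\t64.77\n"
)
df = pd.read_csv(io.StringIO(text), sep="\t")

# Both key columns are kept and repeated for every city column
long_df = pd.melt(df, id_vars=["Month", "YEAR"], var_name="A", value_name="B")
print(long_df)

# Optionally drop the rows that have no value
print(long_df.dropna(subset=["B"]))
```

Unlike stack, melt orders the result column by column, so all months for one city appear together.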
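# alternative when the columns are separated by two or more spaces ('file.csv' is a placeholder path)
df = pd.read_csv('file.csv', sep=r'\s{2,}', engine='python')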
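The regular-expression separator can be tried without a file by reading from a string. This is a minimal sketch with made-up sample text; it assumes every column gap is at least two spaces wide while city names contain only single spaces.

```python
import io
import pandas as pd

# Hypothetical sample: two or more spaces between columns,
# single spaces inside names such as "CA-Los Angeles"
text = (
    "Month     YEAR  AZ-Phoenix  CA-Los Angeles  CA-San Diego\n"
    "January   1987  59.33       54.67           46.61\n"
    "February  1987  59.65       54.89           46.87\n"
)

# A regex separator needs the python parsing engine
df = pd.read_csv(io.StringIO(text), sep=r"\s{2,}", engine="python")
print(df.columns.tolist())
print(df)
```

With a plain whitespace separator, "CA-Los Angeles" would be split into two columns, which is the kind of mis-split visible in the stack output earlier ("CA-San" and "Diego").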