Using Python to convert data from multiple columns into a single column while repeating column A

Tags: python, csv, pandas, data-manipulation

I am fairly new to Python and need help with this.

The data I have is in CSV format and looks like this:

Month     YEAR  AZ-Phoenix  CA-Los Angeles  CA-San Diego  CA-San Francisco  CO-Denver  DC-Washington
January   1987  59.33  54.67  46.61  50.20
February  1987  59.65  54.89  46.87  49.96  64.77

The city columns need to be combined and shown in columns 2 and 3, with column 1 repeated n times.

The output should be:

Month    YEAR
January  1987  AZ-Phoenix
January  1987  CA-Los Angeles    59.33
January  1987  CA-San Diego      54.67
January  1987  CA-San Francisco  46.61
January  1987  CO-Denver         50.20

How can I achieve this with the csv reader?

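Use read_csv with a tab separator (\t); if the separator is two or more spaces, use sep=r'\s{2,}' with engine='python' instead, as in piRSquared's solution:

import pandas as pd

# 'data.csv' is a placeholder; the original snippet omitted the file path
df = pd.read_csv('data.csv', sep='\t')

I think you need set_index and stack: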

df = df.set_index('YEAR').stack(dropna=False).reset_index()
df.columns = ['YEAR','A','B']
print (df)
             YEAR                 A      B
0    January 1987        AZ-Phoenix  59.33
1    January 1987    CA-Los Angeles  54.67
2    January 1987            CA-San  46.61
3    January 1987             Diego  50.20
4    January 1987  CA-San Francisco    NaN
5    January 1987         CO-Denver    NaN
6    January 1987     DC-Washington    NaN
7   February 1987        AZ-Phoenix  59.65
8   February 1987    CA-Los Angeles  54.89
9   February 1987            CA-San  46.87
10  February 1987             Diego  49.96
11  February 1987  CA-San Francisco  64.77
12  February 1987         CO-Denver    NaN
13  February 1987     DC-Washington    NaN

# if you need to remove the rows with NaN
df = df.set_index('YEAR').stack().reset_index()
df.columns = ['YEAR','A','B']
print (df)
            YEAR                 A      B
0   January 1987        AZ-Phoenix  59.33
1   January 1987    CA-Los Angeles  54.67
2   January 1987            CA-San  46.61
3   January 1987             Diego  50.20
4  February 1987        AZ-Phoenix  59.65
5  February 1987    CA-Los Angeles  54.89
6  February 1987            CA-San  46.87
7  February 1987             Diego  49.96
8  February 1987  CA-San Francisco  64.77
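
If Month and YEAR are separate columns in the file, as the header row in the question suggests, both can go into the index before stacking so that neither is lost. The following is a minimal sketch under stated assumptions, not a definitive solution: the input is assumed to be tab-separated, the sample data and the position of the blank cells are made up for illustration, and the names City and Value are hypothetical.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample with Month and YEAR as separate columns.
# Which cells are blank is an assumption made for this illustration.
raw = (
    "Month\tYEAR\tAZ-Phoenix\tCA-Los Angeles\tCA-San Diego\t"
    "CA-San Francisco\tCO-Denver\tDC-Washington\n"
    "January\t1987\t59.33\t54.67\t46.61\t50.20\t\t\n"
    "February\t1987\t59.65\t54.89\t46.87\t49.96\t64.77\t\n"
)

df = pd.read_csv(io.StringIO(raw), sep="\t")

long_df = (
    df.set_index(["Month", "YEAR"])   # keep both identifier columns
      .stack()                        # city columns become one index level
      .dropna()                       # drop blank cells on any pandas version
      .rename_axis(["Month", "YEAR", "City"])
      .rename("Value")
      .reset_index()
)
print(long_df)
```

Calling dropna() explicitly keeps the result the same whether or not the installed pandas version drops missing values inside stack itself; leave it out if the blank cells should stay as NaN rows.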

Other solutions:

Option 1
Use pd.melt

pd.melt(df, 'YEAR')

             YEAR          variable  value
0    January 1987        AZ-Phoenix  59.33
1   February 1987        AZ-Phoenix  59.65
2    January 1987    CA-Los Angeles  54.67
3   February 1987    CA-Los Angeles  54.89
4    January 1987      CA-San Diego  46.61
5   February 1987      CA-San Diego  46.87
6    January 1987  CA-San Francisco  50.20
7   February 1987  CA-San Francisco  49.96
8    January 1987         CO-Denver    NaN
9   February 1987         CO-Denver  64.77
10   January 1987     DC-Washington    NaN
11  February 1987     DC-Washington    NaN

Option 2
Rebuild with numpy tools

import numpy as np

pd.DataFrame(dict(
        # repeat each YEAR label once per value column
        YEAR=df.YEAR.values.repeat(len(df.columns) - 1),
        # flatten the values row by row
        B=df.drop(columns='YEAR').values.ravel(),
        # Index.drop keeps the original column order, so labels line up with ravel()
        A=np.tile(df.columns.drop('YEAR').values, len(df)),
    ))[['YEAR', 'A', 'B']]

Setup

import pandas as pd

# 'data.csv' is a placeholder; the original snippet omitted the file path
df = pd.read_csv('data.csv', sep=r'\s{2,}', engine='python')

Comments:

Does df need an import statement? Can this be used with the csv reader?

Yes, exactly. Give me a moment.

Glad I could help, have a nice day!

I am only on my phone right now. I think the best approach is to create a new question with sample data, the desired output, and what you have tried. Good luck!

'January' and '1987' are two different columns. When using the first code, the first column is not carried through, i.e. January is not shown, only 1987, AZ-Phoenix, 59.33. How can I make sure January is included as well?
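
The question asks how to do this with the csv reader. Below is a minimal sketch using only the standard-library csv module, offered as one possible approach rather than the answerers' method. It assumes tab-separated input with Month and YEAR as separate columns; the sample data, the position of the blank cells, and the output column names City and Value are assumptions for illustration.

```python
import csv
import io

# Hypothetical tab-separated sample; which cells are blank is an assumption.
raw = (
    "Month\tYEAR\tAZ-Phoenix\tCA-Los Angeles\tCA-San Diego\t"
    "CA-San Francisco\tCO-Denver\tDC-Washington\n"
    "January\t1987\t59.33\t54.67\t46.61\t50.20\t\t\n"
    "February\t1987\t59.65\t54.89\t46.87\t49.96\t64.77\t\n"
)

id_cols = ["Month", "YEAR"]  # columns repeated on every output row
rows = []

reader = csv.DictReader(io.StringIO(raw), delimiter="\t")
for record in reader:
    for city in reader.fieldnames:
        if city in id_cols:
            continue
        value = record[city]
        if not value:  # skip blank (or missing) cells
            continue
        rows.append([record["Month"], record["YEAR"], city, float(value)])

# Write the long-format rows back out as tab-separated text.
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerow(id_cols + ["City", "Value"])
writer.writerows(rows)
print(out.getvalue())
```

With a real file, replace io.StringIO(raw) with an open file handle (opened with newline='') and write to a second file instead of the in-memory buffer.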