Python: read a CSV using pandas. Reassign values 1 to 40 for soil type and 1 to 7 for forest cover type

python csv pandas

The CSV input file has 56 columns. Sample data is shown below. Please note the format.

Id,Elevation,Aspect,Slope,Horizontal_Distance_To_Hydrology,Vertical_Distance_To_Hydrology,Horizontal_Distance_To_Roadways,Hillshade_9am,Hillshade_Noon,Hillshade_3pm,Horizontal_Distance_To_Fire_Points,Wilderness_Area1,Wilderness_Area2,Wilderness_Area3,Wilderness_Area4,Soil_Type1,Soil_Type2,Soil_Type3,Soil_Type4,Soil_Type5,Soil_Type6,Soil_Type7,Soil_Type8,Soil_Type9,Soil_Type10,Soil_Type11,Soil_Type12,Soil_Type13,Soil_Type14,Soil_Type15,Soil_Type16,Soil_Type17,Soil_Type18,Soil_Type19,Soil_Type20,Soil_Type21,Soil_Type22,Soil_Type23,Soil_Type24,Soil_Type25,Soil_Type26,Soil_Type27,Soil_Type28,Soil_Type29,Soil_Type30,Soil_Type31,Soil_Type32,Soil_Type33,Soil_Type34,Soil_Type35,Soil_Type36,Soil_Type37,Soil_Type38,Soil_Type39,Soil_Type40,Cover_Type
1,2596,51,3,258,0,510,221,232,148,6279,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,5
2,2590,56,2,212,-6,390,220,235,151,6225,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,5
3,2804,139,9,268,65,3180,234,238,135,6121,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2
4,2785,155,18,242,118,3090,238,238,122,6211,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,2
5,2595,45,2,153,-1,391,220,234,150,6172,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,5
6,2579,132,6,300,-15,67,230,237,140,6031,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2
7,2606,45,7,270,5,633,222,225,138,6256,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,5
8,2605,49,4,234,7,573,222,230,144,6228,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,5
9,2617,45,9,240,56,666,223,221,133,6244,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,5
10,2612,59,10,247,11,636,228,219,124,6230,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,5

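I need to transform this data. The requirements are as follows. Remove the groups of columns that hold binary values (0 or 1) and give a new column a range of values instead. For wilderness areas the range is 1 to 4. For soil types it is 1 to 40.

- Remove the columns Wilderness_Area1 through Wilderness_Area4. Add a single column Wilderness_Area and assign it 1 to 4 based on the input row. Example, given the value in the input row before the change:
  - Wilderness_Area1 = 1 should now be Wilderness_Area = 1
  - Wilderness_Area2 = 1 should now be Wilderness_Area = 2
  - Wilderness_Area3 = 1 should now be Wilderness_Area = 3
  - Wilderness_Area4 = 1 should now be Wilderness_Area = 4
- Remove the columns Soil_Type1 through Soil_Type40. Add a single column Soil and assign it 1 to 40 based on the input row. Example, given the value in the input row before the change:
  - Soil_Type1 = 1 should now be Soil = 1
  - Soil_Type2 = 1 should now be Soil = 2
  - Soil_Type3 = 1 should now be Soil = 3
  - Soil_Type4 = 1 should now be Soil = 4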

I used the code below, but I still get the 40 Soil_Type columns in my DataFrame. I need to drop those columns from df. How can I do all of the above?

import pandas

df = pandas.read_csv(ifname)  # ifname: path to the input CSV
df['Soil'] = 0
for i in range(1,41):
    df['Soil'] = df['Soil'] + i*df['Soil_Type'+str(i)]

print(df)

Below is an example of the output I need:

Id,Elevation,Aspect,Slope,Horizontal_Distance_To_Hydrology,Vertical_Distance_To_Hydrology,Horizontal_Distance_To_Roadways,Hillshade_9am,Hillshade_Noon,Hillshade_3pm,Horizontal_Distance_To_Fire_Points,Cover_Type,Soil,Wilderness_Area
1,2596,51,3,258,0,510,221,232,148,6279,5,29,1
2,2590,56,2,212,-6,390,220,235,151,6225,5,29,1
3,2804,139,9,268,65,3180,234,238,135,6121,2,12,1
4,2785,155,18,242,118,3090,238,238,122,6211,2,30,1
5,2595,45,2,153,-1,391,220,234,150,6172,5,29,1
6,2579,132,6,300,-15,67,230,237,140,6031,2,29,1
7,2606,45,7,270,5,633,222,225,138,6256,5,29,1
8,2605,49,4,234,7,573,222,230,144,6228,5,29,1
9,2617,45,9,240,56,666,223,221,133,6244,5,29,1
10,2612,59,10,247,11,636,228,219,124,6230,5,29,1

You were almost there. After assigning the values, you just need to drop the columns:

In [158]:

soil_type_cols = [col for col in df if 'Soil_Type' in col]
wilderness_cols = [col for col in df if 'Wilderness_Area' in col]

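# Start at 0 and accumulate; a plain assignment would keep only the last column.
df['Soil'] = 0
for i in range(1,41):
    df['Soil'] += i*df['Soil_Type'+str(i)]

df['Wilderness_Area'] = 0
for i in range(1,5):
    df['Wilderness_Area'] += i*df['Wilderness_Area'+str(i)]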

df = df.drop(soil_type_cols+wilderness_cols, axis=1)
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 14 columns):
Id                                    10 non-null int64
Elevation                             10 non-null int64
Aspect                                10 non-null int64
Slope                                 10 non-null int64
Horizontal_Distance_To_Hydrology      10 non-null int64
Vertical_Distance_To_Hydrology        10 non-null int64
Horizontal_Distance_To_Roadways       10 non-null int64
Hillshade_9am                         10 non-null int64
Hillshade_Noon                        10 non-null int64
Hillshade_3pm                         10 non-null int64
Horizontal_Distance_To_Fire_Points    10 non-null int64
Cover_Type                            10 non-null int64
Soil                                  10 non-null int64
Wilderness_Area                       10 non-null int64
dtypes: int64(14)
memory usage: 1.2 KB
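
As an alternative to the loops, the same collapse can be written as a weighted sum over the flag columns. This is only a minimal sketch: the helper name `collapse_one_hot` and the tiny sample frame are made up for illustration, and it assumes each row has at most one flag set per group (an all-zero row yields 0).

```python
import numpy as np
import pandas as pd

def collapse_one_hot(df, prefix, n, new_name):
    """Replace columns prefix1..prefixN (0/1 flags) with one column holding 1..N."""
    cols = [f"{prefix}{i}" for i in range(1, n + 1)]
    # Weighted sum: column i contributes i, so a row whose only set flag is
    # Soil_Type29 yields 29. A row with no flag set yields 0.
    codes = df[cols].to_numpy() @ np.arange(1, n + 1)
    return df.drop(columns=cols).assign(**{new_name: codes})

# Hypothetical miniature frame (3 soil types, 2 areas) standing in for the real CSV.
sample = pd.DataFrame({
    "Id": [1, 2, 3],
    "Soil_Type1": [0, 0, 1],
    "Soil_Type2": [0, 1, 0],
    "Soil_Type3": [1, 0, 0],
    "Wilderness_Area1": [1, 0, 1],
    "Wilderness_Area2": [0, 1, 0],
    "Cover_Type": [5, 2, 5],
})

result = collapse_one_hot(sample, "Soil_Type", 3, "Soil")
result = collapse_one_hot(result, "Wilderness_Area", 2, "Wilderness_Area")
print(result)
```

For the real file the calls would presumably use `n=40` for `Soil_Type` and `n=4` for `Wilderness_Area`.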

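Cool. I added it to my code and got the following output:

,Id,Elevation,Aspect,Slope,Horizontal_Distance_To_Hydrology,Vertical_Distance_To_Hydrology,Horizontal_Distance_To_Roadways,Hillshade_9am,Hillshade_Noon,Hillshade_3pm,Horizontal_Distance_To_Fire_Points,Cover_Type,Soil,Wilderness_Area
**0**,1,2596,51,3,258,0,510,221,232,148,6279,5,29,1
**1**,2,2590,56,2,212,-6,390,220,235,151,6225,5,29,1
**2**,3,2804,139,9,268,65,3180,234,238,135,6121,2,12,1

I don't understand the first column (in bold) that pandas added. Can you explain it? Is it possible to remove it?

It looks like this is a pandas feature (the DataFrame index). I used

df.to_csv(ofname, index=False)

and now it works.

Did I answer your question? If so, you can accept my answer; there is an empty tick mark at the top left of my answer.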
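
To illustrate where the extra first column comes from, here is a small sketch comparing `to_csv` with and without the index. The two-row frame is made up for illustration. Calling `to_csv` without a path returns the CSV text as a string, which makes the difference easy to see.

```python
import pandas as pd

# Made-up two-row frame; the real one would come from the question's CSV.
df = pd.DataFrame({"Id": [1, 2], "Soil": [29, 12]})

with_index = df.to_csv()                # default index=True: row labels are written first
without_index = df.to_csv(index=False)  # row labels are omitted

print(with_index)
print(without_index)
```

With the default settings the header line starts with an empty field, because the index has no name. That empty field is the unnamed first column seen in the output above.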