How can I use a loop (for/while) to combine similar code in Python? (Python, pandas)

I have some repeated code in which only a few numbers change:

df_h0 = df.copy()
df_h0['hour']='00:00'
df_h0['totalCount']=df.post_time_data.str.split('"00:00","postCount":"').str[1].str.split('","topic').str[0]
df_h0 = df_h0.fillna(0)

df_h1 = df.copy()
df_h1['hour']='01:00'
df_h1['totalCount']=df.post_time_data.str.split('"01:00","postCount":"').str[1].str.split('","topic').str[0]
df_h1 = df_h1.fillna(0)

df_h2 = df.copy()
df_h2['hour']='02:00'
df_h2['totalCount']=df.post_time_data.str.split('"02:00","postCount":"').str[1].str.split('","topic').str[0]
df_h2 = df_h2.fillna(0)

I would like to simplify this code with a loop, but I don't know how to start, since I'm new to Python.

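You can build a list of the DataFrames in a loop, and use the `str.format` method for the parts that change:

hourly_dfs = []  # one DataFrame per hour, instead of df_h0, df_h1, df_h2

for x in range(3):
    var = df.copy()
    var['hour'] = '0{0}:00'.format(x)
    # the delimiter must match the original text exactly, quotes included
    var['totalCount'] = df.post_time_data.str.split('"0{0}:00","postCount":"'.format(x)).str[1].str.split('","topic').str[0]
    var = var.fillna(0)
    hourly_dfs.append(var)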

If you are using Python 3.6+, you can also use f-strings instead of `.format()`.
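As a small illustration of that point, the sketch below builds the hour label and the delimiter both ways and checks that they agree. The variable names are made up for the example; the delimiter text is copied from the question's code.

```python
# Build the hour label and the split delimiter for hour 1 in two ways.
x = 1

# str.format version
hour_fmt = '0{0}:00'.format(x)
delim_fmt = '"{0}","postCount":"'.format(hour_fmt)

# f-string version (Python 3.6+)
hour_f = f'0{x}:00'
delim_f = f'"{hour_f}","postCount":"'

print(hour_fmt == hour_f)    # True
print(delim_fmt == delim_f)  # True
```

Note that the hard-coded leading `0` only suits single-digit hours; for `x = 10` it would give `010:00`.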

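Hopefully I haven't missed anything, but if I have, you can apply the same logic I used by declaring another variable.

I will try to show the overall shape of the process, so that you can solve problems like this yourself in future. However, it is not automatic: you need to think about what you are doing each time in order to write the best code you can.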

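Step 1: Grab a representative block of the code you want to repeat, and identify the parts that change. In the first block of the question, those are the `df_h0` name and the two occurrences of `00:00`.

Step 2: Recognize that our output will be a list of values, rather than multiple separate variables with related names. That will be easier to keep working with :)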

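Step 3: Analyze the changes. We have an hour string that varies and a delimiter string that varies; but the delimiter string always has the same general form, and it is based on the hour string. So if we have the hour string, we can create the delimiter string. In fact, only one piece of information varies: the hour. We'll adjust the code to reflect that: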

hour = '00:00' # give the variable information a name
delimiter = f'"{hour}","postCount":"' # compute the derived information
# and then use those values in the rest of the code
df_h0 = df.copy()
df_h0['hour'] = hour
df_h0['totalCount']=df.post_time_data.str.split(delimiter).str[1].str.split('","topic').str[0]
df_h0 = df_h0.fillna(0)
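To see what the chained `split` calls extract, it can help to run the same steps on one plain Python string. The sample value below is invented for illustration (the real contents of `post_time_data` are not shown in the question), and `count_for` is a hypothetical helper, so treat both as assumptions.

```python
# Hypothetical sample of one post_time_data value (format assumed).
sample = ('[{"hour":"00:00","postCount":"5","topic":"a"},'
          '{"hour":"01:00","postCount":"7","topic":"b"}]')

def count_for(text, hour):
    """Return the postCount text for `hour`, or 0 if the hour is absent."""
    delimiter = f'"{hour}","postCount":"'
    parts = text.split(delimiter)
    if len(parts) < 2:
        return 0  # plays the role of fillna(0) in the pandas version
    # keep everything up to the start of the next field
    return parts[1].split('","topic')[0]

print(count_for(sample, '00:00'))  # 5
print(count_for(sample, '01:00'))  # 7
print(count_for(sample, '02:00'))  # 0
```

The extracted counts are still strings (`'5'`, not `5`), just as in the pandas version.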

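Step 4: To make the overall code easier to understand, we'll put this block into its own function. That lets us give a name to the process of making a single table. We use the function's inputs to supply the varying information described in Step 3. One thing varies, so there will be one parameter for it. However, we also need to supply the data context we're working with, the `df` dataframe, so that the function can access it. That gives two parameters in total: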

def hourly_data(df, hour):
    # since 'hour' was provided, we don't define it here
    delimiter = f'"{hour}","postCount":"'
    # now we use a generic name inside the function.
    result = df.copy()
    result['hour'] = hour
    result['totalCount']=df.post_time_data.str.split(delimiter).str[1].str.split('","topic').str[0]
    # At the last step of the original process, we `return` the value
    # instead of simply assigning it.
    return result.fillna(0)

Now we have code that, given an 'hour' string, can produce a new dataframe simply by being called, for example:

df_h0 = hourly_data(df, '00:00')

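Step 5: A little more analysis. We want to call this function with each possible hour value, presumably from `'00:00'` to `'23:00'`. However, these strings follow an obvious pattern. It would be easier if we passed just the hour number to `hourly_data` and let it generate the string: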

def hourly_data(df, hour):
    # Locally replace the integer hour value with the hour string.
    # The `:02` here is used to zero-pad and right-align the hour value
    # as two digits.
    hour = f'{hour:02}:00'
    delimiter = f'"{hour}","postCount":"'
    # The rest as before.
    result = df.copy()
    result['hour'] = hour
    result['totalCount']=df.post_time_data.str.split(delimiter).str[1].str.split('","topic').str[0]
    return result.fillna(0)
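A quick way to check the `:02` format is to generate all the hour strings on their own, without any DataFrame involved. This is only a sanity check of the formatting step; `hour_labels` is a name chosen for the example.

```python
# Generate every hour label the function would produce internally.
hour_labels = [f'{hour:02}:00' for hour in range(24)]

print(hour_labels[0])    # 00:00
print(hour_labels[9])    # 09:00
print(hour_labels[23])   # 23:00
print(len(hour_labels))  # 24
```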

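Step 6: Now we are ready to use this code in a loop. In Python, the natural loop for "transforming" one list of inputs into another list is the list comprehension. It looks like this: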

hourly_dfs = [hourly_data(df, hour) for hour in range(24)]

Here, `range` is a built-in function that gives us the sequence of input values we need.

We could also build the list manually with a `for` loop:

hourly_dfs = []
for hour in range(24):
    hourly_dfs.append(hourly_data(df, hour))
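The two forms build the same list. The sketch below shows this with a hypothetical stand-in, `fake_hourly_data`, that returns a short string instead of a DataFrame so that it runs without any data:

```python
def fake_hourly_data(hour):
    """Hypothetical stand-in for hourly_data: returns a label, not a DataFrame."""
    return f'table for hour {hour}'

# List comprehension version.
from_comprehension = [fake_hourly_data(hour) for hour in range(24)]

# Explicit loop version.
from_loop = []
for hour in range(24):
    from_loop.append(fake_hourly_data(hour))

print(from_comprehension == from_loop)  # True
print(from_loop[0])                     # table for hour 0
```

With the real function, `hourly_dfs[0]` takes the place of `df_h0`, `hourly_dfs[1]` of `df_h1`, and so on.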

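We could also do the work inside the body of the `for` loop (someone else may well post another answer showing code like that). But by creating the function first, we get code that is easier to understand, and that also lets us use a list comprehension. The list comprehension approach is simpler because we don't have to think about the process of starting with an empty list and `.append`ing each element; we let Python build the list, rather than telling it how to do so.

Code using for/while would not be simpler; it would be longer, and might run slower, since DataFrame operations use code written in C/C++ where needed.

This isn't really a Pandas question, even though the code happens to use Pandas.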