Python: How do I create a loop and a new dataset from a function I created?

python pandas function loops dataframe

I have the following real estate data:

neighborhood  type_property  type_negotiation  price
Smallville       house           rent        2000
Oakville       apartment       for sale      100000
King Bay         house         for sale      250000
...

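I wrote a function that filters this large dataset by the neighborhood you pass in, keeps only houses for sale, and returns the 10th and 90th percentiles of their prices along with how many there are. Here it is: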

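import pandas as pd

def foo(string):
    # Keep only houses for sale in the requested neighborhood
    a = df[(df.type_negotiation == 'for sale') & (df.type_property == 'house') & (df.neighborhood == string)]
    # One-row DataFrame: 10th and 90th price percentiles plus the number of matching rows
    b = pd.DataFrame([[a.price.quantile(0.1), a.price.quantile(0.9), len(a.index)]],
                     columns=('tenthpercentile', 'ninetiethpercentile', 'Quantity'))
    return b

print(foo('King Bay'))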



  tenthpercentile  ninetiethpercentile  Quantity
0         250000.0             250000.0         1

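I'd like to write a loop that does this for a list of neighborhoods I have and then compiles each result into a new DataFrame, looking something like this: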

          tenthpercentile  ninetiethpercentile  Quantity
King Bay         250000.0             250000.0         1
Smallville        99000.0             120000.0         8
Oakville          45000.0             160000.0         6

Thanks in advance.

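With DataFrames it's best to avoid explicit loops where you can and use the optimized methods pandas provides. In your case you can get rid of the loop by using groupby together with describe, passing the percentiles you want to its percentiles argument. Then just select the columns you need and rename them: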
new_df = (df.groupby('neighborhood')
          .describe(percentiles=[0.1,0.9])
          ['price'][['10%','90%','count']]
          .rename(columns={'count':'Quantity',
                           '10%':'tenthpercentile',
                           '90%':'ninetiethpercentile'}))
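To see what this produces, here is a self-contained run of the same chain on only the three sample rows shown in the question (the full dataset is not available, so the numbers are illustrative). Note that describe always adds a 50% column too; it is simply dropped by the column selection.

```python
import pandas as pd

# Only the three rows shown in the question; the real dataset is larger.
df = pd.DataFrame({
    'neighborhood': ['Smallville', 'Oakville', 'King Bay'],
    'type_property': ['house', 'apartment', 'house'],
    'type_negotiation': ['rent', 'for sale', 'for sale'],
    'price': [2000, 100000, 250000],
})

new_df = (df.groupby('neighborhood')
          .describe(percentiles=[0.1, 0.9])   # summary stats of each numeric column per neighborhood
          ['price'][['10%', '90%', 'count']]  # keep only the price statistics we need
          .rename(columns={'count': 'Quantity',
                           '10%': 'tenthpercentile',
                           '90%': 'ninetiethpercentile'}))
print(new_df)
```

The result is indexed by neighborhood (sorted alphabetically by groupby), and Quantity comes back as a float because describe returns all statistics as floats.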

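On your sample data each neighborhood has only one row, so both percentiles simply equal that row's price.
[Edit]: I just noticed that your function only looks at rows where (df.type_negotiation == 'for sale') & (df.type_property == 'house'). To do the same, add a loc that filters the DataFrame on those conditions before grouping: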
new_df = (df.loc[(df.type_negotiation == 'for sale')
                 & (df.type_property == 'house')]
          .groupby('neighborhood')
          .describe(percentiles=[0.1, 0.9])
          ['price'][['10%', '90%', 'count']]
          .rename(columns={'count': 'Quantity',
                           '10%': 'tenthpercentile',
                           '90%': 'ninetiethpercentile'}))
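As an alternative to describe (a different technique from the one above, not something the question uses), groupby's named aggregation lets you compute only the three statistics you want and name the columns directly, so no renaming step is needed. This is a sketch on the three sample rows from the question; p10 and p90 are hypothetical helper names. Using 'size' counts rows the same way len(a.index) does in foo().

```python
import pandas as pd

# Only the three rows shown in the question; the real dataset is larger.
df = pd.DataFrame({
    'neighborhood': ['Smallville', 'Oakville', 'King Bay'],
    'type_property': ['house', 'apartment', 'house'],
    'type_negotiation': ['rent', 'for sale', 'for sale'],
    'price': [2000, 100000, 250000],
})

def p10(s):
    # Hypothetical helper: 10th percentile of a price Series
    return s.quantile(0.1)

def p90(s):
    # Hypothetical helper: 90th percentile of a price Series
    return s.quantile(0.9)

# Same filter as in foo(): houses that are for sale
houses = df[(df.type_negotiation == 'for sale') & (df.type_property == 'house')]

# Named aggregation: output column name = (input column, aggregation function)
new_df = houses.groupby('neighborhood').agg(
    tenthpercentile=('price', p10),
    ninetiethpercentile=('price', p90),
    Quantity=('price', 'size'),
)
print(new_df)
```

Unlike describe, this keeps Quantity as an integer; with the sample data only King Bay has a house for sale, so it is the only row.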

Alternatively, if you're keen on using your function and a loop (which I don't recommend), you can do:
pd.concat([foo(i) for i in df.neighborhood.unique()])
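One caveat with that one-liner: every frame returned by foo() has the index 0, so the concatenated result is indexed 0, 0, 0 rather than by neighborhood as in the desired output. A minimal sketch of one way to fix that, assuming the three sample rows from the question, is to pass the neighborhoods as keys to pd.concat and drop the inner 0 level:

```python
import pandas as pd

# Only the three rows shown in the question; the real dataset is larger.
df = pd.DataFrame({
    'neighborhood': ['Smallville', 'Oakville', 'King Bay'],
    'type_property': ['house', 'apartment', 'house'],
    'type_negotiation': ['rent', 'for sale', 'for sale'],
    'price': [2000, 100000, 250000],
})

def foo(string):
    # Same filter as in the question: houses for sale in one neighborhood
    a = df[(df.type_negotiation == 'for sale') & (df.type_property == 'house')
           & (df.neighborhood == string)]
    return pd.DataFrame([[a.price.quantile(0.1), a.price.quantile(0.9), len(a.index)]],
                        columns=('tenthpercentile', 'ninetiethpercentile', 'Quantity'))

neighborhoods = list(df.neighborhood.unique())
# keys= adds each neighborhood as an outer index level;
# droplevel(1) removes the 0 that every one-row result from foo() carries.
result = pd.concat([foo(n) for n in neighborhoods], keys=neighborhoods).droplevel(1)
print(result)
```

Neighborhoods with no houses for sale (Smallville and Oakville in the sample) still appear, with NaN percentiles and a Quantity of 0, whereas the groupby versions simply omit them.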
