Python: inserting values based on the row index number in a DataFrame

python pandas dataframe

I need to insert values into a column based on the row index of a DataFrame.

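import numpy as np
import pandas as pd

# 11 rows of random integers in four columns, plus a placeholder 'ticker' column
df = pd.DataFrame(np.random.randint(0, 100, size=(11, 4)), columns=list('ABCD'))
df['ticker'] = 'na'
df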

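In the sample DataFrame above, the ticker column must have the value "$" for the first 25% of the records, "$$" for the next 25% of the records, and so on.

I tried getting the length of the DataFrame, computing 25%, 50% and 75% of it, and then visiting one row at a time and assigning a value to "ticker" based on the row index.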

total_row_count=len(df)
row_25 = int(total_row_count * .25)
row_50 = int(total_row_count * .5)
row_75=int(total_row_count*.75)

if ((row.index >=0) and (row.index<=row_25)):
    return"$"
elif ((row.index > row_25) and (row.index<=row_50)):
    return"$$"
elif ((row.index > row_50) and (row.index<=row_75)):
    return"$$$"
elif (row.index > row_75):
    return"$$$$"

But I am unable to get the row index. Please let me know if there is another way to assign these values.
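As a point of reference for the question, here is a minimal sketch of one way to get each row's position without looping. It assumes that "row index" means the 0-based position of the row, and it uses `np.arange(len(df))` plus integer division; the names `positions`, `quartile` and `labels` are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(11, 4)), columns=list("ABCD"))

# 0-based position of every row, independent of what df.index contains
positions = np.arange(len(df))

# Map each position to a quartile number 0..3 using integer arithmetic
quartile = positions * 4 // len(df)

# Look up the label for each quartile number
labels = np.array(["$", "$$", "$$$", "$$$$"])
df["ticker"] = labels[quartile]

print(df)
```

With 11 rows this gives three rows each for the first three labels and two rows for the last one.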

You can set up a few np.where statements to handle this. Try the following:

import numpy as np
...
# Array comparisons cannot be chained (a <= x < b); combine them with & instead
df['ticker'] = np.where(df.index < row_25, "$", df['ticker'])
df['ticker'] = np.where((row_25 <= df.index) & (df.index < row_50), "$$", df['ticker'])
df['ticker'] = np.where((row_50 <= df.index) & (df.index < row_75), "$$$", df['ticker'])
df['ticker'] = np.where(row_75 <= df.index, "$$$$", df['ticker'])
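For completeness, the np.where approach can be sketched end to end as follows. This version defines the thresholds from the question itself and compares against row positions (`pos`, an illustrative name) rather than index labels. Note that a chained comparison such as `a <= x < b` raises a ValueError on arrays, so each range is written as two comparisons joined with `&`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(11, 4)), columns=list("ABCD"))
df["ticker"] = "na"

# Thresholds as computed in the question (int() truncates toward zero)
n = len(df)
row_25, row_50, row_75 = int(n * 0.25), int(n * 0.5), int(n * 0.75)

pos = np.arange(n)  # 0-based row positions

# Each step overwrites only the rows in its range and keeps the rest
df["ticker"] = np.where(pos < row_25, "$", df["ticker"])
df["ticker"] = np.where((row_25 <= pos) & (pos < row_50), "$$", df["ticker"])
df["ticker"] = np.where((row_50 <= pos) & (pos < row_75), "$$$", df["ticker"])
df["ticker"] = np.where(row_75 <= pos, "$$$$", df["ticker"])

print(df)
```

Because `int()` truncates and the upper bounds are exclusive, an 11-row frame ends up with two "$" rows and three rows for each of the other labels. That split differs slightly from the other answers.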

I think pd.cut can solve this:
df['ticker']=pd.cut(np.arange(len(df))/len(df), [-np.inf,0.25,0.5,0.75,1], labels=["$","$$",'$$$','$$$$'],right=True)
df
Out[35]: 
     A   B   C   D ticker
0   63  51  19  33      $
1   12  80  57   1      $
2   53  27  62  26      $
3   97  43  31  80     $$
4   91  22  92  11     $$
5   39  70  82  26     $$
6   32  62  17  75    $$$
7    5  59  79  72    $$$
8   75   4  47   4    $$$
9   43   5  45  66   $$$$
10  29   9  74  94   $$$$
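To make the pd.cut answer easier to follow, the sketch below separates out the intermediate step: each row position is turned into a fraction of the frame length, and pd.cut bins those fractions. The names `frac` and `ticker_str` are illustrative. The sketch relies on `/` being true division, as in Python 3. Under integer division every fraction would be 0 and every row would land in the first bin, which may be worth checking if all rows come back as "$".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(11, 4)), columns=list("ABCD"))

# Fraction of the way through the frame for each row: 0/11, 1/11, ..., 10/11
frac = np.arange(len(df)) / len(df)

# Bin the fractions into four right-closed intervals and label them
df["ticker"] = pd.cut(
    frac,
    [-np.inf, 0.25, 0.5, 0.75, 1],
    labels=["$", "$$", "$$$", "$$$$"],
    right=True,
)

# pd.cut produces a categorical column; convert if plain strings are needed
df["ticker_str"] = df["ticker"].astype(str)

print(df.dtypes)
print(df)
```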

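I like using np.select for this kind of task because I find the syntax intuitive and readable:
# Set up your conditions (row_25, row_50 and row_75 are computed as in the question):
conds = [(df.index >= 0) & (df.index <= row_25),
         (df.index > row_25) & (df.index <= row_50),
         (df.index > row_50) & (df.index <= row_75),
         (df.index > row_75)]

# Set up your target values (in the same order as your conditions)
choices = ['$', '$$', '$$$', '$$$$']

# Assign df['ticker']; an explicit string default keeps the result dtype consistent
df['ticker'] = np.select(conds, choices, default='na')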
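A self-contained variant of the np.select idea is sketched below. It relies on the fact that np.select uses the first condition that is true for each element, so the conditions can be written as cumulative upper bounds and the last label supplied through `default`. The thresholds follow the question; `pos` is an illustrative name for the 0-based row positions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(11, 4)), columns=list("ABCD"))

n = len(df)
row_25, row_50, row_75 = int(n * 0.25), int(n * 0.5), int(n * 0.75)
pos = np.arange(n)

# Conditions are checked in order; the first match wins for each row
conds = [pos <= row_25, pos <= row_50, pos <= row_75]
choices = ["$", "$$", "$$$"]

# Rows matching none of the conditions (pos > row_75) get the default label
df["ticker"] = np.select(conds, choices, default="$$$$")

print(df)
```

This trades the explicit lower bounds for brevity, so the order of the conditions matters.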

Here is an explicit solution using the .loc accessor:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0,100,size=(11, 4)), columns=list('ABCD'))
n = len(df.index)

df['ticker'] = 'na'
df.loc[df.index <= n/4, 'ticker'] = '$'
df.loc[(n/4 < df.index) & (df.index <= n/2), 'ticker'] = '$$'
df.loc[(n/2 < df.index) & (df.index <= n*3/4), 'ticker'] = '$$$'
df.loc[df.index > n*3/4, 'ticker'] = '$$$$'

#      A   B   C   D ticker
# 0   47  64   7  46      $
# 1   53  55  75   3      $
# 2   93  95  28  47      $
# 3   35  88  16   7     $$
# 4   99  66  88  84     $$
# 5   75   2  72  90     $$
# 6    6  53  36  92    $$$
# 7   83  58  54  67    $$$
# 8   49  83  46  54    $$$
# 9   69   9  96  73   $$$$
# 10  84  42  11  83   $$$$
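The .loc solution above compares index labels, which works because the sample frame has the default 0..n-1 index. As a hedged sketch, the same split can be done purely by position with `.iloc`, which should also work when the index is not numeric. The string index, `edges` and `col` below are illustrative assumptions and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.randint(0, 100, size=(11, 4)),
    columns=list("ABCD"),
    index=list("abcdefghijk"),  # hypothetical non-numeric index
)
df["ticker"] = "na"

n = len(df)
labels = ["$", "$$", "$$$", "$$$$"]
col = df.columns.get_loc("ticker")  # integer position of the 'ticker' column

# Positional boundaries: ceil of 0, n/4, n/2, 3n/4, n
edges = np.ceil(np.linspace(0, n, len(labels) + 1)).astype(int)

# Assign one label per positional slice [start, stop)
for label, start, stop in zip(labels, edges[:-1], edges[1:]):
    df.iloc[start:stop, col] = label

print(df)
```

For 11 rows the boundaries come out as 0, 3, 6, 9, 11, which reproduces the 3/3/3/2 split shown in the answer's output.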

I'm not sure what I'm missing, but when I run the code it returns "$" for all rows in the ticker column.

On my side it works fine. Would you mind pasting the code you used here?

> import pandas as pd
> import numpy as np
> df = pd.DataFrame(np.random.randint(0, 100, size=(11, 4)), columns=list('ABCD'))
> df['ticker'] = pd.cut(np.arange(len(df))/len(df), [-np.inf, 0.25, 0.5, 0.75, 1], labels=["$", "$$", "$$$", "$$$$"], right=True)
> df

The last 2 records are not populated with "$$$$". Any idea why it won't populate?

Try: df['ticker'] = np.select(conds, choices, default='test'). If the last two records are filled with the value test, it means that none of the provided conditions were met for those rows. Otherwise, I'm not sure…

Your solution works. I'm not sure why it won't show up in my df. When I save it as csv, I am able to see the "$$$$". Thanks sacul

I am trying this because I think it would solve my problem, but I get an error saying that the row variable is not defined (row_25 in this example). Do you know a way to fix this?

"$$$$" does not get populated. Any idea what I am missing?

That's odd. When I try print(df), I see the output as per my post.

Your solution worked. I don't know why it won't show up in my df. When I save it as csv, I can see the "$$$$". Thanks @jpp

@sow, no problem. If it solved your problem, feel free to accept (tick on the left).
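The `default=` diagnostic suggested in the comments can be sketched as follows. The conditions here deliberately leave the last rows uncovered, purely to illustrate how the default value reveals rows that match no condition. The thresholds 2, 5 and 8 and the names `pos`, `result` and `unmatched` are assumptions for the example.

```python
import numpy as np

pos = np.arange(11)  # 0-based row positions for an 11-row frame

# Deliberately incomplete: there is no condition for pos > 8
conds = [pos <= 2, (pos > 2) & (pos <= 5), (pos > 5) & (pos <= 8)]
choices = ["$", "$$", "$$$"]

# Rows that satisfy none of the conditions receive the default value
result = np.select(conds, choices, default="test")

# Positions where no condition matched
unmatched = pos[result == "test"]
print(result)
print(unmatched)
```

If `unmatched` is empty, every row was covered by some condition, and a missing label is more likely a display issue than a logic error.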