Python: Creating a column of sequential ascending values within groups

python pandas

I have a DataFrame 'df' with these columns:

col1 = datetime64[ns]
col2 = object
col3 = object
col4 = object

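I want to sort the DataFrame by 'col1' and then group it by 'col2'. Finally, within each 'col2' group, I want to create an ordinal value (1, 2, 3) that follows the 'col1' order. If a 'col2' group has 4 rows, those rows would get the values [1, 2, 3, 4] in the new column.

I know pandas has a rank() method, and I can use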

df['newcol'] = df.groupby(['col2'])['col1'].rank()

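But this does not give me a column on the original DataFrame whose ordinal values simply run [1, 2, 3] within each group. How can I get that?

Try this: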

>> df.sort_values(by='col1').groupby('col2')

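This first sorts the DataFrame by col1 and then groups it by col2. The result is a GroupBy object.

If you also want the number of rows in each group, you can try the following (note that count() reports non-null values per column, while size() gives one row count per group):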

>> grouped = df.sort_values(by='col1').groupby('col2')
>> grouped.count()
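
To make the difference between count() and size() concrete, here is a minimal sketch. The sample data is hypothetical and only the column names come from the question; col3 deliberately contains a missing value.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "col1": pd.to_datetime(["2016-01-03", "2016-01-01", "2016-01-02", "2016-01-04"]),
    "col2": ["x", "y", "x", "x"],
    "col3": ["a", None, "b", "c"],
})

grouped = df.sort_values(by="col1").groupby("col2")

# count() reports non-null values per column, so the None in col3 is skipped.
non_null_counts = grouped.count()

# size() reports the number of rows in each group, regardless of nulls.
group_sizes = grouped.size()

print(non_null_counts)
print(group_sizes)
```

Neither call adds a per-row ordinal, so this covers only the "how many rows per group" part of the problem.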

I hope this helps.

Are you trying to achieve something like this? Without sample data and expected results, it is hard to tell.

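import random

import pandas as pd

# Seeded random sample data; the exact values may differ between Python versions.
random.seed(0)
df = pd.DataFrame({col: [random.choice(list('abc')) for i in range(10)] for col in list('ABC')})
df['timestamp'] = pd.date_range('2016-1-1', periods=len(df))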

df.sort_values('timestamp', inplace=True)
df['rank'] = \
    df.groupby('A')['B'].transform(lambda group: group.astype('category').cat.codes + 1)

>>> df
   A  B  C  timestamp rank
0  c  c  a 2016-01-01    2
1  c  b  c 2016-01-02    1
2  b  a  c 2016-01-03    1
3  a  c  c 2016-01-04    1
4  b  b  b 2016-01-05    2
5  b  a  a 2016-01-06    1
6  c  c  b 2016-01-07    2
7  a  c  b 2016-01-08    1
8  b  c  c 2016-01-09    3
9  b  c  c 2016-01-10    3

There is a pandas groupby method, cumcount(), that does exactly what the OP wants:

df.sort_values("col1", inplace = True)
df["rank"] = df.groupby("col2").cumcount() + 1
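
As a worked illustration of the cumcount() approach, the sketch below uses made-up data (only the column names come from the question). cumcount() numbers rows from 0 in their current order within each group, which is why the frame is sorted first and 1 is added afterwards.

```python
import pandas as pd

# Hypothetical sample data, deliberately out of date order.
df = pd.DataFrame({
    "col1": pd.to_datetime(
        ["2016-01-05", "2016-01-01", "2016-01-03", "2016-01-02", "2016-01-04"]
    ),
    "col2": ["x", "y", "x", "x", "y"],
})

# Sort by date first so that the running count follows the col1 order.
df = df.sort_values("col1")

# cumcount() is zero-based within each group, so shift it to start at 1.
df["rank"] = df.groupby("col2").cumcount() + 1

print(df)
```

The original row labels are preserved, so the row that started at index 0 (the latest "x" date) ends up with rank 3.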

However, using a MultiIndex might be more useful here:

df.set_index(["col1", "col2"], inplace = True)
df["rank"] = df.groupby(level = "col2").cumcount() + 1
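
A small sketch of the MultiIndex variant, again on hypothetical data. It assumes the frame is sorted by col1 before the index is set, and it avoids inplace so the original frame is left untouched. The last line shows one thing the index makes convenient: pulling out a single group by its level value.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "col1": pd.to_datetime(["2016-01-02", "2016-01-01", "2016-01-03"]),
    "col2": ["x", "x", "y"],
    "col3": ["a", "b", "c"],
})

# Sort by date, then move both keys into the index.
indexed = df.sort_values("col1").set_index(["col1", "col2"])

# Group on the index level instead of a column.
indexed["rank"] = indexed.groupby(level="col2").cumcount() + 1

# With the keys in the index, one group can be selected by level value.
only_x = indexed.xs("x", level="col2")

print(indexed)
print(only_x)
```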

To make things look nicer (rows sorted by "col2" and then "col1"):

df.sort_values(by=["col2", "col1"], inplace = True)

Try df['newcol'] = df.groupby('col2')['col1'].transform(lambda s: s.rank())
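
For the rank()-based approach raised in the question, one possible adjustment is sketched below. It is not taken from any of the answers. By default rank() returns floats and averages ties, whereas method="first" breaks ties by order of appearance, so the result can be cast to plain integers. The data is hypothetical and includes a duplicated date to show the difference.

```python
import pandas as pd

# Hypothetical sample data with a tied date inside group "x".
df = pd.DataFrame({
    "col1": pd.to_datetime(["2016-01-02", "2016-01-01", "2016-01-01", "2016-01-03"]),
    "col2": ["x", "x", "x", "y"],
})

# Default ranking: float output, ties share the average of their ranks.
default_rank = df.groupby("col2")["col1"].rank()

# method="first" assigns distinct ranks to ties in order of appearance,
# so the values are whole numbers and can safely be cast to int.
df["newcol"] = df.groupby("col2")["col1"].rank(method="first").astype(int)

print(default_rank.tolist())
print(df)
```

Unlike cumcount(), this does not require sorting the frame beforehand, since the rank is computed from the col1 values themselves.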