What data type should be used to represent currency in a pandas DataFrame?

python python-3.x pandas dataframe

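I have a pandas DataFrame whose money column has a precision of two decimal places, for example "133.04". No value has three or more digits after the decimal point, only two.

What I tried: the decimal module. I tried using the decimal module, but when I resample like this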

gr_by_price = df['price'].resample(timeframe, how='ohlc')

I get an error.

Before that, I checked the data type:

print(type(df['price'][0]))
<class 'decimal.Decimal'>


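We ran into a similar problem, and the best approach was to multiply the value by 100 and represent it as an integer
(and divide by 100 for printing and external output).

This gives fast, exact calculations (1 + 2 == 3, whereas 0.1 + 0.2 != 0.3).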
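The integer approach above might look like the following in pandas. This is a minimal sketch, not the answerer's code: the column name price_cents and the helper format_cents are made up for illustration, and it assumes non-negative amounts with exactly two decimal places.

```python
import pandas as pd

df = pd.DataFrame({"price": [133.04, 19.99, 0.10]})

# Scale to cents. round() absorbs the tiny binary float error that the
# multiplication can introduce before the cast to an integer dtype.
df["price_cents"] = (df["price"] * 100).round().astype("int64")

# Integer arithmetic is exact, so sums never drift.
total_cents = int(df["price_cents"].sum())


def format_cents(cents):
    """Render a non-negative cent amount as a two-decimal string (hypothetical helper)."""
    return f"{cents // 100}.{cents % 100:02d}"


print(format_cents(total_cents))
```

Only the display step converts back to a decimal string; every stored value and intermediate result stays an integer.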

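You need to distinguish between the internal representation of a value and the way it is displayed. Since, as you say, you do not need any other kind of floating-point representation, I suggest you keep using plain float for the internal representation and the math (this is the IEEE-754 standard) and simply add this line

pd.options.display.float_format = '{:6.2f}'.format

at the beginning of your script. All printed values will then be rounded to two decimal places automatically, without their actual values being changed. (pd is the common alias for pandas.)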
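To illustrate that the display option changes only what is printed, here is a small sketch. It uses pd.option_context instead of assigning to pd.options so that the setting is scoped to one block; that substitution and the sample values are assumptions for the example.

```python
import pandas as pd

prices = pd.Series([133.04, 2.5, 10 / 3])

# Apply the same format string as above, but only inside this block.
with pd.option_context("display.float_format", "{:6.2f}".format):
    shown = prices.to_string()

print(shown)

# The stored floats are untouched; only their printed form was rounded.
unchanged = prices.iloc[2] == 10 / 3
print(unchanged)
```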

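Decimal seems like a perfectly reasonable representation for your use case. The underlying problem is that the ohlc aggregator in pandas calls into Cython for speed, and I assume the Cython code cannot accept Decimal values.

Instead, I think the simplest approach is to write your own ohlc so that it can operate on Decimal values.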

# Assumes: import numpy as np; import pandas as pd; from decimal import Decimal
In [89]: index = pd.date_range('1/1/2000', periods=9, freq='T')

In [90]: series = pd.Series(np.linspace(0, 2, 9), index=index)

In [91]: series.resample('3T').ohlc()
Out[91]:
                     open  high   low  close
2000-01-01 00:00:00  0.00  0.50  0.00   0.50
2000-01-01 00:03:00  0.75  1.25  0.75   1.25
2000-01-01 00:06:00  1.50  2.00  1.50   2.00

In [92]: decimal_series = pd.Series([Decimal(x) for x in np.linspace(0, 2, 9)], index=index)

In [93]: def ohlc(x):
    ...:     x = x[x.notnull()]
    ...:     if x.empty:
    ...:         return pd.Series({'open': np.nan, 'high': np.nan, 'low': np.nan, 'close': np.nan})
    ...:     return pd.Series({'open': x.iloc[0], 'high': x.max(), 'low': x.min(), 'close':x.iloc[-1]})
    ...:
In [107]: decimal_series.resample('3T').apply(ohlc).unstack()
Out[107]:
                    close  high   low  open
2000-01-01 00:00:00   0.5   0.5     0     0
2000-01-01 00:03:00  1.25  1.25  0.75  0.75
2000-01-01 00:06:00     2     2   1.5   1.5
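The session above depends on how a particular pandas version shapes the result of resample(...).apply(...). As a version-tolerant sketch of the same idea, the groups can be iterated explicitly with pd.Grouper. The function name decimal_ohlc and the sample prices are assumptions for illustration, and empty bins are skipped here rather than filled with NaN.

```python
from decimal import Decimal

import pandas as pd


def decimal_ohlc(series, freq):
    """Build open/high/low/close bars from a Series of Decimal values."""
    stamps, rows = [], []
    for stamp, group in series.groupby(pd.Grouper(freq=freq)):
        group = group.dropna()
        if group.empty:
            continue  # skip empty bins in this sketch
        stamps.append(stamp)
        rows.append({
            "open": group.iloc[0],
            "high": max(group),   # Python's max/min compare Decimals exactly
            "low": min(group),
            "close": group.iloc[-1],
        })
    return pd.DataFrame(rows, index=stamps, columns=["open", "high", "low", "close"])


index = pd.date_range("2000-01-01", periods=6, freq="min")
prices = pd.Series(
    [Decimal("1.10"), Decimal("1.30"), Decimal("1.20"),
     Decimal("2.05"), Decimal("2.00"), Decimal("2.10")],
    index=index,
)

bars = decimal_ohlc(prices, "3min")
print(bars)
```

Because the aggregation runs in plain Python, it will be slower than the built-in ohlc on large inputs, but every value stays a Decimal.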

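I have run into this problem before as well. The solution I ended up using was to represent currency as a multiple of its smallest denomination (that is, 1 cent for US dollars). The type is therefore int. As mentioned earlier, the advantage of this approach is that you can perform lossless integer calculations.

Price (currency) = Multiplier * Sub_unit

For US dollars, the unit of price is the dollar and the sub-unit is one cent, which makes the multiplier 100.

Another point worth mentioning is that this works well across different currencies. For example, the smallest denomination of the Japanese yen is 1 yen, so the multiplier is 1. The smallest denomination of the Indonesian rupiah is 1000 rupiah, so the multiplier can also be 1. You only need to remember the multiplier for each currency.

In fact, you could even create a custom class that wraps this conversion, which is probably the most convenient solution.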
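A custom class along those lines could be sketched as follows. Everything here is hypothetical: the class name Money, the MULTIPLIERS table, and the parse method are illustrative choices, not part of any library.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical lookup table: sub-units per unit of each currency.
MULTIPLIERS = {"USD": 100, "JPY": 1}


@dataclass(frozen=True)
class Money:
    subunits: int   # amount stored as an exact integer count of sub-units
    currency: str

    @classmethod
    def parse(cls, text, currency):
        """Convert a decimal string such as '133.04' into integer sub-units."""
        scaled = Decimal(text) * MULTIPLIERS[currency]
        if scaled != scaled.to_integral_value():
            raise ValueError(f"{text} is finer than the smallest {currency} unit")
        return cls(int(scaled), currency)

    def __add__(self, other):
        if self.currency != other.currency:
            raise ValueError("cannot add different currencies")
        return Money(self.subunits + other.subunits, self.currency)

    def __str__(self):
        multiplier = MULTIPLIERS[self.currency]
        digits = len(str(multiplier)) - 1   # 100 -> 2 decimals, 1 -> 0
        return f"{Decimal(self.subunits) / multiplier:.{digits}f} {self.currency}"


total = Money.parse("133.04", "USD") + Money.parse("9.99", "USD")
print(total)
```

Parsing goes through Decimal rather than float so that the string "133.04" becomes exactly 13304 sub-units.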

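Unfortunately, you will need to use np.float64 for this. As long as you do not exceed its precision and limits, you should be fine.

@EdChum Hmm... I won't end up with 133.04 turning into 133.05 or 133.03, right? So I convert to float64 before resampling, resample, and then convert back to Decimal, right?

That can happen, but the imprecision usually shows up in the low-order digits. If you convert to Decimal at the end, it should be clipped.

@EdChum Thank you very much. That is what I am doing now. Here is the interesting part:

>>> d.Decimal(float(d.Decimal("1.04")))
Decimal('1.04000000000000003552713678800500929355621337890625')

Most of the time, what you want is to store the numbers as float and then use appropriate formatting for display. The odd behavior shows up around the 12th or 13th decimal place, so in practice it is rarely a problem. Decimal is not a core data type (like int or float), so working with it can be cumbersome. Note that outside the core data types, pandas stores values as object. Use info() to check.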
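The float round trip discussed in these comments can be sketched with the standard library alone. The helper to_money and the choice of ROUND_HALF_UP are assumptions for illustration; the point is that quantizing to two places at the end clips the binary noise.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_money(value):
    """Convert a float back to a two-decimal Decimal (hypothetical helper)."""
    return Decimal(value).quantize(CENT, rounding=ROUND_HALF_UP)


original = Decimal("1.04")
as_float = float(original)       # what a float64 column would hold

raw = Decimal(as_float)          # exposes the binary representation error
restored = to_money(as_float)    # quantizing clips it back to 1.04

print(raw)
print(restored)
```

This only holds while the accumulated float error stays far below half a cent, which is the "precision and limits" caveat from the first comment.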