Python Xarray: Making two DataArrays in the same Dataset use the same coordinate system

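I have an ArviZ InferenceData posterior trace, which is an xarray Dataset.

In it, the posterior traces for two of my random variables, a_mu_org and b_mu_org, are DataArrays. Their coordinates are:

  • a_mu_org: (chain, draw, a_mu_org_dim_0), of lengths (1, 2000, 15) respectively
  • b_mu_org: (chain, draw, b_mu_org_dim_0), of lengths (1, 2000, 15) respectively

Semantically, a_mu_org and b_mu_org should both be indexed by a single categorical coordinate of 15 organisms, rather than by two separate indexes.

To make this a little clearer, here is the full string repr of the dataset: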

<xarray.Dataset>
Dimensions:             (L_dim_0: 34281, a_dim_0: 456260, a_prot_shift_dim_0: 34281, b_dim_0: 456260, b_mu_org_dim_0: 15, b_prot_shift_dim_0: 34281, chain: 1, draw: 2000, organism: 15, sigma_dim_0: 34281, t50_org_dim_0: 15, t50_prot_dim_0: 39957)
Coordinates:
  * chain               (chain) int64 0
  * draw                (draw) int64 0 1 2 3 4 5 ... 1995 1996 1997 1998 1999
  * a_prot_shift_dim_0  (a_prot_shift_dim_0) object 'A0A023PXQ4_YMR173W-A' ... 'Z4YNA9_AB124611'
  * b_prot_shift_dim_0  (b_prot_shift_dim_0) object 'A0A023PXQ4_YMR173W-A' ... 'Z4YNA9_AB124611'
  * L_dim_0             (L_dim_0) object 'A0A023PXQ4_YMR173W-A' ... 'Z4YNA9_AB124611'
    a_mu_org_dim_0      (organism) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
  * a_dim_0             (a_dim_0) object 'ytzI' 'mtlF' ... 'atpG2' 'atpB2'
  * b_mu_org_dim_0      (b_mu_org_dim_0) int64 0 1 2 3 4 5 ... 9 10 11 12 13 14
  * b_dim_0             (b_dim_0) object 'ytzI' 'mtlF' ... 'atpG2' 'atpB2'
  * t50_prot_dim_0      (t50_prot_dim_0) <U65 'Bacillus subtilis_168_lysate_R1-C0H3Q1_ytzI' ... 'Oleispira antarctica_RB-8_lysate_R1-R4YVF0_atpB2'
  * t50_org_dim_0       (t50_org_dim_0) <U43 'Arabidopsis thaliana seedling lysate' ... 'Thermus thermophilus HB27 lysate'
  * sigma_dim_0         (sigma_dim_0) object 'A0A023PXQ4_YMR173W-A' ... 'Z4YNA9_AB124611'
Dimensions without coordinates: organism
Data variables:
    a_org_pop           (chain, draw) float32 519.3236 518.8292 ... 517.84784
    a_prot_shift        (chain, draw, a_prot_shift_dim_0) float32 ...
    b_org_pop           (chain, draw) float32 11.509291 11.445394 ... 11.929538
    b_prot_shift        (chain, draw, b_prot_shift_dim_0) float32 ...
    L_pop               (chain, draw) float32 3.445896 3.4300675 ... 3.3917112
    L                   (chain, draw, L_dim_0) float32 ...
    a_mu_org            (chain, draw, organism) float32 430.56827 ... 813.2518
    a                   (chain, draw, a_dim_0) float32 ...
    b_mu_org            (chain, draw, b_mu_org_dim_0) float32 9.997488 ... 8.389757
    b                   (chain, draw, b_dim_0) float32 ...
    t50_prot            (chain, draw, t50_prot_dim_0) float32 39.249863 ... 52.19809
    t50_org             (chain, draw, t50_org_dim_0) float32 43.067646 ... 96.93388
    sigma               (chain, draw, sigma_dim_0) float32 ...
Attributes:
    created_at:                 2020-04-23T08:54:58.300091
    arviz_version:              0.7.0
    inference_library:          pymc3
    inference_library_version:  3.8

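I am not sure my solution is good practice; it feels a bit too hacky. The terminology is also quite tricky: I will try to stick to it, but I may slip. The trick is to remove the coordinates so that a_dim_0 and b_dim_0 become plain dimensions (that is, dimensions without coordinates). After that, both can be renamed to the same name and assigned a new coordinate. Here is an example.

Starting from the following dataset, named ds: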

<xarray.Dataset>
Dimensions:  (a_dim_0: 15, b_dim_0: 15, chain: 4, draw: 100)
Coordinates:
  * chain    (chain) int64 0 1 2 3
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 97 98 99
  * a_dim_0  (a_dim_0) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
  * b_dim_0  (b_dim_0) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Data variables:
    a        (chain, draw, a_dim_0) float64 0.8152 1.189 ... 1.32 -0.2023
    b        (chain, draw, b_dim_0) float64 0.6447 -0.8059 ... -0.06435 -0.8666
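
The steps described above can be sketched as follows. This is a minimal sketch under stated assumptions, not necessarily the exact code used here: the dataset is rebuilt from random values as a stand-in for ds (so the numbers will differ), drop_vars is one way to remove the coordinates, and the labels o0 to o14 are placeholders for real organism names.

```python
import numpy as np
import xarray as xr

# Toy stand-in for ds (assumption: random values, same dims and coords).
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {
        "a": (("chain", "draw", "a_dim_0"), rng.normal(size=(4, 100, 15))),
        "b": (("chain", "draw", "b_dim_0"), rng.normal(size=(4, 100, 15))),
    },
    coords={
        "chain": np.arange(4),
        "draw": np.arange(100),
        "a_dim_0": np.arange(15),
        "b_dim_0": np.arange(15),
    },
)

# 1. Drop the two index coordinates; a_dim_0 and b_dim_0 remain as
#    dimensions without coordinates.
ds_out = ds.drop_vars(["a_dim_0", "b_dim_0"])

# 2. Rename both dimensions to the same name.
ds_out = ds_out.rename({"a_dim_0": "organism", "b_dim_0": "organism"})

# 3. Attach a single shared coordinate (placeholder labels).
ds_out = ds_out.assign_coords(organism=[f"o{i}" for i in range(15)])

print(ds_out)
```

Dropping the coordinates first matters: while a_dim_0 and b_dim_0 still exist as coordinate variables, renaming both to organism would produce two variables with the same name.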
Output:

<xarray.Dataset>
Dimensions:   (chain: 4, draw: 100, organism: 15)
Coordinates:
  * chain     (chain) int64 0 1 2 3
  * draw      (draw) int64 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 97 98 99
  * organism  (organism) <U3 'o0' 'o1' 'o2' 'o3' ... 'o11' 'o12' 'o13' 'o14'
Data variables:
    a         (chain, draw, organism) float64 0.8152 1.189 ... 1.32 -0.2023
    b         (chain, draw, organism) float64 0.6447 -0.8059 ... -0.8666

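That works, but if you are using the ArviZ converters, it is probably easier to pass coords and dims when calling from_pymc3. It is still important to know how to do this kind of thing after the dataset has been created.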
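
To illustrate the coords and dims suggestion without needing a PyMC3 model, the sketch below uses az.from_dict, which takes the same two keyword arguments. It assumes the ArviZ 0.x API (the question reports arviz 0.7.0); the random arrays and the labels o0 to o14 are hypothetical stand-ins for real samples and organism names.

```python
import arviz as az
import numpy as np

rng = np.random.default_rng(0)
organisms = [f"o{i}" for i in range(15)]  # hypothetical labels

# Arrays shaped (chain, draw, organism); random stand-ins for real samples.
posterior = {
    "a_mu_org": rng.normal(size=(1, 200, 15)),
    "b_mu_org": rng.normal(size=(1, 200, 15)),
}

# coords names the shared coordinate once; dims maps each variable's
# extra dimension onto it, so both variables share the same index.
idata = az.from_dict(
    posterior=posterior,
    coords={"organism": organisms},
    dims={"a_mu_org": ["organism"], "b_mu_org": ["organism"]},
)

print(idata.posterior)
```

With from_pymc3, the same coords and dims dictionaries would be passed alongside the trace, so no renaming is needed afterwards.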