Python: random numpy array of DNA bases

python numpy random

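I would like to know how to get a random array of integers that represent DNA bases. I already have the basic numpy call, but I can't get the result I want without converting the numpy array to a list of strings and then back to integers, so I'm stuck.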

#A = 1
#T = 2
#G = 3
#C = 4

np.random.randint(1, 5, size=(5, 3))

array([[1, 2, 1],
   [2, 2, 3],
   [2, 4, 2],
   [4, 2, 1],
   [1, 3, 4]])

The ideal output would be a numpy array of integers:

array([[121],
   [223],
   [242],
   [421],
   [134]])

Thanks for any ideas.

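I think you are best off using the method described in the question: int() --> str() --> int()

Or, for a numpy-style answer (here thing is a 2-D integer array like the one in the question):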

>>> foo = lambda x: int(''.join([str(n) for n in x]))
>>> np.apply_along_axis(foo, 1, thing)
Out[7]: array([414, 311, 221, 232, 131])
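
As a minimal sketch of the same idea on Python 3, the snippet below applies the string round trip to the example array from the question. The names `thing` and `join_digits` are only illustrative, and the fixed input is used so the result can be checked by eye.

```python
import numpy as np

# Example array from the question (A = 1, T = 2, G = 3, C = 4).
thing = np.array([[1, 2, 1],
                  [2, 2, 3],
                  [2, 4, 2],
                  [4, 2, 1],
                  [1, 3, 4]])

def join_digits(row):
    """Concatenate the digits of one row into a single int, e.g. [1, 2, 1] -> 121."""
    return int("".join(str(n) for n in row))

# Apply the helper to every row (axis 1) of the 2-D array.
joined = np.apply_along_axis(join_digits, 1, thing)
print(joined)  # [121 223 242 421 134]
```

Note that `np.apply_along_axis` calls the Python function once per row, so this is convenient rather than fast.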

Why not just construct a 3-digit integer from the 3 separate integers you already have:

import numpy as np

r = np.random.randint(1, 5, size=(5, 3))

print((r[:, 0] * 100 + r[:, 1] * 10 + r[:, 2])[:, None])

Output:

[[444]
 [332]
 [213]
 [434]
 [341]]

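Depending on the output shape you need, you may not need the reshaping via [:, None]. This version, however, produces exactly the example output format.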
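
To make the place-value arithmetic and the effect of `[:, None]` concrete, here is a small sketch that uses the fixed example array from the question instead of random data. The variable names `flat` and `column` are only illustrative.

```python
import numpy as np

# Example array from the question, used so the result is reproducible.
r = np.array([[1, 2, 1],
              [2, 2, 3],
              [2, 4, 2],
              [4, 2, 1],
              [1, 3, 4]])

# Weight the three columns as hundreds, tens and ones.
flat = r[:, 0] * 100 + r[:, 1] * 10 + r[:, 2]
print(flat.shape)    # (5,)

# Indexing with None adds a trailing axis, giving one value per row.
column = flat[:, None]
print(column.shape)  # (5, 1)
print(column)
```

The 1-D `flat` array holds the same values; the extra axis only matters if a column-shaped result is required.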

One-liner:

A more compact version uses the dot product between the random matrix and a vector of powers of ten:

print(np.random.randint(1, 5, size=(5, 3)).dot([100, 10, 1])[:, None])

More flexible:

In general, you can generate the array from the number of rows n and the number of columns d:

print(np.random.randint(1, 5, size=(n, d)).dot(np.power(10, range(d)))[:, None])
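
The general case can be wrapped in a small helper. This is only a sketch: the function name `random_base_codes`, the `seed` parameter and the use of `np.random.default_rng` are assumptions made for illustration, not part of the answer above. The weights are ordered so that the first column is the most significant digit.

```python
import numpy as np

def random_base_codes(n, d, seed=None):
    """Return an (n, 1) array of d-digit integers with digits drawn from 1..4.

    Hypothetical helper; d is assumed small enough for the result to fit in int64.
    """
    rng = np.random.default_rng(seed)
    # One random digit per base; the upper bound 5 is exclusive.
    digits = rng.integers(1, 5, size=(n, d))
    # Place values, most significant first: [10**(d-1), ..., 10, 1].
    weights = 10 ** np.arange(d - 1, -1, -1)
    return digits.dot(weights)[:, None]

codes = random_base_codes(n=6, d=4, seed=0)
print(codes.shape)  # (6, 1)
```

Passing a seed makes the output repeatable, which helps when testing code that consumes these arrays.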

Here is another numpy answer.

Strategy: first precompute the bases (there are only 64 of them, so it's no big deal), then use np.random.choice:

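from itertools import product
import numpy as np

nums = "1234"
# All 4**3 = 64 three-digit combinations, converted to integers once up front
bases = list(map(int, map("".join, product(nums, nums, nums))))
np.random.choice(bases, 10**8)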

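The cast to integers happens in the precomputation step, so it does not become a bottleneck. This generates 100 million base pairs quickly on a MacBook.

Note: if you want to compute a large number of base pairs, this method is about 5x faster than first generating the random digits and then taking the dot product (3 seconds versus 17 seconds for 10**8 random bases). That strategy needs two passes over the data, whereas mine needs only one.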
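
A rough way to compare the two approaches on your own machine is sketched below. It uses a smaller sample size than the 10**8 quoted above so that it runs quickly, and the function names `by_choice` and `by_dot` are only illustrative. Actual timings depend on hardware and the numpy version, so no expected numbers are given.

```python
import timeit
from itertools import product

import numpy as np

N = 10**6  # much smaller than 10**8 so the comparison finishes quickly

# Precompute the 64 possible three-digit codes once.
codes = np.array([int("".join(p)) for p in product("1234", repeat=3)])

def by_choice():
    # Single pass: draw finished codes directly.
    return np.random.choice(codes, N)

def by_dot():
    # Two passes: draw the digits, then combine them with a dot product.
    return np.random.randint(1, 5, size=(N, 3)).dot([100, 10, 1])

t_choice = timeit.timeit(by_choice, number=5)
t_dot = timeit.timeit(by_dot, number=5)
print(f"choice: {t_choice:.3f}s  dot: {t_dot:.3f}s")
```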

In general, if you want d base pairs and N samples, the trick is as follows:

bases = list(map(int, map("".join, product(nums, repeat=d))))
np.random.choice(bases, N)

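If d is larger than 8 or 9, bases will be long enough that you are probably better off with the dot-product version. But if d is small, this will certainly be faster.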
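
The general recipe can be packaged as a function. This is a sketch under a few assumptions: the name `sample_base_codes`, the `seed` parameter and the use of `np.random.default_rng` are illustrative choices, not part of the answer above.

```python
from itertools import product

import numpy as np

def sample_base_codes(d, N, nums="1234", seed=None):
    """Draw N random d-digit codes whose digits come from nums (hypothetical helper)."""
    # Precompute all len(nums)**d codes; this table grows quickly with d.
    codes = np.array([int("".join(p)) for p in product(nums, repeat=d)])
    rng = np.random.default_rng(seed)
    # One pass over the output: pick finished codes uniformly at random.
    return rng.choice(codes, size=N)

samples = sample_base_codes(d=3, N=10, seed=0)
print(samples.shape)  # (10,)
```

The lookup table has 4**d entries, which is why this approach suits small d and the dot-product version suits larger d.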
