Python: how to get web traffic figures, and how to append a column to a numpy array?

python html url web numpy

I would like to know how to append a column to a numpy array. Suppose I read in a .tsv like this:

  from sklearn import metrics,preprocessing,cross_validation
  from sklearn.feature_extraction.text import TfidfVectorizer
  import sklearn.linear_model as lm
  import numpy as np
  import pandas as p
  print("loading data..")
  traindata = np.array(p.read_table('train.tsv')) #here is where I am unsure what to do

The first column of traindata contains the URL of each web page.

After this, the logic I want is:

for each row in traindata
          #run function to look up traffic webpage is getting, store this in a numpy array
Add a new column to traindata numpy array, append on the data in the array created into our "for each"

How would this generally be implemented, even if only a placeholder ("filler") method is used to retrieve the web traffic?

Thanks

Inputs and outputs:
    Input: Numpy array of 26 columns.
    We call a function on the value in the first column of each row; this function returns a number.
    We collect all these numbers into a numpy array with one column.
    We append that one-column array to the 26-column array to end up with a numpy array with 27 columns.
    Output: Numpy array of 27 columns.

You can use numpy.hstack to append a column, like this:

import numpy as np

def some_function(x):
    return 3*x

input = np.ones([10,26])
input = np.hstack([input,np.empty([input.shape[0],1])])
for row in input:
    row[-1] = some_function(row[0])

output = input
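
Applied to the question's setup, the same idea might look like the sketch below. Note the assumptions: `lookup_traffic` is a hypothetical stub (it simply returns the length of the URL) standing in for a real traffic lookup, and the small `traindata` array is made-up stand-in data rather than the contents of train.tsv. Because the array mixes strings and numbers, `dtype=object` is used so the new numeric column can sit next to the existing ones.

```python
import numpy as np

def lookup_traffic(url):
    # Hypothetical stub: a real version would query some traffic source.
    return len(url)

# Made-up stand-in for np.array(p.read_table('train.tsv')); URL is in column 0.
traindata = np.array([
    ["http://example.com/a", "news", 1],
    ["http://example.com/bb", "blog", 0],
    ["http://example.org/ccc", "shop", 1],
], dtype=object)

# Call the stub once per row, on the value in the first column.
traffic = np.array([lookup_traffic(url) for url in traindata[:, 0]], dtype=object)

# column_stack treats the 1-D array as one extra column on the right.
traindata_with_traffic = np.column_stack([traindata, traffic])

print(traindata_with_traffic.shape)
```

Building the new column first and attaching it in one call avoids growing the array inside the loop, which would copy the whole array on every iteration.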

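What I dislike about numpy.hstack and numpy.c_ is that they are not flexible enough to handle both 2-D and 1-D arrays.

For example, if I am trying to compute a value from the magnitude of a vector and append it to that vector (as when lifting a point onto a paraboloid in a Delaunay triangulation problem), I want the function to work on either a single 1-D array or an array of 1-D arrays. The function I ended up with is: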

def append_last_dim(array_in, array_augment):
    newshape = list(array_in.T.shape)
    newshape[0] += 1
    ret_array = np.empty(newshape)
    ret_array[:-1] = array_in.T
    ret_array[-1] = array_augment
    return ret_array.T

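For example:

point_list = np.random.rand(5,4)
list_augment = (point_list**2).sum(axis=-1) # shape (5,)
%timeit aug_array = append_last_dim(point_list, list_augment)
# 1.68 µs ± 19.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
point = point_list[0] # shape (4,)
augment = list_augment[0] # shape ()
%timeit append_last_dim(point, augment)
# 1.24 µs ± 9.78 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
def lift_point(point): # this works for 1 point or an array of points
    return append_last_dim(point, (point**2).sum(axis=-1))
lift_point(point_list).shape # (5, 5)
lift_point(point).shape # (5,)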

numpy.c_ works with the array of points as-is, but is about 10 times slower and does not work for a single input array:

%timeit retval = np.c_[point_list,list_augment]
# 13.8 µs ± 47.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

np.c_[point,augment]
# ValueError: all the input array dimensions for the concatenation axis must match exactly, 
# but along dimension 0, the array at index 0 has size 4 and the array at 
# index 1 has size 1

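np.hstack and np.append cannot handle the arguments as-is, because point_list and list_augment have different numbers of dimensions. Even if you reshape list_augment, the result is still about 2 times slower, and a single input and an array of inputs cannot be handled with one uniform call: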

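%timeit np.hstack((point_list, list_augment.reshape(5,1)))
# 3.49 µs ± 21.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.append(point_list, list_augment.reshape((5,1)), axis=1)
# 2.45 µs ± 7.91 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)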

Here are the timings for a list of 1000 points:

point_1k_list = np.random.rand(1000,4)
point_augment = (point_1k_list**2).sum(axis=-1)

%timeit append_last_dim(point_1k_list,point_augment)
# 3.91 µs ± 35 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit np.append(point_1k_list,point_augment.reshape((1000,1)),axis=1)
# 6.5 µs ± 140 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit np.hstack((point_1k_list,point_augment.reshape((1000,1))))
# 7.82 µs ± 31.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit np.c_[point_1k_list,point_augment]
# 19.3 µs ± 234 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
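
The `%timeit` lines above only run inside IPython. As a rough sketch of how the comparison could be reproduced in a plain script, the standard-library `timeit` module can be used instead. The `candidates` dictionary and the repeat count are illustrative choices, and the absolute timings will differ by machine and numpy version. The script also checks that all four approaches produce the same array before timing them.

```python
import timeit
import numpy as np

def append_last_dim(array_in, array_augment):
    # Same function as above, repeated so the script is self-contained.
    newshape = list(array_in.T.shape)
    newshape[0] += 1
    ret_array = np.empty(newshape)
    ret_array[:-1] = array_in.T
    ret_array[-1] = array_augment
    return ret_array.T

point_1k_list = np.random.rand(1000, 4)
point_augment = (point_1k_list**2).sum(axis=-1)
column = point_augment.reshape((1000, 1))

candidates = {
    "append_last_dim": lambda: append_last_dim(point_1k_list, point_augment),
    "np.append": lambda: np.append(point_1k_list, column, axis=1),
    "np.hstack": lambda: np.hstack((point_1k_list, column)),
    "np.c_": lambda: np.c_[point_1k_list, point_augment],
}

# Sanity check: every approach should build the same (1000, 5) array.
results = {name: fn() for name, fn in candidates.items()}
reference = results["append_last_dim"]
all_equal = all(np.array_equal(reference, r) for r in results.values())

runs = 2000  # illustrative repeat count
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=runs)
    print(f"{name}: {seconds / runs * 1e6:.2f} µs per call")
```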

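I don't know why I can't find better built-in support in numpy for handling either a single point or vectorized data, as in the "lift_point" function above.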
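
One possible way to get the same uniform behaviour from built-in pieces is `np.concatenate` along the last axis, with `keepdims=True` on the reduction so the appended value always has a trailing axis of length 1. This is a sketch of an alternative rather than the approach used above, and it has not been timed against append_last_dim; the variable names `single` and `batch` are illustrative.

```python
import numpy as np

def lift_point(points):
    # Works for a single point of shape (n,) or a stack of shape (..., n).
    points = np.asarray(points)
    # keepdims=True keeps a trailing length-1 axis: shape (1,) or (..., 1).
    squared_norm = (points**2).sum(axis=-1, keepdims=True)
    # Concatenating on the last axis appends one value per point.
    return np.concatenate([points, squared_norm], axis=-1)

single = lift_point(np.array([1.0, 2.0, 2.0]))          # shape (4,)
batch = lift_point(np.array([[1.0, 0.0], [3.0, 4.0]]))  # shape (2, 3)

print(single.shape, batch.shape)
```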

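Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.

There are two distinct questions here. Please ask one question at a time.

@JanDvorak Thanks for the information! I have now fixed this. The tool is the unimportant part, I just need a way to get the data :)

@msvalkon You're right, I will now adjust this into two questions. Please give me a moment.

Sorry about the first part, and thank you. What can I run inside the for each to append values to a column of the numpy array I create here? Thanks :)

Can you give an example of your inputs and outputs? Thanks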