Python: convert a list of (feature, value) tuples into a numpy array


python list numpy


Imagine I have word-count data for sentences, where each sentence is one instance.

For example, this is the data for the sentences "I love apple love" and "Oh my god apple":

data = [[("I", 1), ("love", 2), ("apple", 1)], [("Oh", 1), ("my", 1), ("god", 1), ("apple", 3)]]

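I want to convert this into a 2-D NumPy array in which the features are the words and the feature values are the word frequencies. For this example: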

    sentence id  I  love  apple  Oh  my  god
    0            1  2     1      0   0   0
    1            0  0     3      1   1   1
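The same matrix can also be built with plain NumPy and no pandas: collect the vocabulary in first-seen order, map each word to a column index, and fill a zero matrix. This is a minimal sketch, assuming `data` has the shape shown in the question; `counts_to_matrix` and `vocab` are hypothetical names, and the row number doubles as the sentence id.

```python
import numpy as np

def counts_to_matrix(data):
    """Turn a list of [(word, count), ...] lists into a dense 2-D array.

    Hypothetical helper: columns follow the order in which words are first seen.
    """
    vocab = {}  # word -> column index, assigned in first-seen order
    for sentence in data:
        for word, _ in sentence:
            vocab.setdefault(word, len(vocab))

    matrix = np.zeros((len(data), len(vocab)))  # words absent from a sentence stay 0
    for row, sentence in enumerate(data):       # row index is the sentence id
        for word, count in sentence:
            matrix[row, vocab[word]] = count
    return matrix, list(vocab)

data = [[("I", 1), ("love", 2), ("apple", 1)],
        [("Oh", 1), ("my", 1), ("god", 1), ("apple", 3)]]
matrix, columns = counts_to_matrix(data)
print(columns)  # ['I', 'love', 'apple', 'Oh', 'my', 'god']
print(matrix)
```

If one sentence's list could contain the same word twice, this keeps the last count; use `+=` instead of `=` to sum them.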

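In your first example there are no zeros, but your table has them. Why?

Yes, that is what I want to do. However, I don't have the original sentences; only the data array is known. What should I do in this case? Thanks.

@YigeSong, please see the updated answer.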
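The original sentences are not actually needed: the sentence id can simply be each inner list's position in `data`. A minimal sketch, assuming pandas is available: turn each sentence into a dict and pass the list of dicts to `pd.DataFrame`, which aligns the columns by word and gives a default 0, 1, ... index. In recent pandas versions the columns follow the order in which the words first appear.

```python
import pandas as pd

data = [[("I", 1), ("love", 2), ("apple", 1)],
        [("Oh", 1), ("my", 1), ("god", 1), ("apple", 3)]]

# One dict per sentence; the default RangeIndex (0, 1, ...) serves as the sentence id.
# Words missing from a sentence become NaN, which fillna(0) turns into 0.
df = pd.DataFrame([dict(sentence) for sentence in data]).fillna(0)
df.index.name = "sentence_id"

print(df)
arr = df.to_numpy()  # 2-D array, one row per sentence, one column per word
```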

>>> import pandas as pd

>>> data = [[("I", 1), ("love", 2), ("apple", 1)],[("Oh", 1), ("my", 1), ("god", 1), ("apple", 3)]]

>>> data
[[('I', 1), ('love', 2), ('apple', 1)], [('Oh', 1), ('my', 1), ('god', 1), ('apple', 3)]]

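>>> dfs = []
>>> for item in data:
...     val = dict(item)                     # word -> count for one sentence
...     index = [' '.join(val.keys())]       # label the row with the sentence's words
...     df = pd.DataFrame(val, index=index)  # one-row DataFrame per sentence
...     dfs.append(df)
...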
>>> sent_df = pd.concat(dfs)

>>> sent_df
                   I  love  apple   Oh   my  god
I love apple     1.0   2.0      1  NaN  NaN  NaN
Oh my god apple  NaN   NaN      3  1.0  1.0  1.0
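The `NaN` entries mark words that do not occur in a sentence: `pd.concat` takes the union of the columns and has no value to put in the missing cells. Because `NaN` is a float, those columns are upcast to float, while `apple`, present in both rows, stays an integer column. A minimal sketch, assuming you want integer counts throughout: fill the gaps with 0 and cast.

```python
import pandas as pd

data = [[("I", 1), ("love", 2), ("apple", 1)],
        [("Oh", 1), ("my", 1), ("god", 1), ("apple", 3)]]

# Same construction as in the answer: one single-row frame per sentence,
# labelled with the sentence's words joined by spaces.
dfs = [pd.DataFrame(dict(item), index=[" ".join(dict(item))]) for item in data]
sent_df = pd.concat(dfs)

# NaN -> 0, then cast every count column back to integers.
int_df = sent_df.fillna(0).astype(int)
print(int_df)
print(int_df.dtypes)
```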

>>> sent_df.index.name = 'sentence'

>>> sent_df = sent_df.reset_index().fillna(0)
>>> sent_df
          sentence    I  love  apple   Oh   my  god
0     I love apple  1.0   2.0      1  0.0  0.0  0.0
1  Oh my god apple  0.0   0.0      3  1.0  1.0  1.0

# if you don't want the sentence column in the DataFrame
# ======================================================

>>> sent_df = sent_df.drop('sentence', axis=1)

>>> sent_df
     I  love  apple   Oh   my  god
0  1.0   2.0      1  0.0  0.0  0.0
1  0.0   0.0      3  1.0  1.0  1.0

>>> sent_df.index.name = 'sentence_id'

>>> sent_df.reset_index()
   sentence_id    I  love  apple   Oh   my  god
0            0  1.0   2.0      1  0.0  0.0  0.0
1            1  0.0   0.0      3  1.0  1.0  1.0

# if you want a 2-D NumPy array (a NumPy array does not keep the column names)
# =============================================================================

>>> sent_df.reset_index().to_numpy()
array([[0., 1., 2., 1., 0., 0., 0.],
       [1., 0., 0., 3., 1., 1., 1.]])
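The array above has `sentence_id` as its first column and no column names. A minimal sketch, assuming you want the word counts alone as the feature matrix with the names kept in a separate list (`feature_names`, `features` and `with_ids` are illustrative names). Passing `ignore_index=True` to `pd.concat` gives the 0, 1, ... index directly, so the temporary sentence column is not needed.

```python
import pandas as pd

data = [[("I", 1), ("love", 2), ("apple", 1)],
        [("Oh", 1), ("my", 1), ("god", 1), ("apple", 3)]]

# One single-row frame per sentence; ignore_index=True renumbers the rows 0..n-1.
dfs = [pd.DataFrame(dict(item), index=[0]) for item in data]
sent_df = pd.concat(dfs, ignore_index=True).fillna(0)
sent_df.index.name = "sentence_id"

feature_names = sent_df.columns.tolist()     # column j of `features` is feature_names[j]
features = sent_df.to_numpy()                # shape (2, 6): word counts only
with_ids = sent_df.reset_index().to_numpy()  # shape (2, 7): sentence_id prepended
print(feature_names)
print(features)
```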