Split dictionary items into separate columns in the same df in Python


python pandas dictionary


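I have a huge Twitter DataFrame (9530232x19). The first column contains a dictionary, and I want to create separate columns from that dictionary's items in the same df. I also have a dictionary in the "entities" column that I want to split out. I want to add the metrics as four new columns, e.g. "rtcount", "reply_count", "like_count" and "quote_count", plus entities['htype'] as one more new column to the right of the existing df, without creating any more DataFrames, because this big df already uses almost all of my 16 GB of RAM and occasionally crashes. I know that a for loop over a df this large is not efficient, but I could not come up with another way.
Any help is much appreciated.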

htypedf=[]
t=[]
for i in range(0,len(d)):
    if i%100==0:
        print(i)
    htype=[]
    hasht=[]
    t=d[i:i+1]
    metrics=pd.Series(t['public_metrics'][0]).to_frame().T
    try:
        htype=list(map(lambda x : x['type'], t['entities'][0]['annotations']))
    except:
        htype=('NaN')

    d.iloc[i] = pd.concat([t, metrics, pd.DataFrame({'htype': [htype]})],axis=1)

Instead of the for loop, try the following (with t being the full DataFrame):

d = pd.concat([t.drop(columns=['public_metrics']), t['public_metrics'].apply(pd.Series)], axis=1)
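
As a rough illustration of the line above, here is a minimal sketch on a hypothetical three-row frame. The sample values and the `retweet_count` key are made up for the example, so the real column names depend on your data. The sketch also builds the new columns straight from the list of dicts, which is often faster than `apply(pd.Series)` on large frames; that is a swapped-in alternative, not what the answer itself uses.

```python
import pandas as pd

# Hypothetical sample data standing in for the real tweet frame
t = pd.DataFrame({
    "public_metrics": [
        {"retweet_count": 1, "reply_count": 0, "like_count": 5, "quote_count": 0},
        {"retweet_count": 3, "reply_count": 2, "like_count": 9, "quote_count": 1},
        {"retweet_count": 0, "reply_count": 0, "like_count": 0, "quote_count": 0},
    ],
    "text": ["a", "b", "c"],
})

# Same idea as in the answer: one new column per dictionary key
expanded = t["public_metrics"].apply(pd.Series)

# Alternative: build the frame directly from the list of dicts,
# reusing the original index so the rows stay aligned
expanded_fast = pd.DataFrame(t["public_metrics"].tolist(), index=t.index)

# Drop the dictionary column and append the expanded columns on the right
d = pd.concat([t.drop(columns=["public_metrics"]), expanded_fast], axis=1)
print(d)
```

Both variants still create an intermediate frame, so on a frame that nearly fills memory it may be worth processing the data in chunks.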

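A similar approach works for htype, though the details depend on how you want to keep the data. If you only need htype from the entities column, you can try the following: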

d = pd.concat([t.drop(columns=['entities']), t['entities'].apply(pd.Series)['htype']], axis=1)

To keep the entities column alongside the new htype column, use the same line without the drop call.
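
A minimal sketch of the variant that keeps `entities` while appending `htype`. It assumes that every entry in `entities` is a dictionary that really has an `htype` key; the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; every entities value is a dict with an 'htype' key
t = pd.DataFrame({
    "entities": [{"htype": ["Person"]}, {"htype": ["Place", "Organization"]}],
    "text": ["a", "b"],
})

# Keep 'entities' and append the extracted 'htype' column on the right
d = pd.concat([t, t["entities"].apply(pd.Series)["htype"]], axis=1)
print(d.columns.tolist())
```

If some rows hold NaN instead of a dictionary, this line is not enough on its own and the missing values need separate handling.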

Let me know how this works for you.

New code block:

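def fetch_htype(row):
    entities_dict = row['entities']
    # Missing entities are float NaN; np.isnan raises a TypeError on a dict, so check the type instead
    if not isinstance(entities_dict, dict):
        return pd.Series(data=[''], index=['htype'])
    else:
        # Wrap the value in a list so that a list-valued htype is stored as a single cell
        return pd.Series(data=[entities_dict['htype']], index=['htype'])

# axis=1 passes each row to fetch_htype; the default would pass each column instead
d = pd.concat([d, t.apply(fetch_htype, axis=1)], axis=1)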
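
Since the question stresses memory, one alternative is to compute the values in a plain Python loop and assign them as a new column on the existing frame, which avoids building a second DataFrame. This is only a sketch. The helper `annotation_types` is hypothetical, and it assumes (as in the question's loop) that the types live under `entities['annotations'][i]['type']` and that missing entities show up as NaN.

```python
import numpy as np
import pandas as pd

def annotation_types(entities):
    """Return the list of annotation types, or an empty list if unavailable."""
    # Missing values arrive as float NaN rather than as a dict
    if not isinstance(entities, dict):
        return []
    return [a["type"] for a in entities.get("annotations", [])]

# Hypothetical sample data: a full entry, a missing entry, an entry without annotations
d = pd.DataFrame({
    "entities": [
        {"annotations": [{"type": "Person"}, {"type": "Place"}]},
        np.nan,
        {"hashtags": [{"tag": "x"}]},
    ],
})

# Assign in place; no intermediate DataFrame is created
d["htype"] = [annotation_types(e) for e in d["entities"]]
print(d["htype"].tolist())
```

Returning an empty list for missing rows keeps the column type uniform; a different placeholder could be returned if that fits the later analysis better.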

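Thanks a lot. The first one gave an error, but I removed the t.drop['entities'] part and it worked for the public metrics. Now the problem is with the second one. Some "entities" are NaN, and because of that it says "TypeError: 'float' object is not subscriptable". How can I ignore the NaNs in this line of code?

Please try the first part again as shown below. If it works, I will edit my answer.

d = pd.concat([t, t['public_metrics'].apply(pd.Series)], axis=1)

It worked, and very, very fast. Thank you. Any chance for the second one?

Oh, sorry; I misunderstood! For entities that are NaN and throw errors, there are two options: 1) change the function passed to apply so that it accounts for NaNs, or 2) handle the NaNs before applying. Option 2 is probably easier, so I am editing my post to show it. Try the new code in the bottom code block!

It does not work; the data was not modified at all and the NaNs are still there. I tried this once before with the same result. I cannot change that column.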
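
For reference, option 2 from the comment above (handling the NaNs before extracting) might look like the sketch below. The sample frame is hypothetical, and the `htype` key is assumed to exist as in the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing entry
t = pd.DataFrame({"entities": [{"htype": "Person"}, np.nan]})

# Replace anything that is not a dict (such as float NaN) with an empty dict.
# fillna is not used here because it interprets a dict argument as a
# per-label mapping of fill values, not as the fill value itself.
cleaned = t["entities"].apply(lambda e: e if isinstance(e, dict) else {})

# Every element is now a dict, so .get is safe to call
t["htype"] = cleaned.apply(lambda e: e.get("htype", ""))
print(t["htype"].tolist())
```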
