Python 3.x: Twitter hashtags network using networkx


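First of all, I'd like to apologize, as I'm new to Twitter data analysis.

I want to build a user-hashtag network in which I connect users based on the hashtags in their tweets. I have already stored the tweets in MongoDB, but I can't extract all the hashtags from the extended entities object. Honestly, I'm a bit lost on how to do this. Could you suggest the best way to achieve it?

I tried storing the hashtags in a new column of the dataframe, but I can only retrieve one of them. That doesn't work, because I need to consider all the hashtags in a tweet to build the connections. I have the following code to retrieve the hashtags into a second dataframe: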

def get_tweet_data(df2):
    df2["user_id"] = df1["user"].apply(lambda x: x["id"])
    df2["screen_name"] = df1["user"].apply(lambda x: x["screen_name"])
    df2["hashtags"] = df1["entities"].apply(lambda x: x["hashtags"][0]["text"] if x["hashtags"] else np.nan)
    return df2

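So I get:

And I'm looking for something like this:

But I have another problem: I need to connect the Twitter users based on each of their hashtags, so that a user can be connected with the #Puertos users, the Pemex users and the #abierto users. I don't know how to do that.

To generate the graph I'm using the following code: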

G = nx.from_pandas_edgelist(
    df2,
    source="screen_name",
    target="hashtags",
    create_using=nx.Graph())

Again, my apologies, I'm just getting started with this.

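Let's take this one step at a time. First, you want to extract the hashtags from each tweet. I like the second answer here for this task. In your context, that means running something like: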

df['hashtags'] = df['text'].map(lambda s: [i for i in s.split() if i.startswith('#')])

This adds a column in which each entry is a list of hashtags.
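Since the tweets are already stored with their `entities` object, the same column could also be built from `entities` rather than by splitting the text. Below is a minimal sketch, assuming each `entities` value follows the Twitter API v1.1 layout (a `hashtags` list of dicts with a `text` key). The helper name `extract_hashtags` and the sample rows are made up for illustration:

```python
import pandas as pd


def extract_hashtags(entities):
    """Return the text of every hashtag in a tweet's `entities` dict."""
    if not isinstance(entities, dict):
        return []  # missing or malformed entities: treat as no hashtags
    return [tag["text"] for tag in (entities.get("hashtags") or [])]


# Hypothetical sample shaped like two stored tweets.
tweets = pd.DataFrame({
    "user": [{"id": 1, "screen_name": "ana"}, {"id": 2, "screen_name": "luis"}],
    "entities": [
        {"hashtags": [{"text": "Puertos"}, {"text": "Pemex"}]},
        {"hashtags": []},
    ],
})

# Every hashtag of each tweet, not only the first one.
tweets["hashtags"] = tweets["entities"].apply(extract_hashtags)
```

Note that hashtags taken from `entities` come without the leading `#`, whereas the text-splitting approach keeps it, so the two should not be mixed in one column.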

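The second step is a bit more involved. I would first create a bipartite network of users and hashtags, in which edges connect users with the hashtags they used. You can then use NetworkX's bipartite projection functions to create a network of users whose edges indicate shared hashtag usage. Here is a sketch of how this might work: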

import networkx as nx

# A more convenient data structure: a dictionary with users as keys and the list of hashtags they use as values.
# Note: if a user_id appears in several rows, dict() keeps only the last row's hashtags.
user_to_hashtags_dict = dict(df[['user_id', 'hashtags']].values)
B = nx.Graph()  # create an empty graph
for user in user_to_hashtags_dict:  # loop over all the users
    for hashtag in user_to_hashtags_dict[user]:  # for each user, loop over the hashtags they use
        B.add_edge(user, hashtag)  # add the edge user <-> hashtag
# Users that actually appear in the network; users who never used a hashtag have no node in B and are ignored.
actual_users_with_hashtags = [x for x in set(df.user_id) if x in B.nodes()]
# Project the bipartite network onto the users.
G = nx.bipartite.weighted_projected_graph(B, nodes=actual_users_with_hashtags)


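G should be the network you are interested in, including weights on the edges between users that count the number of hashtags they have in common.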
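To make the projection concrete, here is a small self-contained variant on toy data. It is only a sketch under assumptions: the `user_id` and `hashtags` values are invented, and instead of a dictionary it uses `DataFrame.explode`, so that hashtags from several tweets by the same user are all kept:

```python
import networkx as nx
import pandas as pd
from networkx.algorithms import bipartite

# Invented sample: user 1 has two tweets, user 4 never uses a hashtag.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 4],
    "hashtags": [["#Puertos", "#Pemex"], ["#abierto"],
                 ["#Puertos", "#Pemex"], ["#abierto"], []],
})

# One row per (user, hashtag) pair; empty lists become NaN and are dropped.
edges = df.explode("hashtags").dropna(subset=["hashtags"])

B = nx.Graph()
# Mark the two node sets explicitly so users can be selected later.
B.add_nodes_from(edges["user_id"].tolist(), bipartite=0)
B.add_nodes_from(edges["hashtags"].tolist(), bipartite=1)
B.add_edges_from(zip(edges["user_id"].tolist(), edges["hashtags"].tolist()))

# Only users that are actually in B are passed to the projection.
users = [n for n, d in B.nodes(data=True) if d["bipartite"] == 0]
G = bipartite.weighted_projected_graph(B, users)

for u, v, w in G.edges(data="weight"):
    print(u, v, w)
```

In this toy data, users 1 and 2 share two hashtags and users 1 and 3 share one, so those are the only two edges in `G`. User 4 never enters `B`, which avoids the kind of `KeyError` raised when the projection is asked for a node that is not in the bipartite graph.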


Hi, welcome to Stack Overflow! Could you add some sample data and the code you've written to your post?

Hi, I just added more information. Thanks for your reply.

Thank you so much! Now I can get all the hashtags in the column just as I wanted! :) Now I'm getting an error with the second piece of code you provided. I know you've already helped me a lot, so I hate to ask again. G = nx.bipartite.weighted_projected_graph(B, nodes=user_to_hashtags_dict.keys()) KeyError: 415173881

Hi - I added a line and changed the projection code. I think the problem is that some Twitter users in your data never used a hashtag. They were not added to the bipartite user-hashtag network, so they cannot be considered for the projection. Hope this helps!

Hi Johannes, it's me again. I'm having some trouble adding attributes, and I was wondering if you could enlighten me =/