How do I count the unique users in a log file with Erlang?


Given a txt log file in the following format:

USER_A timestamp1 otherstuff
USER_B timestamp2 otherstuff
USER_C timestamp3 otherstuff
USER_A timestamp4 otherstuff
USER_A timestamp5 otherstuff
USER_C timestamp6 otherstuff
USER_B timestamp7 otherstuff

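How can I count the number of distinct users in Erlang? I am thinking of reading the file line by line and using the proplists module. Each user would be a key whose value is the number of occurrences. After reading the file, I would call: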

length(proplists:get_keys(List)).

Is this the right way to get my result?
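
For reference, the approach described in the question could be sketched roughly as follows. This is only an illustration under stated assumptions: the module name proplist_users and the helper add_line/2 are hypothetical, the input is assumed to be a list of already-read lines, and the username is assumed to be the first whitespace-separated field.

```erlang
-module(proplist_users).
-export([count/1]).

%% Lines is a list of strings, one per log line (assumption for this sketch).
%% Builds a property list of {User, Occurrences} and counts its distinct keys.
count(Lines) ->
    Counts = lists:foldl(fun add_line/2, [], Lines),
    length(proplists:get_keys(Counts)).

%% Hypothetical helper: bump the counter for the user named on this line.
add_line(Line, Counts) ->
    case string:tokens(Line, " \t\r\n") of
        [User | _] ->
            N = proplists:get_value(User, Counts, 0),
            [{User, N + 1} | proplists:delete(User, Counts)];
        [] ->
            %% Blank line: nothing to count.
            Counts
    end.
```

Every update scans the property list, so the cost per line grows with the number of distinct users seen so far. That is fine for small inputs but is one reason a set may be a better fit.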

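It would probably be more appropriate to store the usernames in a set from one of the set modules and then take the size of that set.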

I would also use the sets module for this, because it is fast and simple, and a set contains no duplicates.

The following code should do it:

{ok,Bin} = file:read_file("test"),
List = binary_to_list(Bin),
Usernames = [hd(string:tokens(X," ")) || X <- string:tokens(List,[$\n])],
sets:size(sets:from_list(Usernames)).
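
A possible variant of the snippet above works on the binary directly instead of converting the whole file to a list of characters first. It is a sketch rather than a definitive implementation: the module name unique_users_bin and the functions count/1, count_in_binary/1 and first_field/1 are hypothetical names chosen for illustration, and the username is assumed to be the first whitespace-separated field on each line.

```erlang
-module(unique_users_bin).
-export([count/1, count_in_binary/1]).

%% Count distinct usernames in the file at Path.
count(Path) ->
    {ok, Bin} = file:read_file(Path),
    count_in_binary(Bin).

%% Count distinct first fields over all non-blank lines of Bin.
count_in_binary(Bin) ->
    Lines = binary:split(Bin, <<"\n">>, [global, trim_all]),
    %% The pattern {ok, User} silently drops lines without any field.
    Users = [User || Line <- Lines, {ok, User} <- [first_field(Line)]],
    sets:size(sets:from_list(Users)).

%% Return {ok, FirstField} or none for a line made only of whitespace.
first_field(Line) ->
    case binary:split(Line, [<<" ">>, <<"\t">>, <<"\r">>], [global, trim_all]) of
        [User | _] -> {ok, User};
        [] -> none
    end.
```

Usernames are kept as binaries here, so the set holds values such as <<"USER_A">> rather than strings. The whole file is still read into memory at once.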

Log files are often large, so consider reading the file one line at a time in a recursive function:

% Count the number of distinct users in the file named Filename
count_users(Filename) ->
    {ok, File} = file:open(Filename, [read, raw, read_ahead]),
    Usernames = usernames(File, sets:new()),
    file:close(File),
    sets:size(Usernames).

% Add all users in File, from the current file pointer position and forward,
% to Set. Blank lines are skipped.
% Side-effects: File is read and the file pointer is moved to the end.
usernames(File, Set) ->
    case file:read_line(File) of
        {ok, Line} ->
            % Split on whitespace so a trailing newline never ends up in a name.
            case string:tokens(Line, " \t\r\n") of
                [Username | _] ->
                    usernames(File, sets:add_element(Username, Set));
                [] ->
                    usernames(File, Set)
            end;
        eof ->
            Set
    end.

You can call it like this: count_users("logfile").

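Note that usernames/2 must work efficiently, which in practice means it must be tail recursive (as it is above); otherwise it will just consume more memory.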
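
To illustrate the difference, here is a small sketch that contrasts a tail-recursive and a body-recursive way of collecting names into a set. The module name tail_sketch and both function names are hypothetical, and the input is assumed to be a plain list of usernames rather than a file, to keep the example short.

```erlang
-module(tail_sketch).
-export([collect_tail/1, collect_body/1]).

%% Tail-recursive: the recursive call is the last expression, so each step
%% reuses the current stack frame and only the set grows.
collect_tail(Names) ->
    collect_tail(Names, sets:new()).

collect_tail([Name | Rest], Set) ->
    collect_tail(Rest, sets:add_element(Name, Set));
collect_tail([], Set) ->
    Set.

%% Body-recursive: sets:add_element/2 runs only after the recursive call
%% returns, so one stack frame per input element stays alive until the end.
collect_body([Name | Rest]) ->
    sets:add_element(Name, collect_body(Rest));
collect_body([]) ->
    sets:new().
```

Both functions produce a set with the same elements. With a very long input, the second one also holds a stack frame for every element that has not yet been added.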
Note that the file:read_file/1 approach keeps the whole file in memory. From a memory perspective, it is better to read the file one line at a time and add each username to a set, for example in a recursive function.