Erlang: remote node to keep a process alive


I have the following setup:

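On one node ('one@erlang.enzo') a server process is running, and its watchdog runs on another node ('two@erlang.enzo'). When the server starts, it starts the watchdog on the remote node. When the server exits abnormally, the watchdog starts the server again. When the watchdog exits, the server starts it again.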

The server is started as part of the runlevel once the network is up.

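The server also monitors the remote node and starts the watchdog when it (i.e. the node) comes online. The connection between server and watchdog can be lost for two reasons: first, the network can go down; second, the node can crash or be killed.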
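Instead of polling with `net_adm:world/0`, the "server monitors the remote node" part could be driven by node events. A minimal sketch, assuming the server only cares about one watchdog node: `net_kernel:monitor_nodes(true)` subscribes the calling process to `{nodeup, Node}` and `{nodedown, Node}` messages. The module name `node_watch` and the print statements are hypothetical placeholders for the real restart logic.

```erlang
-module(node_watch).
-export([start/1]).

%% Spawn a process that is notified whenever Node connects or disconnects.
%% Node is the watchdog node, e.g. 'two@erlang.enzo'.
start(Node) ->
    spawn(fun() ->
                  %% Subscribe this process to {nodeup, N} / {nodedown, N}.
                  net_kernel:monitor_nodes(true),
                  loop(Node)
          end).

loop(Node) ->
    receive
        {nodeup, Node} ->
            %% Hypothetical hook: here the server would look up or
            %% (re)start its watchdog on Node.
            io:format("~p came online~n", [Node]),
            loop(Node);
        {nodedown, Node} ->
            %% The node crashed, was stopped, or is merely unreachable.
            io:format("~p went down or became unreachable~n", [Node]),
            loop(Node);
        _Other ->
            %% Events about other nodes are ignored.
            loop(Node)
    end.
```

Events only arrive when the local node is started in distributed mode (e.g. with `-name one@erlang.enzo`).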

My code seems to work fine, but I somewhat suspect the following is happening:

When the watchdog node goes down (or is terminated, or crashes) and comes back up, the server correctly restarts its watchdog.

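But when the network fails while the watchdog node keeps running, the server starts a new watchdog once the connection is re-established, leaving a zombie watchdog behind.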
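The ambiguity behind the zombie scenario is that a lost connection is typically reported with the exit reason `noconnection`, whether the remote node died or only the network between the nodes broke. A minimal sketch, assuming a process monitor (`erlang:monitor/2`) instead of a link, shows how the `'DOWN'` reason could be classified so that `noconnection` is treated as "unknown, check again after reconnecting" rather than "dead". The module `down_reason` and its result atoms are hypothetical.

```erlang
-module(down_reason).
-export([watch/1, classify/1]).

%% Map the Reason of a {'DOWN', Ref, process, Pid, Reason} message
%% to a coarse category.
classify(noconnection) -> unreachable;   % node lost: process may still be alive
classify(noproc)       -> already_dead;  % Pid did not exist when monitored
classify(normal)       -> clean_exit;
classify(_Other)       -> crashed.

%% Monitor Pid, block until it goes down, and classify the reason.
watch(Pid) ->
    Ref = erlang:monitor(process, Pid),
    receive
        {'DOWN', Ref, process, Pid, Reason} -> classify(Reason)
    end.
```

In the server, an `unreachable` result would be the cue to wait for the node to come back and then look the watchdog up again (for example with `global:whereis_name/1`) instead of spawning a new one straight away.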

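My questions are:

(A) Do I create zombies?

(B) How can the server check whether the watchdog is still alive after a network loss (and vice versa)?

(C) If B is possible, how do I reconnect the old server and the old watchdog?

(D) What other major (and minor) flaws do you, dear reader, see in my setup?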

Edit: The die and kill-dog messages are only there to simulate unclean exits and will not make it past debugging.
Here is the code:

Registering the watchdog with the global module should avoid your problem:
watchdog.erl:

-module(watchdog).
-compile(export_all).

init() ->
    io:format("Watchdog: Starting @ ~p.~n", [node()]),
    process_flag(trap_exit, true),
    %% Register under a cluster-wide name so the server can find this
    %% process again after the nodes reconnect.
    global:register_name(watchdog, self()),
    loop().

loop() ->
    receive
        die ->
            1 / 0;  %% debugging only: fake an unclean exit
        {'EXIT', _, normal} ->
            io:format("Watchdog: Server shut down.~n");
        {'EXIT', _, _} ->
            io:format("Watchdog: Restarting server.~n"),
            spawn('one@erlang.enzo', server, start, []);
        _ ->
            loop()
    end.
server.erl:

checkNode() ->
    net_adm:world(),
    case lists:any(fun(Node) -> Node =:= 'two@erlang.enzo' end, nodes()) of
        false ->
            io:format("Server: Watchdog node is still down.~n"),
            {down, none};
        true ->
            io:format("Server: Watchdog node has come online.~n"),
            global:sync(), %% not sure if this is necessary
            %% Reuse a watchdog that survived the outage instead of
            %% spawning a second one.
            case global:whereis_name(watchdog) of
                undefined ->
                    io:format("Watchdog process is dead~n"),
                    Watchdog = spawn_link('two@erlang.enzo', watchdog, init, []);
                Watchdog ->
                    io:format("Watchdog process is still alive~n")
            end,
            {up, Watchdog}
    end.

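Thank you very much. When the watchdog is still alive, don't I need to call link/1? Or do the processes stay linked after each of them has received {'EXIT', Pid, noconnection}? I'm actually not sure (I haven't used distributed Erlang so far).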
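On the follow-up question: a link does not survive its exit signal. When the connection between two nodes fails, the runtime removes the links across it and delivers `noconnection`, so a watchdog that stayed alive the whole time is no longer linked to the server afterwards. A netsplit cannot be simulated on a single node, so this sketch demonstrates the same rule with a crashing local child: the child is in the link set before the `'EXIT'` message and gone after it. The module name `link_demo` is hypothetical.

```erlang
-module(link_demo).
-export([run/0]).

%% Run the check in a fresh process so trap_exit does not leak
%% into the caller; returns {LinkedBefore, LinkedAfter}.
run() ->
    Parent = self(),
    Ref = make_ref(),
    spawn(fun() -> Parent ! {Ref, check()} end),
    receive {Ref, Result} -> Result end.

check() ->
    process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> receive stop -> exit(crash) end end),
    {links, Before} = process_info(self(), links),
    Pid ! stop,
    %% The exit signal arrives as a message because we trap exits;
    %% delivering it removes the link.
    receive {'EXIT', Pid, crash} -> ok end,
    {links, After} = process_info(self(), links),
    {lists:member(Pid, Before), lists:member(Pid, After)}.
```

Applied to the answer's `checkNode/0`, the branch that finds an existing watchdog would therefore presumably need an explicit `link(Watchdog)` to restore mutual supervision.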