C#: Hangfire using MongoDB keeps restarting a long-running background job

I have a problem with Hangfire (version 1.6.19 at the moment) using MongoDB as storage. We currently have a method that is scheduled as follows:

BackgroundJob.Schedule(() => DoAsyncTask(parameters, JobCancellationToken.Null), TimeSpan.FromMinutes(X));

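The task runs for over an hour and contains a loop that checks whether the job has finished. Inside the loop there is a call to cancellationToken.ThrowIfCancellationRequested() to check whether cancellation has been requested, but this call keeps firing about 30 minutes into the execution and terminates the job before it completes.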
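
For illustration, the kind of loop described here might look like the following minimal sketch. It uses the plain .NET CancellationToken so that it runs without Hangfire; in the real job the parameter would be Hangfire's IJobCancellationToken, which offers a ThrowIfCancellationRequested() method as well. PollingJob, isFinished and pollInterval are hypothetical names, not taken from the original code.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PollingJob
{
    // Hypothetical stand-in for the long-running job: polls until the work
    // reports completion and checks for cancellation on every iteration.
    public static async Task<int> RunAsync(
        Func<bool> isFinished, TimeSpan pollInterval, CancellationToken cancellationToken)
    {
        var polls = 0;
        while (!isFinished())
        {
            // Throws OperationCanceledException once cancellation is requested.
            cancellationToken.ThrowIfCancellationRequested();
            polls++;
            await Task.Delay(pollInterval, cancellationToken);
        }
        return polls;
    }
}
```

If the token is signalled while the loop is still running, the exception ends the job early, which matches the symptom described in the question.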

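I have been searching for information about this issue, but most of what I found relates to older versions or to the use of InvisibilityTimeout, which has reportedly been deprecated. So I would like to know whether anyone else has run into this issue, and what possible solutions there are.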

Thanks, everyone.

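Edit: After further investigation, I found that the cancellation problem was only a side effect of Hangfire invoking the task again after 30 minutes of running. Because the method contains a validation that prevents re-entry while the process is still running (to avoid duplicate data), the process was treated as finished and was therefore cancelled.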
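
The question does not show the re-entry validation itself. As a rough sketch only, one way such a guard could be written is to track running job IDs in a ConcurrentDictionary and refuse a second entry for the same ID. ReentryGuard and its members are hypothetical names, and this approach only protects a single process; with several servers a storage-level lock would be needed instead.

```csharp
using System;
using System.Collections.Concurrent;

public static class ReentryGuard
{
    // IDs of jobs currently executing in this process (the value is unused).
    private static readonly ConcurrentDictionary<string, byte> Running =
        new ConcurrentDictionary<string, byte>();

    // Returns true if the caller may proceed, false if the job is already running.
    public static bool TryEnter(string jobId) => Running.TryAdd(jobId, 0);

    // Marks the job as no longer running; safe to call even if it was never entered.
    public static void Exit(string jobId) => Running.TryRemove(jobId, out _);
}
```

A caller would typically pair TryEnter with Exit in a try/finally block so the ID is released even when the job throws.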

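So the real problem I am facing is that I cannot determine why Hangfire keeps invoking the process again after about 30 minutes of execution. I followed the documented steps to set the application on IIS to always running and to prevent the pool from being recycled, but the behavior persists.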

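The solution I implemented for my problem was to place a distributed lock on the job until it completes properly. I made some small changes to the implementation to include the job ID and to update the calls to the new objects used in this version of Hangfire, so I will leave it here: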

using System;
using Hangfire.Common;
using Hangfire.Server;
using NLog; // assumption: Logger and LogManager come from NLog

// Server filter that skips an execution when another execution of the same
// job (same type, method and job ID) already holds the distributed lock.
public class SkipConcurrentExecutionAttribute : JobFilterAttribute, IServerFilter
{
    private static readonly Logger logger = LogManager.GetCurrentClassLogger();

    private readonly int _timeoutInSeconds;

    public SkipConcurrentExecutionAttribute(int timeoutInSeconds)
    {
        if (timeoutInSeconds < 0) throw new ArgumentException("Timeout argument value should not be negative.", nameof(timeoutInSeconds));

        _timeoutInSeconds = timeoutInSeconds;
    }

    public void OnPerforming(PerformingContext filterContext)
    {
        // The lock name includes the job ID, so only re-runs of the same job are blocked.
        var resource = $"{filterContext.BackgroundJob.Job.Type.FullName}.{filterContext.BackgroundJob.Job.Method.Name}.{filterContext.BackgroundJob.Id}";

        var timeout = TimeSpan.FromSeconds(_timeoutInSeconds);

        try
        {
            // Keep the lock handle so OnPerformed can release it.
            var distributedLock = filterContext.Connection.AcquireDistributedLock(resource, timeout);
            filterContext.Items["DistributedLock"] = distributedLock;
        }
        catch (Exception)
        {
            // The lock is still held by a running execution: cancel this one.
            filterContext.Canceled = true;
            logger.Warn("Cancelling run for {0} job, id: {1} ", resource, filterContext.BackgroundJob.Id);
        }
    }

    public void OnPerformed(PerformedContext filterContext)
    {
        if (!filterContext.Items.ContainsKey("DistributedLock"))
        {
            throw new InvalidOperationException("Can not release a distributed lock: it was not acquired.");
        }

        // Disposing the handle releases the distributed lock.
        var distributedLock = (IDisposable)filterContext.Items["DistributedLock"];
        distributedLock.Dispose();
    }
}

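I hope this helps. The reason for the repeated invocation is still unknown, so feel free to extend this answer with any information you may find.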

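I had the same problem and spent a lot of time looking for a solution in the Hangfire topics. But I noticed that the cancellation was only triggered after a console event.

So the problem is not in Hangfire itself but in the Hangfire.Console project. Do you use this extension? Switching to another logging method solved all of my problems.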

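I had the same issue with Hangfire.Core 1.7.6 and Hangfire.Mongo 0.5.6 in a ServiceFabric cluster. I added a PerformContext parameter to the job.

This makes it possible to get the job ID of the current job:

var jobId = performContext.BackgroundJob.Id;

The job that is restarted after 30 minutes has the same job ID. So it is possible to check whether a succeeded job with the same ID already exists:
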
var backgroundJob = performContext.BackgroundJob;
var monitoringApi = JobStorage.Current.GetMonitoringApi();
var succeededCount = (int)monitoringApi.SucceededListCount();
if (succeededCount > 0) 
{
    var queryCount = Math.Min(succeededCount, 1000);

    // read up to 1000 latest succeeded jobs:
    var succeededJobs = monitoringApi.SucceededJobs(succeededCount - queryCount, queryCount);

    // check if job with the same ID already finished:
    if (succeededJobs.Any(succeededKp => backgroundJob.Id == succeededKp.Key)) 
    {
        // The job was already started and succeeded, skip this execution
        return;
    }
}

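Note: the job method must also be annotated so that it is not started concurrently. The timeout should have a reasonable limit, for example 6 hours: [DisableConcurrentExecution(6 * 60 * 60)]. Otherwise, the second job may start after 30 minutes rather than after the first job has finished.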
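
Putting the note together with the PerformContext parameter, the job declaration might look roughly like the sketch below. It assumes the Hangfire.Core package is referenced; LongRunningJobs, DoLongTask and the parameters argument are hypothetical names. Passing null for the PerformContext argument when scheduling is only a placeholder, since Hangfire substitutes the real context when the job runs.

```csharp
using System;
using Hangfire;
using Hangfire.Server;

public static class LongRunningJobs
{
    // Wait up to 6 hours to acquire the lock, so a re-queued copy of the job
    // waits for the first run to finish instead of starting alongside it.
    [DisableConcurrentExecution(6 * 60 * 60)]
    public static void DoLongTask(string parameters, PerformContext performContext)
    {
        var jobId = performContext.BackgroundJob.Id;
        Console.WriteLine($"Running job {jobId} with {parameters}");
        // ... the succeeded-job check and the actual work would go here ...
    }

    // Schedules the job and returns the ID that Hangfire assigned to it.
    public static string ScheduleIn(TimeSpan delay) =>
        // null is a placeholder; Hangfire injects the PerformContext at run time.
        BackgroundJob.Schedule(() => DoLongTask("example", null), delay);
}
```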
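
This is not an answer! Please consider adding more details about the problem in Hangfire.Console.

This is exactly the answer I needed before wasting a lot of time looking for a solution. Console is an optional extension that can easily be replaced. There are several open issues on the Hangfire GitHub and the same questions on StackOverflow. I saw Console in the screenshot. This bug looks just like the bug in older versions of Hangfire. It is important to understand the difference.

Hi @NickolayKlestov, thanks for your input. Unfortunately, in my case I was not using the Hangfire.Console extension. I tried it for a while after the problem started, to see whether I could learn more about the issue, but the same behavior persists with or without the extension, so I know it is not the cause. What logging method did you choose to solve your problem?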