C# data processing strategy: Kafka, a database, or any other persistence

Tags: c#, .net, apache-kafka, architecture, software-design

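I am researching strategies for high-volume (asynchronous) data processing. I have oversimplified the problem here.

I receive a record set, for example: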

A-event1
B-event1 
B-event2  
C-event1
C-event2
C-event3
B-event3
A-event2
A-event3
D-event1
D-event2
C-event4
A-event4
A-event4
A-event6
A-eventfinal
B-eventfinal
C-event6
C-event7
C-eventFinal
D-eventFinal
The transformation of this record set would be:

A-event1      B-event1         C-event1        D-event1
A-event2      B-event2         C-event2        D-event2
A-event3      B-event3         C-event3        D-eventFinal
A-event4      B-eventfinal     C-event4
A-eventFinal                   C-event5
                               C-event6
                               C-event7  
                               C-eventFinal
                   
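Only once I have received the final event for an entity is its set available for further processing; as soon as an entity hits final, it becomes eligible. That single set is then sent to a third-party application, which processes it and, on success, returns a close event or an acknowledgement. It may also fail. The individual set is then ready either to be purged or to be retained for further correction (in case of failure). One caveat: the acknowledgement or close may take days to arrive, so I have to persist this data somewhere (a database, Kafka, or something similar).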
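As a rough illustration of the transformation and the eligibility rule described above, here is a minimal C# sketch. It assumes each record is a string of the form `entity-eventName`, that the event name follows the last `-` (so GUID identifiers still work), and that the final event is named `eventFinal` in any casing. The `EventGrouping` class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class EventGrouping
{
    // Groups "A-event1" style records by entity id, keeping arrival order per entity.
    public static Dictionary<string, List<string>> GroupByEntity(IEnumerable<string> records)
    {
        var groups = new Dictionary<string, List<string>>();
        foreach (var raw in records)
        {
            var record = raw.Trim();
            int split = record.LastIndexOf('-');   // event name follows the last '-'
            if (split <= 0 || split == record.Length - 1)
                continue;                          // assumption: silently skip malformed records

            string entityId = record.Substring(0, split);
            string eventName = record.Substring(split + 1);

            if (!groups.TryGetValue(entityId, out var events))
                groups[entityId] = events = new List<string>();
            events.Add(eventName);
        }
        return groups;
    }

    // An entity becomes eligible for further processing once its final event has arrived.
    public static bool IsEligible(IReadOnlyList<string> events) =>
        events.Count > 0 &&
        events[events.Count - 1].Equals("eventFinal", StringComparison.OrdinalIgnoreCase);
}
```

In a real system the grouping would happen incrementally as events arrive and the groups would live in durable storage rather than in a dictionary.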

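Here I use A, B, C, and D as entity identifiers; in practice there can be tens of thousands of them (GUIDs, for example). I also need the ability to reprocess the entire record set.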

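The options I have considered so far are:

  • A dynamic Kafka topic per identifier, but at any given time that could mean maintaining tens of thousands of topics. I would prefer to avoid using a DB.
  • Put the whole set on a single Kafka topic and create a separate retry topic, with something like application X continuously polling the retry topic.
  • I am open to any data processing algorithm here. Needless to say, data loss is not acceptable.

    I understand this explanation is a bit abstract; please let me know if you need further clarification. Any help or suggestions would be greatly appreciated.


    I am looking for an architectural approach.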

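    Your description is a little thin on details. However, you could easily solve this with a database and some kind of pipeline (pick your poison).

    In this extremely contrived example I use TPL Dataflow. You could use any structure or framework you like; the problem stays the same. In this example, Dataflow handles a few things effortlessly: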

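    • It works with the async/await pattern
    • It can process things in order (or not)
    • It processes via queues and can process things in parallel
    • The maximum degree of parallelism is configurable
    • It can create permanent pipelines
    • It supports cancellation tokens, and more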
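    The capabilities listed above map onto a handful of TPL Dataflow settings. The sketch below is illustrative only: the block, the chosen values, and the `DataflowOptionsSketch` name are assumptions, and it needs the System.Threading.Tasks.Dataflow package.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowOptionsSketch
{
    public static async Task<int> RunAsync(int itemCount, CancellationToken token)
    {
        int processed = 0;

        var options = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 4, // handle up to 4 messages at the same time
            BoundedCapacity = 100,      // back-pressure: SendAsync waits while the block is full
            EnsureOrdered = true,       // only matters for blocks with output, e.g. TransformBlock
            CancellationToken = token   // cancelling the token stops the block
        };

        // The delegate may be async, so I/O-bound work does not tie up a thread.
        var worker = new ActionBlock<int>(async item =>
        {
            await Task.Delay(10);                  // stand-in for real work
            Interlocked.Increment(ref processed);  // thread-safe: the delegate runs in parallel
        }, options);

        for (int i = 0; i < itemCount; i++)
            await worker.SendAsync(i);

        worker.Complete();        // signal that no more input is coming
        await worker.Completion;  // throws OperationCanceledException if the token was cancelled
        return processed;
    }
}
```

    With `MaxDegreeOfParallelism` left at its default of 1, the same block would handle messages one at a time in arrival order.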
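    I had to make a lot of assumptions and have left a lot to the imagination:

    • Fault tolerance needs to be considered
    • Implement a cancellation system
    • Tune the degree of parallelism and the other options
    • Implement a database for the events
    • Have a failover and restart mechanism in case your process goes down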
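    Two of the points above, a database for the events and a restart mechanism, could look roughly like the following. This is only a sketch under assumptions: `IEventStore`, `InMemoryEventStore`, `GroupStatus`, and `Recovery` are hypothetical names, and the in-memory dictionary stands in for a real database table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum GroupStatus { Collecting, AwaitingAck, Failed }

public interface IEventStore
{
    void Append(string groupId, string payload);
    void SetStatus(string groupId, GroupStatus status);
    IReadOnlyList<string> Read(string groupId);
    IReadOnlyList<string> FindByStatus(GroupStatus status);
    void Purge(string groupId);
}

// In-memory stand-in for a database; a real implementation must survive a process restart.
public sealed class InMemoryEventStore : IEventStore
{
    private sealed class Group
    {
        public readonly List<string> Events = new List<string>();
        public GroupStatus Status = GroupStatus.Collecting;
    }

    private readonly Dictionary<string, Group> _groups = new Dictionary<string, Group>();
    private readonly object _lock = new object();

    public void Append(string groupId, string payload)
    {
        lock (_lock)
        {
            if (!_groups.TryGetValue(groupId, out var g))
                _groups[groupId] = g = new Group();
            g.Events.Add(payload);
        }
    }

    public void SetStatus(string groupId, GroupStatus status)
    {
        lock (_lock)
        {
            if (_groups.TryGetValue(groupId, out var g))
                g.Status = status;
        }
    }

    public IReadOnlyList<string> Read(string groupId)
    {
        lock (_lock)
            return _groups.TryGetValue(groupId, out var g) ? g.Events.ToList() : new List<string>();
    }

    public IReadOnlyList<string> FindByStatus(GroupStatus status)
    {
        lock (_lock)
            return _groups.Where(kv => kv.Value.Status == status).Select(kv => kv.Key).ToList();
    }

    public void Purge(string groupId)
    {
        lock (_lock)
            _groups.Remove(groupId);
    }
}

public static class Recovery
{
    // After a restart: groups that were sent but never acknowledged, or that failed, need attention.
    public static List<string> GroupsNeedingAttention(IEventStore store) =>
        store.FindByStatus(GroupStatus.AwaitingAck)
             .Concat(store.FindByStatus(GroupStatus.Failed))
             .ToList();
}
```

    The idea is that a group is marked `AwaitingAck` when it is sent to the third party and purged only when the acknowledgement arrives, so a restarted process can ask the store which groups are still outstanding.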
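    Example

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;
    using System.Threading.Tasks.Dataflow;

    public enum EventType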
    {
       Event,
       Final,
       Finished,
       Error
    }
    
    public class EventMessage
    {
       public int GroupId { get; set; }
       public int EventId { get; set; }
       public string Payload { get; set; }
       public EventType EventType { get; set; }
    }
    
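    // The fields and methods below are meant to be declared inside a class (for example, Program).
    // In-memory stand-in for a database, keyed by group id.
    public static ConcurrentDictionary<int,List<EventMessage>> _dataStore = new ConcurrentDictionary<int,List<EventMessage>>();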
    private static BufferBlock<EventMessage> _start;
    private static ActionBlock<EventMessage> _persistBlock;
    private static ActionBlock<EventMessage> _processBlock;
    private static ActionBlock<EventMessage> _finalizeBlock;
    private static TransformBlock<EventMessage, EventMessage> _reprocessBlock;
    private static TransformBlock<EventMessage, EventMessage> _queue;
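    // Note: System.Random is not thread-safe and this instance is used from several tasks.
    // That is tolerable for a demo; on .NET 6+ Random.Shared is the safer choice.
    private static Random _r = new Random();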
    
    
    static async Task Main(string[] args)
    {
    
       // this is just a buffer that can receive asynchronous events
       _start = new BufferBlock<EventMessage> (new DataflowBlockOptions(){EnsureOrdered = true});
    
       // we need an orderly queue, the bounded capacity is 1 so we can process events in order,
       // i.e. so you don't process the final event before all events are received
       _queue = new TransformBlock<EventMessage, EventMessage>(message => message, new ExecutionDataflowBlockOptions(){BoundedCapacity = 1});
    
       // save your events to the database
       _persistBlock = new ActionBlock<EventMessage>(PersistAction, new ExecutionDataflowBlockOptions() { BoundedCapacity = 1 });
    
       // process the final event
       _processBlock = new ActionBlock<EventMessage>(ProcessAction);
    
       // process the event from the 3rd party service
       _finalizeBlock = new ActionBlock<EventMessage>(FinalizeAction);
    
       // reprocess on failure or whatever you need to do
       _reprocessBlock = new TransformBlock<EventMessage, EventMessage>(Reprocess);
    
       // link it all together
       _start.LinkTo(_queue);
       _queue.LinkTo(_persistBlock, (x) => x.EventType == EventType.Event);
       _queue.LinkTo(_processBlock, (x) => x.EventType == EventType.Final);
       _queue.LinkTo(_finalizeBlock, (x) => x.EventType == EventType.Finished);
       _queue.LinkTo(_reprocessBlock, (x) => x.EventType == EventType.Error);
       _reprocessBlock.LinkTo(_start);
    
       // create some events
       var tasks= Enumerable.Range(1, 5).Select(CreateEvents);
    
       await Task.WhenAll(tasks);
    
       Console.ReadKey();
    }
    
    private static async Task CreateEvents(int groupId)
    {
       var events = Enumerable
          .Range(1, _r.Next(2, 5))
          .Select(x => new EventMessage()
          {
             GroupId = groupId,
             EventId = x,
             EventType = EventType.Event
          });
       foreach (var e in events)
       {
          await Task.Delay(_r.Next(10, 100));
          await _start.SendAsync(e);
       }
    
       await _start.SendAsync(new EventMessage()
       {
          GroupId = groupId,
          Payload = "Final Event",
          EventType = EventType.Final
       });
    }
    private static EventMessage Reprocess(EventMessage e)
    {
       // the event came back as an error, so we push it back onto the queue
       Console.WriteLine($"Reprocessing group : {e.GroupId}");
       e.EventType = EventType.Final;
       e.Payload = e.Payload + " Error";
       return e;
    }
    
    private static async Task PersistAction(EventMessage e)
    {
       // this is simulating saving the event to a db
       Console.WriteLine($"Saving event : {e.GroupId}:{e.EventId}");
       await Task.Delay(_r.Next(10, 100));
       _dataStore.AddOrUpdate(e.GroupId,
          (x) => new List<EventMessage>() {e},
          (x, l) =>
          {
             l.Add(e);
             return l;
          });
    }
    private static async Task ProcessAction(EventMessage e)
    {
       // this is simulating reading all the events for that group from the db
       // and sending to your 3rd service
       Console.WriteLine($"Sending to service : {e.GroupId}");
    
       await Task.Delay(_r.Next(10, 100));
    
       // this is simulating receiving a result from the 3rd party service;
       // it just pushes the event back into the queue, to be finalised or reprocessed.
       // choose randomly whether it was a success or a failure
       // obviously this would be called by something else, possibly your message queue
       if (_r.Next(0, 2) == 0)
          e.EventType = EventType.Finished;
       else
          e.EventType = EventType.Error;
    
    
       Console.WriteLine($"Service returned : {e.GroupId}, {e.EventType}");
    
       await _start.SendAsync(e);
    }
    private static void FinalizeAction(EventMessage e)
    {
       // purge the records, we are all done
       if (_dataStore.TryRemove(e.GroupId, out var l))
          Console.WriteLine($"*** Finalize : {e.GroupId} - {string.Join(",", l.Select(x => x.EventId))}");
       else
          Console.WriteLine($"*** Finalize : {e.GroupId} - no stored events found");
    }
    

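    Note: this is only an example. It is not meant to be a complete solution, a recommendation to use Dataflow, or even the way you should solve the problem. It is only meant to give you an idea of a structured pipeline.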

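    Comments

    The first question you need to answer is how long the post-processing takes once the final event has been received. That tells you whether you can save the data to a file or have to keep it in memory. You have to weigh the speed the algorithm needs against its memory use and the amount of memory in the machine. Then decide on the algorithm.

    Is it safe to assume that all events have the same structure, that they all have an id, and that the close returns an id?

    Yes and yes, that's right. The Id is unique for each set, and that is how it is recorded in the application's audit trail.

    @jdweng Post-processing is not time-sensitive; what matters is accuracy.

    So what about memory?

    Output of the example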
    Saving event : 4:1
    Saving event : 1:1
    Saving event : 4:2
    Saving event : 1:2
    Saving event : 5:1
    Saving event : 5:2
    Saving event : 3:1
    Saving event : 2:1
    Saving event : 1:3
    Saving event : 5:3
    Sending to service : 1
    Saving event : 5:4
    Service returned : 1, Error
    Sending to service : 5
    Saving event : 2:2
    Service returned : 5, Error
    Saving event : 3:2
    Saving event : 4:3
    Saving event : 4:4
    Sending to service : 4
    Saving event : 2:3
    Service returned : 4, Error
    Saving event : 3:3
    Sending to service : 3
    Saving event : 2:4
    Reprocessing group : 1
    Reprocessing group : 5
    Reprocessing group : 4
    Service returned : 3, Error
    Sending to service : 2
    Reprocessing group : 3
    Service returned : 2, Finished
    Sending to service : 1
    *** Finalize : 2 - 1,2,3,4
    Service returned : 1, Finished
    Sending to service : 5
    *** Finalize : 1 - 1,2,3
    Service returned : 5, Finished
    Sending to service : 4
    *** Finalize : 5 - 1,2,3,4
    Service returned : 4, Finished
    Sending to service : 3
    *** Finalize : 4 - 1,2,3,4
    Service returned : 3, Error
    Reprocessing group : 3
    Sending to service : 3
    Service returned : 3, Finished
    *** Finalize : 3 - 1,2,3