I think the problem is that CAT does not behave like a true 24/7 application.
There are two main issues here:
- CAT can't handle problems by itself and doesn't restart or reconnect when an error occurs.
So when there is a problem with an exchange, instead of detecting it and recovering, you just say it is an exchange problem, and the user has to restart CAT, do some things, etc.
Okay, the problem is with the exchange, and you have your own internal implementation. But if you position your software as 24/7, you can't just say: "it's an exchange problem, not a CAT problem". In fact you can, but then anything else you say about 24/7 is a lie.
- CAT can't automatically recover after a failure. For example, users can't set up cron or autostart with the system and just leave the machine running CAT 24/7. On every start of CAT you need to log in to every exchange, click through a bunch of dialogs while markets and parameters load, etc. With this amount of manual work, the user is not working with a 24/7 application.
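To make the first point concrete, here is a rough sketch of the kind of self-recovery I mean: reconnecting with exponential backoff instead of waiting for the user to restart. I don't know how CAT is built internally, so every name here (`run_with_reconnect`, `connect`, `run_session`) is hypothetical and only illustrates the idea.

```python
import logging
import time


def run_with_reconnect(connect, run_session, max_delay=60.0,
                       sleep=time.sleep, max_attempts=None):
    """Keep a session alive by reconnecting with exponential backoff.

    `connect` and `run_session` are hypothetical callables supplied by the
    application: `connect()` returns a connection or raises ConnectionError,
    and `run_session(conn)` blocks until the session ends or raises.
    Returns the number of attempts that were needed.
    """
    delay = 1.0
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            conn = connect()
            delay = 1.0  # connected, so reset the backoff
            run_session(conn)
            return attempts  # session ended cleanly
        except ConnectionError as exc:
            logging.warning("connection lost (%s), retrying in %.0fs", exc, delay)
            sleep(delay)
            delay = min(delay * 2, max_delay)  # back off, capped at max_delay
    raise RuntimeError("gave up after %d attempts" % attempts)
```

With `max_attempts=None` the loop never gives up, which is what you want from something left running unattended. The same wrapper could be started by cron or a system service if the login step did not require clicking through dialogs.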
I don't know what kind of software you write at work besides CAT, or whether you have experience with real 24/7 applications.
But for reference, you could look at how video surveillance systems work, how POS/ATM software works, and how automotive software works. Those are good examples of how your software should act as a 24/7 system.