To clarify: you're trying to build models from the data, but you want things to be synchronised across the different datasets?
I may not have gone further than a fairly basic timeframe myself, but is it not possible to just aggregate the data to a point where both sets are in sync, rather than trying to fill gaps with models etc.?
Could you give an example of what you're trying to do, or of a simple application others would use that links to yours, without giving away what you're trading on?
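To show what I mean by aggregating, here is a minimal sketch. It assumes each feed is just a list of (unix_seconds, price) pairs and keeps the last price per time bucket. The function names `to_buckets` and `align` and the 60-second bucket are made up for illustration, not taken from any library.

```python
def to_buckets(ticks, bucket_seconds):
    """Collapse (unix_seconds, price) ticks into the last price per bucket."""
    buckets = {}
    # Stable sort by timestamp only, so ticks with equal timestamps keep
    # their arrival order and the later one wins.
    for ts, price in sorted(ticks, key=lambda tick: tick[0]):
        start = int(ts // bucket_seconds) * bucket_seconds
        buckets[start] = price
    return buckets


def align(feed_a, feed_b, bucket_seconds=60):
    """Return (bucket_start, price_a, price_b) for buckets present in both feeds."""
    a = to_buckets(feed_a, bucket_seconds)
    b = to_buckets(feed_b, bucket_seconds)
    common = sorted(a.keys() & b.keys())
    return [(start, a[start], b[start]) for start in common]


feed_a = [(0, 10.0), (30, 10.5), (65, 11.0)]
feed_b = [(5, 20.0), (130, 21.0), (70, 20.5)]
aligned = align(feed_a, feed_b, bucket_seconds=60)
```

Buckets that exist in only one feed are dropped rather than filled, so nothing is invented. The trade-off is that the bucket has to be wide enough to swallow whatever timing error there is between the two feeds.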
I've run into two types of problems so far. The first is that synchronizing timestamps is really difficult. Not all events carry a timestamp for when they happened in the market, so I'm not sure how to sync them. As an example, imagine that I receive events from two markets (like order book updates) and want to see which market moves first. The problem is that one of the markets does not give a timestamp for when the event was registered in the trading engine; the only timestamp I have is the one I record on my server. I don't know how long the data is in transit from the exchange's server, so it becomes really difficult to estimate which market's orders were recorded first. Any suggestions on what I can do?
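To make the problem concrete, here is a rough sketch of the comparison I'm trying to make. It assumes market A gives an exchange timestamp, market B only has my local receive time, and my clock and A's clock are in sync. The latency bounds for B are pure guesses, and `first_mover` is a hypothetical name. All times are in milliseconds.

```python
def first_mover(a_exchange_ms, b_received_ms, b_latency_min_ms, b_latency_max_ms):
    """Say which market moved first, or "unknown" if the latency window hides it."""
    # B's true event time lies somewhere in [b_earliest, b_latest],
    # depending on how long the message was actually in transit.
    b_earliest = b_received_ms - b_latency_max_ms
    b_latest = b_received_ms - b_latency_min_ms
    if a_exchange_ms < b_earliest:
        return "A"
    if a_exchange_ms > b_latest:
        return "B"
    return "unknown"
```

With a guessed transit time of 5 to 40 ms for B, any event on A that falls inside that 35 ms window comes back as "unknown", and that is exactly the range where I want an answer.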
The second problem is with gaps. Some data sets (for example's sake, imagine candles) can have gaps a couple of days long. Most sources tell me to just average over them, but I don't like that because I'm worried it influences the models. Filling the gaps from another data source also rarely works, because the timestamps are usually not in sync, so it becomes almost impossible to fit them together retrospectively.
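One alternative to averaging that I've been considering is to not fill the gaps at all, and instead cut the series into contiguous pieces so a model never sees made-up candles. A minimal sketch, assuming candles are (open_time_unix_seconds, close_price) tuples already sorted by time; `split_on_gaps` is a hypothetical helper, not from any library.

```python
def split_on_gaps(candles, interval_seconds):
    """Split time-sorted candles into runs with no missing intervals."""
    segments = []
    current = []
    previous_time = None
    for candle in candles:
        open_time = candle[0]
        # A jump larger than one interval means at least one candle is missing.
        if previous_time is not None and open_time - previous_time > interval_seconds:
            segments.append(current)
            current = []
        current.append(candle)
        previous_time = open_time
    if current:
        segments.append(current)
    return segments


candles = [(0, 1.0), (60, 1.1), (120, 1.2), (600, 1.5), (660, 1.6)]
segments = split_on_gaps(candles, interval_seconds=60)
```

Each segment can then be used on its own. The cost is that any window spanning a gap is lost, which may or may not be acceptable depending on how long the gaps are.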
Right now I'm using only a couple of exchanges and a handful of pairs. I'd like to increase the number of markets, but I'm worried about what kinds of issues will come up. I'm already spending so much time fixing these things that I don't know if I can manage more markets. Any ideas or help would be really appreciated!