Didn't notice that - just learned of Storj.
That article though - it jumps straight from storage pricing into comparing storage pricing against transfer pricing? What?!?
That doesn't make sense - since when did transit capacity become storage capacity?
Apples to Oranges.
Now, the going rate per TiB per month at the low end of the current dedicated server market is about 7.5€ for low-end servers, and around 6.2€ for very large nodes.
Hence, 100 GiB costs somewhere around 0.6-0.75€ a month when using the most cost-effective offers currently out there.
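Just to make the unit conversion explicit, a quick back-of-the-envelope in Python (the per-TiB rates are the figures quoted above, not something I have re-verified):

```python
# Convert the quoted dedicated-market rates (EUR per TiB per month)
# into the cost of hosting 100 GiB for a month.
rates_per_tib = {"low-end servers": 7.5, "very large nodes": 6.2}

for name, rate in rates_per_tib.items():
    per_100_gib = rate * 100 / 1024   # 100 GiB is 100/1024 of a TiB
    print(f"{name}: {per_100_gib:.2f} EUR / 100 GiB / month")
# -> roughly 0.73 and 0.61 EUR respectively
```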
And for someone like me who has his own DC... Well, it lands on the CAPEX rather than the OPEX side of the sheet, but converted to OPEX it's a fraction of that.
It depends on what kind of deals you get, what bulk pricing you can access, whether you buy outright or lease or rent, what level of hardware you are utilizing, etc. If you buy really big and seriously skimp on hardware quality, we are talking potentially under 1.5€/TiB/Mo, plus bandwidth fees.
So approximately 0.15€ per 100 GiB/Mo with current HDD pricing, at cost, for a highly efficient operation.
So, on a system like this, we could fairly easily see pricing fall to about quadruple that, and once storage providers start to outnumber the user base, we should see it drop to just a 25-30% markup.
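To put those markup levels in numbers - a tiny sketch, using the ~0.15€/100GiB/Mo at-cost figure above as the baseline (an estimate, not a measured cost):

```python
at_cost = 0.15  # EUR per 100 GiB per month, the at-cost estimate above

print(f"~4x at-cost:   {4 * at_cost:.2f} EUR / 100 GiB / month")
print(f"25-30% markup: {1.25 * at_cost:.2f}-{1.30 * at_cost:.2f} EUR / 100 GiB / month")
# -> about 0.60 EUR early on, falling towards 0.19-0.20 EUR once the
#    supply of storage providers outgrows demand
```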
If the average lifetime of data is 1 year and it gets read 4 times during that time, each stored byte effectively moves over the wire once every 3 months, hence with 1Gbps you can host about 790TiB. A 1Gbps-connected server never really achieves more than 85-95% of the nominal rate - in fact, no link does, since around 5% is lost to protocol overhead and error checking alone.
The true ratios of reads, cold-data timespans, etc. will only be revealed in production; they are impossible to predict accurately.
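Here's a rough sketch of where that ~790 TiB figure comes from; the 1-year lifetime, 4 reads per lifetime, and ~90% usable link efficiency are exactly the assumptions stated above, not measurements:

```python
# How much data a single 1 Gbps link can plausibly serve,
# given the assumed access pattern above.
link_bps = 1e9              # nominal link speed
efficiency = 0.90           # realistic fraction of nominal throughput
lifetime_days = 365         # average lifetime of stored data
reads_per_lifetime = 4      # times each byte is read during that lifetime

# Each stored byte crosses the wire roughly once per (lifetime / reads).
turnover_seconds = lifetime_days / reads_per_lifetime * 86400
usable_bytes_per_sec = link_bps / 8 * efficiency      # ~112.5 MB/s

hostable_tib = usable_bytes_per_sec * turnover_seconds / 2**40
print(f"Hostable per 1 Gbps link: {hostable_tib:.0f} TiB")
# -> roughly 800 TiB, the same ballpark as the ~790 TiB above
```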
What kind of erasure coding is in the plans?
Further, deduplication on a GRAND scale would be worth it on something like this - on blocks/files larger than 1000Mb or so, so that the hash tables stay somewhat sanely sized - but that can be worked in later on. Just saying, deduplication along with erasure coding in this kind of system is not only a GREAT idea, it's almost a necessity. The dedup reference count could also drive the redundancy factor for that data, dynamically making data that more people are interested in more resilient to failures, and the cost could be shared by all the users who want to store that data - driving the cost down dramatically for anyone putting in data others have already put in.
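A minimal sketch of that idea, just to make it concrete: content-addressed chunks stored once, with a redundancy factor that grows with the dedup reference count and a cost split between all the owners. Every name and threshold here is made up for illustration - nothing below reflects Storj's actual design:

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class ChunkRecord:
    size: int                                  # stored chunk size in bytes
    owners: set = field(default_factory=set)   # users referencing this chunk

class DedupStore:
    """Toy content-addressed index: identical chunks are stored once,
    popular chunks earn extra redundancy, and cost is shared by owners."""

    def __init__(self):
        self.chunks = {}                       # sha256 hex digest -> ChunkRecord

    def put(self, owner: str, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        rec = self.chunks.setdefault(key, ChunkRecord(size=len(data)))
        rec.owners.add(owner)
        return key

    def redundancy_factor(self, key: str) -> int:
        """Made-up policy: base redundancy of 3, plus one extra copy/parity
        group per 10 distinct owners, capped at 12."""
        refs = len(self.chunks[key].owners)
        return min(12, 3 + refs // 10)

    def cost_share_bytes(self, key: str) -> float:
        """Bytes billed to each owner: raw size x redundancy, split evenly."""
        rec = self.chunks[key]
        return rec.size * self.redundancy_factor(key) / len(rec.owners)
```

The second user to upload an identical chunk pays half of the (now slightly more redundant) storage bill, the hundredth pays a hundredth of it - exactly the "cost shared by all those users" effect described above.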
Those dedup tables will still consume an INSANE amount of storage - but hey, that's exactly what is abundantly available here.
It would still work rather fast with nicely optimized lookup trees - split those hashes up!
A single lookup would then consume only a couple of I/Os; it needs some careful thought to figure out the actual I/O count and to minimize it.
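A rough sketch of the "split those hashes up" idea: shard the index by the first bytes of the digest, so one lookup means reading one small shard file plus (at most) one chunk-metadata record - a couple of I/Os. The directory layout, prefix length, and record format here are arbitrary choices for illustration:

```python
import hashlib
import os

INDEX_DIR = "/var/dedup-index"   # hypothetical location of the on-disk index
PREFIX_HEX_CHARS = 4             # 4 hex chars -> 65,536 shard files
RECORD_SIZE = 32                 # one raw sha256 digest per record in this toy format

def shard_path(digest: bytes) -> str:
    """Pick the shard file from the hash prefix (I/O target #1)."""
    prefix = digest.hex()[:PREFIX_HEX_CHARS]
    return os.path.join(INDEX_DIR, prefix[:2], prefix + ".idx")

def chunk_exists(data: bytes) -> bool:
    """Look up a chunk: read its shard file and scan for the digest.
    A second I/O would fetch the chunk's metadata record if it is found."""
    digest = hashlib.sha256(data).digest()
    path = shard_path(digest)
    if not os.path.exists(path):
        return False
    with open(path, "rb") as f:
        while True:
            rec = f.read(RECORD_SIZE)
            if len(rec) < RECORD_SIZE:
                return False
            if rec == digest:
                return True
```

Keeping each shard sorted (or swapping the linear scan for an in-shard B-tree) would bound the lookup to a handful of reads even when the shards grow large.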
Just because it's cloud doesn't mean resources ought to be wasted.
EDIT: Oh yeah, and I'm sorry to say, but reading that article gave the impression that the writer doesn't know sh** about industrial-scale computing - at the very least explain why you are comparing apples to oranges - and the assumption that 1Gbps can actually deliver 1Gbps shows the writer has no idea about network technology. In my ~15 years in the hosting industry I've probably never seen above 120MB/s on a 1Gig link, and over the internet never above 116MB/s, and even that is a freakishly rare occurrence; an average node on an average network commonly stalls at 95-105MB/s.
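For anyone wondering where that ceiling comes from, the framing overhead alone explains most of it - a quick calculation with standard Ethernet/TCP numbers, assuming a 1500-byte MTU and TCP timestamps:

```python
# Why a "1 Gbps" link never delivers 125 MB/s of useful payload.
LINE_RATE_BYTES = 1_000_000_000 / 8      # 125,000,000 bytes/s on the wire

MTU = 1500                               # typical Ethernet MTU
ETH_OVERHEAD = 14 + 4 + 8 + 12           # header + FCS + preamble + inter-frame gap
IP_TCP_HEADERS = 20 + 20 + 12            # IPv4 + TCP + timestamp option

wire_bytes_per_frame = MTU + ETH_OVERHEAD        # 1538 bytes on the wire
payload_per_frame = MTU - IP_TCP_HEADERS         # 1448 bytes of actual data

goodput = LINE_RATE_BYTES * payload_per_frame / wire_bytes_per_frame
print(f"Theoretical TCP goodput: {goodput / 1e6:.1f} MB/s")
# -> ~117.7 MB/s before any retransmissions, congestion, or disk limits,
#    which lines up with the 116-120 MB/s ceiling seen in practice.
```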