Author

Topic: Random thought: Logging as a key design principle in relational databases (Read 187 times)

administrator
Activity: 5222
Merit: 13032
A very old idea of mine that I've never come to implement has been for an online forum to keep a record of all prior versions of an edited post.

Epochtalk works like that. I think that currently the edit logs are all public, though users may end up being too uncomfortable with that.
qwk
donator
Activity: 3542
Merit: 3413
Shitcoin Minimalist
A very old idea of mine that I've never come to implement has been for an online forum to keep a record of all prior versions of an edited post.
I.e. you'd have a table with post_ids and another table where you have one entry for each version of the post.
In a view, you'd simply always display that as one table, of course (which would possibly make it backwards compatible with existing forum software).
To save some space, old versions of posts could simply be diffs from the latest version (or the first version, whichever you prefer).
No more deleting posts, only superseding them with new versions, admins & maybe privileged users like moderators always being able to see what you edited in or out.

Of course, that's totally against GDPR laws' "right to be forgotten" Wink
administrator
Activity: 5222
Merit: 13032
I've dealt with relational databases for well over 10 years. One thing which I've realized in the past few years but which I've never really seen mentioned in any of the books or guides about this (though maybe I've just missed it) is the following principle of effective relational database design: You should always start designing a relational database schema from the perspective of logs. For example, instead of having an "email_address" column in a users table which gets updated from time to time, you should have a table like email_log (user_id, time, email_address), and the user's current email address will be their newest entry in that log.

On countless occasions I or someone else had not done this, and I'd regretted it. Structuring things in this log-based way:
 - Promotes application-level database consistency, since you can't as easily update data outside of the intended contract.
 - Gives you access to a more complete picture of each piece of data, which you often want later.
 - Makes normalization the default.
 - Starts you off on a good footing performance-wise, since insertions into a log table are usually near-free and non-locking. Because two processes never need to write the same data at the same time, contention is kept to a minimum. Locking/contention is in practice the biggest performance issue for many applications, such as websites.

Now, sometimes it's too annoying to always do this. If your idea of a "user" is the end result of a dozen or more logs, then it may be difficult or performance-poor to perform more complex queries on the data. In this case, you can create more traditional "caching" tables using triggers on the log tables (or at the application level, or by using DBMS support for materialized views). Since these tables are only derived from the real data, in some cases it may be acceptable to do this on a "best-effort" basis, for example by only updating the caching tables occasionally or by using low levels of transaction isolation when dealing with the caching tables.

It's fairly common I think to do the reverse of the above paragraph, where you have triggers on the main tables which fill up log tables, but I don't regard this as ideal because it removes most of the advantages I listed earlier.

If logs become too large to be performant or convenient, it often requires no changes to the application to just delete all non-latest log entries older than .
Jump to: