A different way of visualizing SQL databases

A friend asked me this weekend to help him understand SQL databases for a course in which he’s enrolled. The general structure of a relational database seemed straightforward to him – creating a new row for each entry, linking one-to-one relationships directly, and creating a gerund for one-to-many and many-to-many relationships. The question was really one of “when should I write data to the database?” Some would argue that the answer is to always write data. I’m not of this camp. Writing to a database should only be done in two primary cases: where the data cannot be found by querying and performing calculations on other fields, and where the frequency of the more complex query is so great that it has an effect on performance.

When compared to reading from a database, writing comes with a high computational cost. It simply does not make sense to write everything. Since the above criteria are sometimes difficult to visualize, I had to find a better approach to convey it. I was inspired by an old example used in computer science to describe memory allocation. In the example, an operator (or robot) has pieces of information (mail) and sorts them in to pigeonholes and remembers the address of the information for retrieval. This helps some students visualize the allocation of memory.

My second inspiration was an article on early childhood education and the best methods to teach multiplication tables. It was then that I had the “A-ha!” moment, and was able to formulate the necessary example. Why is it that we memorize multiplication tables? It’s because the benefit gained from being able to instantly recall that 6×7=42 is much greater than the the benefit gained from calculating it at each turn. In order to memorize relatively simple calculations, we must repeat them over and over – equivalent to the computational cost of writing to an SQL database.

The correlation then is simple: we should only memorize something if there’s no way of retrieving the information again when it’s actually needed, or if we must recall it so frequently that repeating the steps to get to the information each time would be detrimental to our performance. The same is definitely true for databases: minimize the information written to improve your overall performance, but write when calculations would slow your performance.

I hope this analogy will help at least one other person visualize when we should write and when we should simply read and calculate.