Friday, August 26, 2011

Tales from the wild - isolation, incompetence, and database requirements

I worked on my own or a led a team of less-experienced programmers throughout all of my professional experience. As a result, I am sorely missing guidance and help from other developers. While I normally have the confidence (arrogance!) to plow ahead with my own solutions, I am increasingly aware of my own incompetence, which goes unchecked without having a more senior - or at least equal - developer to pull me back into reality. For instance, here's an example of a problem I faced several months ago:

We needed to manage a list (>500 and growing) of events from a web app. Each event has a title, other general information, and a list of one or more date/place pairs. Each date/place pair has a vector of numerical info about the event. Users need a CRUD to edit all this, and it needs to keep track of each change. E.g.; a user needs to change a date without loosing the associated place and vectors, and it needs to log the change; or a user needs to change the numerical info associated to a specific event/place, et cetera. The user needs to do this from an ajaxy grid that displays the most recent version of the information (the user doesn't need to see revision history in this grid.)

How can we store this data in a way that we can retrieve it quickly? Ideally, I'd like to request all the data needed for the grid in a single SQL query. Of course, my language used in describing the problem puts many assumptions on a data storage solution already. Maybe I shouldn't be using SQL. Maybe a list of pairs isn't optimal.

My Original Solution



I wrote this solution 6+ months ago and am now embarrassed by it:
I have four MySQL tables: Events - id, titles, et cetera; EventDateBridge - a map between Event_id's and id's for date/time pairs (EventDateBridge_id); EventDates - revision history for date/place pairs with EventDateBridge_id, date, place, and time added; EventVectors - revision history for vectors associated to date/place pairs with EventDateBridge_id, time added, and the vector. To update an Event we just add a row to the EventVectors or EvenDates table. To build the grid: We need one query to grab all events. For each event we need another query to grab all EventDateBridge_id's on that event/place pair. Finally, we need two more queries - one to grab date/place and another to grab the vector.

Obvious problems: lots of "order by date_added desc limit 1" to get most recent revision, lots of sub-queries, several queries needed per Event to build the grid - this results in several thousand queries!

A Better Solution



A more sober programmer (myself, months later when returning to it) recommended this:
Three MySQL tables: Events as before; RecentEventDates - date_id, event_id, date/place pair and vector; EventDateHistory - date_id, event_id, date_added, date/place pair and vector. To update an Event, we update the RecentEventDates and add a row to EventDateHistory. To build the grid we run one a query on RecentEventDates (using a simple join to grab related event information). We can use our programming language to reassemble the result into a list of events, each event having a list of date/place pairs (otherwise, the result is a list of vectors including event info, date/place pairs, and the numerical vector.)

Am I missing some better solution? Am I missing a premade solution?