Please share good ideas implemented poorly you have seen.
For example, in the past 24 hours I have come across 2 such beasts:
1) A company doing vertical scaling based on username. Unsurprisingly, they found that 70% of their users had usernames beginning with 1 of 2 letters. A better idea? Do it by last 2 digits of a user id. That way, even though users are created and perhaps deleted all the time, the distribution is effectively random. Language is not.
With 2 significant figures, you can divide vertically many different ways -- 1 large table of all the uids, 2 tables of uids ending in 00-49 and 50-99, 4 tables, 10 tables, 25 tables, 50 tables, 100 tables. This eliminates the need for a separate table for each user.
For example, the online dating site I work for uses this method for messaging. So a user's inbox is just one part of a table. If the user has uid 12345, their "Inbox" is the result of SELECT * FROM Inbox_45 WHERE toUid=12345. Their "Sent" box is the result of SELECT * FROM Inbox_45 WHERE fromUid=12345. We have over 650,000 active users; I can only imagine what horrors we would have if we had 1 table for each person's mailbox (much less 1 each for Trash, Sent, Saved and Inbox).
This also helps when using InnoDB tables, as we are -- the count(*) query to find the number of messages in your boxes uses the indexes on toUid and fromUid on the tables. 1 table per uid would not take advantage of this index, although it would probably be a bit faster in terms of retrieval. We kept this idea in mind -- if we had one table per uid, we probably could use MyISAM tables, and counts would be faster. But again, horrors with 650,000 tables!
Also, you want to keep your code as flexible as possible. Do not hard code which servers have which tables -- use SHOW TABLES in scripts and such. We truncate trashes nightly, and simply have a loop that says:
(pseudocode)
for i in "show tables"
do
truncate Trash_$i
done
2) The lead developer on a new project had the rule of "each table has a class", which led another developer to come to me asking to resolve a problem. You guessed it -- tables in different databases that have the same name. We have a static database, with a table called "Preferences". This is a joining table, that says "preference #1 is always blah". And in our profiles database for a site, we also have a "Preferences" table -- this is where we associate users with their preferences.
The goal is world domination to be able to have multiple sites use similar codebases and data schema. Therefore, it is perfectly reasonable to have databases with identical schemas and different names. Site1 can have an identical schema and codebase to Site2, with storing the data in the "Site1" and "Site2" databases. The reason to have 2 separate databases is that we want to be avoid having to put which site the data refers to as a part of each row.
Any other examples of good ideas implemented poorly? Feel free to make an entry, but if it's not on PlanetMySQL.org please put a link in the comments so I'm sure to actually read it.
Followers may note I fell way short of my goal of 1 post per day in May; in fact, I had 15 posts in May, not the 31 I had hoped. I'm OK with that, and I apologize for the long radio silence. In the meantime, I had a wonderful few weeks, including biking on the island of Martha's Vineyard, and a wonderful phone conversation with Frank Mash last night.
If anyone cares to know, my Skype name is awfief. I'm happy to talk to folks; I'm GMT -5 (Eastern Standard Time, Daylight Savings when appropriate). I also have a tendency to leave the program on while I'm in the room, so feel free to ring (instead of Skype Chat) to see if I'm around.
Yes, some people don't
Yes, some people don't consider the data demographics well in applications. Always pay to do some data analysis first when enhancing an existing system, but when designing a new system it pays to use a user data independent means of splitting data as per your example of last 2 digits, hadn't tried that approach before.
Hmmm, writing a blog entry every day, it's very hard isn't it. I tried myself without success but did succeed at 30 days straight. Jan 2006 and Feb 2006.
Keep trying!
Of all the regular PlanetMySQL bloggers, I guess you were the person I interacted with the least at the UC, but at least we do share a infamous bond in the future of MySQL. Hopefully we can all skype conference with the usual suspects at some later date.
Hi Sheeri, UC seems a long
Hi Sheeri,
UC seems a long time ago! I'm very sorry but my skype does not seem to be able to find: awfief
Thank you very much Sheeri
Thank you very much Sheeri for an excellent chat yesterday night. You were a great help in generously discussing the best practices with me. I really appreciate it.
Thanks
Frank