Learning

Teaching Thinking Patterns

How do you teach a thinking pattern? In my previous job I worked with a data analyst, who was really good at thinking about how our data correlates among itself. He was good at the data analysis, good at thinking up relationships, and good at coming up with complex comparisons.

However, he wrote some pretty funky SQL. We had him take a course in basics of MySQL, and from time to time I'd take a gander in the slow query logs and pick a few queries and point out the good and bad things. Most of these are optimization tips, such as "Indexes don't apply to columns when you're applying a function to that column".

But it's hard to try to think outside of your own box, and even running EXPLAIN on every query won't necessarily tell you how you can fix a query. The problem is that there's no way to know if you have optimized the query as much as possible. Even experienced DBAs

I was often frustrated when repeating concepts -- though I always took a different approach, since I find that if I have to repeat myself it means that it didn't sink in with the way I said it before. But I know many people, myself included, have managed to learn how to think differently.

The question, I guess, is how do you help someone else think differently?

The Real Reason Why Oracle Costs So Much

http://www.oracle.com/openworld/2007/appreciation.html

Now, Billy Joel is one of my all-time favorite pop musicians. I saw him in concert and nosebleed seats at the Boston Garden cost me USD $100 per ticket, and I bought 4 tickets (my twin brother is a die-hard Billy Joel fan, they were a holiday surprise 2 years ago!)

Billy Joel regularly sells out sports arenas. I can only imagine how much Oracle paid to have a concert with him.

And don't get me wrong, the rest of the list is also stellar. Which only adds to my disbelief.

MySQL shows customer appreciation by not grossly overcharging.

MySQL: Because you're smart enough to buy your own damn concert tickets.

Log Buffer #72 — a Carnival of the Vanities for DBAs

Welcome to the 72nd edition of Log Buffer, the weekly review of database blogs.

Oracle OpenWorld (OOW) is over, and Lucas Jellema of the AMIS Technology blog notes the OOW Content Catalog has been updated with most of the presentations available for download.

On his way home from OOW, Chris Muir of the appropriately titled One Size Doesn't Fit All blog notes how OOW and the Australian Oracle User Group Conference and OOW compare with regards to 99% fewer attendees in AUSOUG Perth conference - from 45k down to 350.

Mark Rittman of Rittman Mead Consulting summarizes OOW's impact on business intelligence and data warehousing in Reflections on Oracle's BI Strategy. On his way home, Mark found time for A First Look at Oracle OLAP 11g, noting the pros, cons, gotchas and suggestions for improvement for many useful new features.

Microsoft SQL Server also has a new release in the works. Ted Malone in Agile Methods for the DB Dev is excited about SQL Server 2008 "Katmai" CTP 5 New Features and descries almost 20 of them.

Ian Barwick of PostgreSQL Notes talks about Converting tsearch2 to 8.3 now that the tsearch2 full text search engine has been integrated as a core PostgreSQL feature.

Patrick Barel of the Bar Solutions Weblog explains a new feature of Oracle 11g called Virtual Columns. While virtual data may be a new topic, using databases on virtual machines is an ongoing issue. Marco Russo of SQL BI gives his opinion on when to use virtual machines in SQL Server Virtualization.

Database professionals can be real characters, and set in their ways. Bad puns make good transitions, and Corrado Pandiani sheds light on MySQL's rules for Charsets and Collations on Multicolumn Fulltext Indexes. Adam Douglas of Binary Expressions fixed some trouble with MySQL and French Characters not rendering properly.

Greg Sabino Mullane shows reasons for his Problems with pl/perl and UTF-8. In Tending the Garden, Selena Deckelmann goes through the very easy process of Automatic Character Set Conversion in PostgreSQL. Selena has also been busy organizing the development of ptop, an interactive, command-line tool for monitoring the current status of a PostgreSQL database. If you read this in time and are in the Portland, Oregon area you can join the ptop hackathon at noon (local time) tomorrow, Saturday November 24th, or you can read the ptop meeting summary from pdxpug.

While some of us are database tools, some of us prefer to contribute database tools. Baron Schwartz honors MySQL's trademark by announcing that MySQL Toolkit is now Ma'atkit. Ma'at, pronounced "mott", is the ancient Egyption patron saint of truth, harmony and order. In addition, Baron proclaims "Ma'atkit Version 1297 Released!"

Hubert Lubaczewski notes the changes to the analyze.pgsql.logs.pl script of pgsql-tools in update 3 and update 4.

Hubert also notes how to find overlapping time ranges and how to find the number of ranges a time belongs to in time ranges in postgresql - part 2. Though written for PostgreSQL, both posts can easily be applied to another DBMS. In the same vein, Yves Trudeau shares the DBMS-independent graphical images of Unix memory usage in Generating graphs from vmstat output.

Jeromy McMahon posts sample SQL code for viewing Oracle extent segments for tablespaces, temporary spaces and sort segment space. The Cheap DBA gets Oracle specific with a Slick Shell Script for Reporting on Oracle Workload. Krister Axel of codeboxer.com has A really clean dynamic insert proc for PL/SQL ETL packages, including validation checking and exception handling. zillablog's Robert Treat treats us to a function for tracking plperl shared variables.

Jen M is Keeping IT simple by coding capacity measurements to show How Not to Outgrow Your DB Infra: A Simple Step. She follows up with more code to monitor a specific cache to resolve unexplainable slowness/resource leak in SQL Server.

This post began with a conference, and so it shall conclude. The Call For Proposals for PgCon 2008 is underway, and David Fetter lets us know that PgCon 2008 will be held May 22-23 at the University of Ottawa. This is different from Joshua Drake's call for volunteers for Command Prompt's Postgresql Conference East 08, on March 28-29 at the University of Maryland. Neil Conway informs us of a Jim Gray Tribute, consisting of a general session and 9 half-hour technical sessions reviewing some of the 1998 Turing Award winner's work.

In case this edition did not give you enough to read, Beth Breidenbach of Confessions of a Database Geek created an aggregate blog feed for posts relating to information quality.

Top 10 MySQL Best Practices

So, O'Reilly's ONLamp.com has published the "Top 10 MySQL Best Practices" at http://www.onlamp.com/pub/a/onlamp/2002/07/11/MySQLtips.html. Sadly, I find most "best practice" list do not thoroughly explain the "why" enough so that people can make their own decisions.

For instance, #3 is "Protect the MySQL installation directory from access by other users." I was intrigued at what they would consider the "installation" directory. By reading the tip, they actually mean the data directory. They say nothing of the log directory, nor that innodb data files may be in different places than the standard myisam data directories.

They perpetuate a myth in #4, "Don't store binary data in MySQL." What they really mean is "don't store large data in MySQL", which they go into in the tip. While it's true that there is very little benefit to having binary data in a database, they don't go into what those benefits are. This means that people can't make informed decisions, just "the best practice is this so I'm doing it."

The benefit of putting binary data in MySQL is to be able to associate metadata and other data. For instance, "user 200 owns file 483". If user 200 is gone from the system, how can you make sure file 483 is as well? There's no referential integrity unless it's in the database. While it's true that in most cases people would rather sacrifice the referential integrity for things like faster database backups and easier partitioning of large data objects, I believe in giving people full disclosure so they can make their own informed decision.

#5 is my biggest pet peeve. "Stick to ANSI SQL," with the goal being to be able to migrate to a different platform without having to rewrite the code. Does anyone tell Oracle folks not to use pl/sql like collections? Nobody says "SQL is a declarative language, pl/sql is procedural therefore you should never use it". How about SQL Server folks not to use transact-sql statements like WAITFOR? MATCH... AGAINST is not standard SQL, so I should never use it?

Now, of course, if you're selling a product to be run on different database platforms, then sure, you want to be platform agnostic. But you'd know that from the start. And if you have to migrate platforms you're going to have to do lots of work anyway, because there are third-party additions to all the software any way.

And why would *anyone* choose a specific database, and then *not* use those features? I think that it's a good tip to stick to ANSI SQL if you *know* you want to, or if you have no idea about the DBMS you're using.

If you want to see how this cripples MySQL, check out Visibone's SQL chart at: http://www.visibone.com/sql/chart_1200.jpg -- you can buy it here:

I would amend #6 by explaining how the industry standard is to use sequences, as well as data tables such as calendar tables, and that stored procedures should be built to deal with sequences. The problem with a coded solution is that it is difficult do the proper locking needed for sequences if more than one access is needed at a time. The reason I say stored procedures is that I believe that the database should handle the locking of tables, not the code, just as I believe referential integrity should be enforced at the database level.

Everything else is pretty good. I had a list a while back of "SQL Best Practices" here: http://sheeri.com/archives/104. I may post later on about my own personal MySQL Best Practices....

“kill” Oddness

So, at midnight I got a call from customer service saying our site was slow. I narrowed it down to one of our auxiliary databases, that seems to have gotten wedged just about midnight. Normal queries that took less than 4 seconds started taking longer and longer, moving up to 5 seconds and past 30 seconds in the span of a minute or so.

In the moment, I thought killing off all the queries would be a good move. My kill script, which consists of:


for i in `/usr/bin/mysql -u user -pPass -e 'show full processlist' | grep appuser | cut -f1`
do
mysql -u user -pPass -e "kill $i"
done

This will attempt to kill any mysql connection owned by the appuser. I used it a few times, and it didn't work. So I used a trick I learned when we bring our site down -- sometimes there are straggling connections to mysql, so what I do is change the app user's password by direct manipulation of the mysql.user table and flush privileges.

Within 10 seconds, all the connections from the appuser were gone, and when I put the correct password back and flushed privileges, everything came back normal. Queries started taking their usual amount of time.

Why is it that queries that refused to be killed by "kill", and yet changing the password for the user they were running as killed them off? Some were running more than 45 seconds, in various states of "Sending data" and "closing tables". Nothing was running for much longer than 60 seconds, so it doesn't seem like there was a big query that was wedging things.

Oh, and what happened at midnight to cause this? No clue, the only thing we run at that time is a PURGE LOGS FROM MASTER, which we do every hour, as we fill up a 1.1G binary log file every 20 minutes or so. This database holds a particularly heavy write application and also runs reports, so we optimize the tables every week (wednesday at 2 am). I've put the optimization to daily, as when I ran it manually this morning it took about 20 seconds.

Anyone have an idea about why changing the password worked so quickly when kill did not?

OurSQL Episode 19: MySQL Proxy

Direct play the episode at:
http://technocation.org/content/oursql-episode-19%3A-mysql-proxy-0

Feedback:
http://odeo.com/sendmeamessage/Sheeri

Call the comment line at +1 617-674-2369.
E-mail podcast@technocation.org

News:
MySQL Focuses on Japan
http://www.mysql.com/news-and-events/news/article_1368.html

MySQL Associate Certification Now Available
http://www.mysql.com/certification/

Learning resource:
Pythian Group's Carnival of the Vanities for the DBA community, published weekly on Fridays.
http://www.pythian.com/blogs/category/log-buffer/

Feature:
MySQL Proxy
Giuseppe Maxia's Blog:
http://datacharmer.blogspot.com

Getting Started with MySQL Proxy article plus tutorials:
http://www.oreillynet.com/pub/a/databases/2007/07/12/getting-started-with-mysql-proxy.html

public Subversion tree:

http://forge.mysql.com/wiki/mysql_proxy

Tutorials:
Intercept and dump queries (part 1): http://forge.mysql.com/snippets/view.php?id=75

Make macros to map "cd" to "use" and "ls" to "show tables" (part 2): http://forge.mysql.com/snippets/view.php?id=76

Injection Queries (part 3): http://forge.mysql.com/snippets/view.php?id=77

Lua interpreted language:
http://www.lua.org/

Ruby on Rails Presentation Video

aka, "Better late than never".....

Back in March 2007, the Boston MySQL User Group (http://mysql.meetup.com/137) watched and heard Brian DeLacey give a tutorial of Ruby on Rails, including its interaction with MySQL using ActiveRecord.

I knew absolutely nothing about Ruby on Rails before attending the presentation, other than Ruby was a language and people were saying that Rails made for easy development. After the presentation, I knew enough to start coding!

Brian is an excellent speaker, and this presentation is long overdue. (I'd tried creating the video before, during and after the MySQL Users Conference back in April, and my application kept crashing. I guess it just needed a break, because I fired it up today and it seemed to save the movie OK. Please let me know if you watch the presentation and something seems wrong).

Enjoy!

Direct download link:
http://www.technocation.org/movies/mysql/2007_03_Rails_BostonUG.wmv

Ruby on Rails by Brian DeLacey

When is a DBA not a DBA?

A sysadmin friend of mine was describing some DBA work he was doing, and wrote this:

I'm not much of a DBA, really - if it can't be done through phpMyAdmin I'm not likely to be doing it.

This is in stark contrast to so many so-called DBAs who say "I'm a DBA....If it can't be done through phpMyAdmin it must be Senior DBA work."

I've used phpMyAdmin for MySQL administration, and there's just something so nice about working on commandline.

A Note About the 12 Days of Scaleout

Some have pointed out that the 12 Days of Scaleout campaign is a "cheap marketing tactic."

Why, yes. It's inexpensive as far as campaigns go. It's definitely marketing. The grumbling seemed to be that there was no content on how the scaleout happened and worked for these companies.

We have to remember that not everyone is a geek. While we already know and love MySQL, there are people out there who only vaguely understand what a "database" is, much less have even heard of MySQL. Many laypeople I talk to haven't heard of Oracle!

MySQL needs this kind of marketing. Perhaps it better belongs as an advertisement in a glossy magazine, but I see no problem with MySQL using what they own -- lists, forums, PlanetMySQL, its own web page -- to do cheap marketing. In fact, "cheap marketing" is one of the main reasons for having a website! The Log Buffers are a great way to give back to the community, but they're also marketing for Pythian.

Heck, the people who post on Planet MySQL are marketing themselves -- which may seem like a silly statement until you realize that there are a few folks who are independent consultants.

So, yes. I'm all for Planet MySQL being cheap marketing. I market my podcasts on the 'planet, and I market my own skills. The next time I look for a job, I can point an employer to the fact that I'm at or near the top of the Planet MySQL top posters list.

Top 5 MySQL Community Wishes

As the 2007 Community Advocate of the Year, I'm taking the "MySQL 5 Wishes" meme and changing it a bit. I hope y'all don't mind:

1) Everyone has a different level of familiarity. The community does well with this when writing articles, for instance cross-referencing older articles, linking to documentation, the MySQL Forge, etc. Not much background information other than "MySQL usage" is assumed.

However, where we fall down is when we aggregate some writings and call it documentation. The worst form of this is a tool that grows organically, from "look, here's a script!" to a full-blown tool/patch/add-on. Sourceforge stinks for trying to make documentation, so most folks just link to their posts tagged "mytool" or whatever the name is.

Using some marketing skills would be wonderful -- make a page for folks who have never seen one post about it. Voila, you get your code going from something that people only learn when someone else tells them, to something folks wind up getting as a result of a search.

2) Along those lines, MySQL provides us with some great tools that we rarely use. When was the last time you linked your presentation to the MySQL Forge Wiki at http://forge.mysql.com/wiki/Main_Page? It took me a long time to make Technocation's MySQL 2007 Conference Video page at http://technocation.org/content/2007-mysql-user-conference-and-expo-presentations-and-videos -- Even after all the video was edited, I had to make the page.

How much easier would it have been if the descriptions, slides, handouts, video and audio were all available in one place? Obviously we can't hack on the O'Reilly site, but there's nothing to say that we can't make a wiki site with everything about a presentation in one place -- including links to everyone's notes! Make it so that 5 years from now a person learning MySQL can find what they need, when they don't have the same time/date context that we do.

3) Use (and appreciate) what we have. We have great software, sure. But we also have a company full of folks willing to talk to us. We can complain about the fact that even simple patches from non-employees take several months or a year or so to get into the code, because of existing coding conventions, etc. We can be annoyed that we have to download 7 addons for our software, but instead of saying MySQL should offer them for download in the same package (which of course they should, all the code should integrate nicely, and we should be able to turn on features we want and turn off or not use those we don't).......

....we can help that by making a centralized repository of MySQL addons. Run by the community, for the community. On the forge. At the very least we can make an index page of the neat tools we've created or found for MySQL and categorize them. Think of how plugins for software such as Firefox have repositories.

4) Volunteer unexpectedly. Got a presentation that didn't make the cut for the 2007 MySQL Users Conference? Offer to present it at a local user group. Don't have a local user group? Record the presentation as a lecture and post it online. Alternatively, make a local user group. Do what you're mostly comfortable with -- don't always stay in your comfort zone, push it a little. Maybe it means volunteering to help the MySQL documentation get a bit better. Contact someone you know in MySQL (or just put the word out in a blog post) that you'd like to help _________ get better, and you're sure to find a few takers.

5) Contribute! OK, many already do this at http://www.planetmysql.org. But consider contributing to:

Tidbits

In hindsight, it should have been obvious that open source software engineering would work because great people self-select, nothing is a trade secret, and the world is a very big place.
Gary Whizin, as interviewed by Matt Asay, http://planetmysql.org/entry.php?id=8058

How Much of a MySQL Geek Am I?

So, this is me:

Special thanx to Colin Charles for taking the picture and linking to it from his blog.

Notice that in addition to my photogenic qualities as well as the bags under my eyes, that I'm wearing an incredibly geeky necklace.

Yes, it's true. I bought a white gold dolphin to wear around my neck, because I am THAT much of a MySQL geek.

OurSQL Episode 17: Hashing it out

In this episode we tackle what a hash looks like in terms of a data structure, in preparation for next episode's discussion on the difference between hashes and btree indexes, and what kind of indexes are good for what kind of optimizations.

Show Notes:
Direct play this episode at:
http://technocation.org/content/oursql-episode-17%3A-hashing-it-out-0

Download all podcasts at:
http://technocation.org/podcasts/oursql/

Subscribe to the podcast at:
http://feeds.feedburner.com/oursql

News:
MySQL Connector/NET 5.1.1 released:
http://tinyurl.com/23a9ax

Download the new Connector/NET version:
http://dev.mysql.com/downloads/connector/net/5.1.html

MySQL 5.0.x security vulnerability:
http://bugs.mysql.com/bug.php?id=27513
Solution: upgrade to 5.0.40. This bug is not known to affect major versions 3 or 4.

Learning Resource:

http://onlinesolutionsmysql.blogspot.com/

The dates for the all the sessions:

* 27th March: Part 1 - High Availability and Scalability Architectures
* 19th April: Part 2 - Advanced Scalability Solutions
* 2nd May: Part 3 - MySQL Enterprise To Control Mission Critical Online Services
* 23rd May: Part 4 - 99.999% High Availability solutions
* 13th June: Part 5 - MySQL Enterprise performance and benchmarking
* 27th June: Part 6 - Advanced HA solutions

Find all the material and documentation for past webinars at:
http://onlinesolutionsmysql.blogspot.com/2007/03/links-to-material-and-documentation.html

Feature: Hash tables explained.

http://www.sparknotes.com/cs/searching/hashtables/section1.html

http://www.cs.sunysb.edu/~algorith/lectures-good/node7.html (search for "Hash Tables" on the page)

Feedback:

Email podcast@technocation.org

call the comment line at +1 617-674-2369

use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Or use the Technocation forums:
http://technocation.org/forum

Part 2: Data Warehousing Tips and Tricks

Ask and you shall receive: http://face.centosprime.com/rdb-w/?p=68 linked to my previous post on the Data Warehousing Tips and Tricks session (http://sheeri.net/archives/204) with the comment, "I need to learn more about MERGE TABLES and INSERT … ON DUPLICATE KEY UPDATE".

So here's a bit more:

The manual pages for the MERGE storage engine:
http://dev.mysql.com/doc/refman/5.0/en/merge-storage-engine.html
and
http://dev.mysql.com/doc/refman/5.0/en/merge-table-problems.html

MySQL Forums for the MERGE talbe are at:
http://forums.mysql.com/list.php?93

In a nutshell, a MERGE table is really a set of pointers to similarly-schema'd MyISAM tables. So if you have the same table schema multiple times (ie, partition per day, so you have tables named 2007_04_27_Sales, 2007_04_26_Sales, etc) you'd use a MERGE table to link them all together and then you can run a query on the MERGE table and it will query all the tables that the MERGE table points to.

As for INSERT . . . ON DUPLICATE KEY UPDATE --

MySQL gives many ways to deal with INSERTs and unique/primary keys. If you do an INSERT and the primary key you are trying to insert is already in the table, MySQL will give an error. Ways to deal with this:

1) Try & catch errors in the application code.

2) Use INSERT IGNORE INTO . . . this will insert a new record if a record with the key does not exist. If it does exist, nothing happens. Simply add the word "IGNORE" into your INSERT query after INSERT and before INTO.

3) Use REPLACE INTO . . .this will insert a new record if a record with the key does not exist. If a record does exist, MySQL will *delete* the record and then INSERT your record. This can cause problems when you just want to update part of a row, and not insert the whole row again. And it changes timestamps and auto-increment numbers, which may not be a desired result. Simply change the word "INSERT" in your query to "REPLACE".

4) Use INSERT . . .ON DUPLICATE KEY UPDATE. The syntax is the regular INSERT statement, and at the end add ON DUPLICATE KEY UPDATE [expression]. For instance,

INSERT INTO tbl (id,name,thing) VALUES (154,'sheeri','book') ON DUPLICATE KEY UPDATE thing='book';

and what makes it easier, if you have variables or whatever in your VALUES, you can actually set the update statement to say "just use the value I wanted to insert, OK?" as in the following:

INSERT INTO tbl (id,name,thing) VALUES (154,'sheeri','book') ON DUPLICATE KEY UPDATE thing=VALUES(thing);

Manual page:
http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html

Hope this helps!

Data Warehousing Tips and Tricks

It's not easy to do a DW in MySQL -- but it's not impossible either. Easier to go to Teradata than to write your own.

DW characteristics:

1) Organic, evolves over time from OLTP systems -- issues, locking, large queries, # of userss.

2) Starts as a copy of OLTP, but changes over time -- schema evolution, replication lag, duplicate data issues

3) Custom -- designed from the ground up for DW -- issues with getting it started, growth, aggregations, backup.

4) How do you update the data in the warehouse? -- write/update/read/delete, write/read/delete, or write only -- which means that roll out requires partitions or merge tables.

The secret to DW is partitioning -- can be based on:
data -- date, groups like department, company, etc.
functional -- sales, HR, etc.
random -- hash, mod on a primary key.

You can partition:
manually -- unions, application logic, etc.
using MERGE tables and MyISAM
MySQL 5.1 using partitions

You can load, backup and purge by partition, so perhaps keeping that logic intact -- if it takes too much work to load a partition because you've grouped it oddly, then your partitioning schema isn't so great.

Make sure your partitioning is flexible -- you need to plan for growth from day 1. So don't just partition once and forget about it, make sure you can change the partitioning schema without too much trouble. Hash and modulo partitioning aren't very flexible, and you have to restructure your data to do so.

Use MyISAM for data warehousing -- 3-4 times faster than InnoDB, data 2-3 times smaller, MyISAM table files can be easily copied from one server to another, MERGE tables available only over MyISAM tables (scans are 10-15% faster with merge tables), and you can make read-only tables (compressed with indexes) to reduce data size further. ie, compress older data (a year ago, or a week ago if it doesn't change!)

Issues for using MyISAM for DW -- Table locking for high volumes of real-time data (concurrent inserts are allowed when there is ONLY insertions going on, not deletions). This is where partitioning comes in! REPAIR TABLE also takes a long time -- better to backup frequently, saving tables, loadset and logs, and then instead of REPAIR TABLE do a point-in-time recovery. For write-only DW, save your write-loads and use that as part of your backup strategy.

Deletes will break concurrent inserts -- delayed inserts still work, but they're not as efficient. You also have to program that in, you can't, say, replicate using INSERT DELAYED where the master had INSERT.

[Baron's idea -- take current data in InnoDB format, and UNION over other DW tables]

No index clustering for queries that need it -- OPTIMIZE TABLE will fix this but it can take a long time to run.

When to use InnoDB -- if you must have a high volume of realtime loads -- InnoDB record locking is better.

If ALL of your queries can take advantage of index clustering -- most or all queries access the data using the primary key (bec. all indexes are clustered together with the primary key, so non-primary key lookups are much faster than regular non-primary key lookups in MySIAM). BUT this means you want to keep your primary keys small. Plus, the more indexes you have, the slower your inserts are, and moreso because of the clustering.

MEMORY storage engine: Use it when you have smaller tables that aren't updated very often; they're faster and support hash indexes, which are better for doing single record lookups.

Store the data for the MEMORY engine twice, once in the MEMORY table and once in MyISAM or InnoDB, add queries to the MySQL init script to copy the data from the disk tables to the MEMORY tables upon restart using --init-file=< file name >

ARCHIVE storage engine -- use to store older data. More compression than compressed MyISAM, fast inserts, 5.1 supports limited indexes, good performance for full table scans.

Nitro Storage Engine -- very high INSERT rates w/ simultaneous queries. Ultra high performance on aggregate operations on index values. Doesn't require 64-bit server, runs well on 32-bit machines. High performance scans on temporal data, can use partial indexes in ways other engines can't. http://www.nitrosecurity.com

InfoBright Storage Engine -- best compression of all storage engines -- 10:1 compression, peak can be as high as 30:1 -- includes equivalent of indexes for complex analysis queries. High batch load rates -- up to 65GB per hour! Right now it's Windows only, Linux and other to come. Very good performance for analysis type queries, even working with >5TB data. http://www.infobright.com

Backup -- For small tables, just back up. Best option for large tables is copying the data files. If you have a write-only/roll out DB you only need to copy the newly added tables. So you don't need to keep backing up the same data, just backup the new stuff. Or, just save the load sets. Just backup what changes, and partition smartly.

Tips:
Use INSERT . . . ON DUPLICATE KEY UPDATE to build aggregate tables, when the tables are very large and sorts go to disk, or when you need it real time.

Emulating Star Schema Optimization & Hash Joins -- MySQL doesn't do these, except MEMORY tables can use has indexes. So use a MEMORY INDEX table and optimizer hints to manually do a star schema optimized hash join. Steps:

1) Create a query to filter the fact table
to select all sales from week 1-5 and display by region & store type:

SELECT D.week, S.totalsales, S.locationID, S.storeID
FROM sales S INNER JOIN date D USING (dateID)
WHERE D.week BETWEEN 1 AND 5;

Access only the tables you need for filtering the data, but select the foreign key ID's.

2) Join the result from step 1 with other facts/tables needed for the report

(SELECT D.week, S.totalsales, S.locationID, S.storeID
FROM sales S INNER JOIN date D USING (dateID)
WHERE D.week BETWEEN 1 AND 5) AS R
INNER JOIN location AS L ON (L.locationID=R.locationID) INNER JOIN store AS S ON (S.storeId=R.storeId);

3) Aggregate the results

(SELECT D.week, S.totalsales, S.locationID, S.storeID
FROM sales S INNER JOIN date D USING (dateID)
WHERE D.week BETWEEN 1 AND 5) AS R
INNER JOIN location AS L ON (L.locationID=R.locationID) INNER JOIN store AS S ON (S.storeId=R.storeId)
GROUP BY week, region, store_type;

Critical configuration options for DW -- sort_buffer_size -- used to do SELECT DISTINCT, GROUP BY, ORDER BY, UNION DISTINCT (or just UNION)

Watch the value of sort_merge_passes (more than 1 per second or 4-5 per minute) to see if you need to increase sort_buffer_size. sort_buffer_size is a PER-CONNECTION parameter, so don't be too too greedy.....but it can also be increased dynamically before running a large query, and reduced afterwards.

key_buffer_size - use multiplekey buffer caches. Use difference caches for hot, warm & cold indexes. Preload your key caches at server startup. Try to use 1/4 of memory (up to 4G per key_buffer) for your total key buffer space. Monitor the cache hit rate by watching:

Read hit rate = key_reads/key_read_requests
Write hit rate = key_writes/key_write_requests
Key_reads & key_writes per second are also important.

hot_cache.key_buffer_size = 1G
fred.key_buffer_size = 1G
fred.key_cache_division_limit = 80
key_cache_size = 2G
key_cache_division_limit = 60
init-file = my_init_file.sql

in the init file:

CACHE INDEX T1,T2,T3 INDEX (I1,I2) INTO hot_cache;
CACHE INDEX T4,T5,T3 INDEX (I3,I4) INTO fred;
LOAD INDEX INTO CACHE T1,T3 NO LEAVES; -- use when cache isn't big enough to hold the whole index.
LOAD INDEX INTO CACHE T10, T11, T2, T4, T5

http://dev.mysql.com/doc/refman/5.0/en/myisam-key-cache.html

This was implemented in MySQL 4.1.1

Temporary table sizes -- monitor created_disk_tmp_tables -- more than a few per minute is bad, one a minute could be bad depending on the query. tmp tables start in memory and then go to disk...increase tmp_table_size and max_heap_table_size -- can by done by session, for queries that need >64MB or so of space.

ALWAYS turn on the slow query log! save them for a few logs, use mysqldumpslow to analyze queries daily. Best to have an automated script to run mysqldumpslow and e-mail a report with the 10-25 worst queries.

log_queries_not_using_indexes unless your DW is designed to use explicit full-table scans.

Learn what the explain plan output means & how the optimizer works:
http://forge.mysql.com/wiki/MySQL_Internals_Optimizer

Other key status variables to watch
select_scan -- full scan of first table
select_full_join -- # of joins doing full table scan 'cause not using indexes
sort_scan -- # of sorts that require
table_locks_waited
uptime

mysqladmin extended:
mysqladmin -u user -ppasswd ex =i60 -r | tee states.log | grep -v '0'

(runs every 60 seconds, display only status variables that have changed, logs full status to stats.log every 60 seconds).