aka....."when good queries go bad!"
So, today the developers were debugging why a script was running much longer than expected. They were doing text database inserts, and got to the point where they realized that double the amount of text meant the queries took double the amount of time.
You see, they were doing similar text inserts over and over, instead of using connection pooling and/or batching them. Apparently the other DBA explained that it was a limitation of MySQL, but either the developers didn't convey what they were doing well, or the DBA didn't think to mention batching.
I ran a simple test on a test server. I used the commandline to connect to a db server on the same machine (even though in qa and production the db machine is on a different machine) just to make a point:
| Type | Connects | Queries | Length of data transmitted | Time |
|---|---|---|---|---|
| One-off | 1000 | 1 | 619 bytes | 12.232s |
| Single Connection | 1 | 1000 | 604 kilobytes | 0.268s |
| Batch | 1 | 1 | 517 kilobytes | 0.135s |
So 1000 INSERTs using 1 connection is over 45 times faster than 1000 INSERTs using 1000 connections.
Using 1 batch INSERT statement is over 1.75 times faster than using 1 connection.
Using 1 batch INSERT statement is over 90 times faster than 1000 INSERTs using 1000 connections.
Note that while it's faster to send a batch, if you don't support sending 517 kilobytes to your database at once, you'll want to break it up. That's a small coding price to pay for 90x the database performance!!!
For reference, the formats used:
One-off:
INSERT INTO foo (col1, col2...) VALUES (val1, val2...);
Single Connection:
INSERT INTO foo (col1, col2...) VALUES (val1, val2...);
INSERT INTO foo (col1, col2...) VALUES (val1a, val2a...);
Batch: INSERT INTO foo (col1, col2...) VALUES (val1, val2...), (val1a, val2a);
Scott -- yep, you're right.
Scott -- yep, you're right. In this case, though, it's an application using live data, just a lot of similar data over and over, so LOAD DATA INFILE is not appropriate.
Bill -- Having founded Microsoft, I'm sure you see that it's obvious. But those of us who aren't multi-billionaires don't necessarily know all the options, and read other people's blogs to learn more.
Peter -- you're right. The box was a sandbox and other programs were running; I don't test on production machines. It's possible that was an issue too.
George -- yes and no. I think developers are also taught not to make very large queries, and that shorter queries take less time, etc. It's not that you should always batch things, nor that you should never do it. But you need to think and test.
Good post. Don't know the
Good post.
Don't know the environment but developers are usually taught how to write programs. Java developers know Java, PHP developers know PHP, etc. A lot of developers (I know I'll get pinged for this) do not understand how databases work or their architectures, so they often write inefficient code.
Sheri, Something looks
Sheri,
Something looks wrong - just 100 connects/sec for localhost is way to low.
Though if you go via TCP/IP this is binary compiled with tcp-wrappers which have to resolve IP via reverse DNS for each connect it would be expected.
But in general you're right of course. Connecting for each query is VERY bad.
This is rather obvious isn't
This is rather obvious isn't it? It goes hand in hand with SQL 101.
And LOAD DATA INFILE?
And LOAD DATA INFILE?