What Does Transparency Mean?

One of the things I love about making art is that it often inspires me to make connections I had not made before. I’m currently working my way through Julie Balzer‘s A Year of Gelatin Printing class, and there’s a month called “Layer Layer Layer” that includes a lesson on transparency.

The first technique was simple, as the outcome was not meant to be the artwork. In order to assess whether or not a particular tube of paint was transparent, the lesson was to grab a book page* and apply paint to it, to see whether or not the paint is opaque, semi-transparent, or transparent.

It is not a particularly challenging lesson, which gave me time to think about when I might want different levels of transparency, what kinds of layering I might do.

a book page. The middle has no paint, the top has a layer of transparent orange, the bottom has a layer of semi-transparent red.
Transparent at the top, semi-transparent at the bottom

Suddenly my brain leapt to a realization – transparency in a business context is often stated as a binary. I’m a huge fan of smashing binaries to get the rich matrix that lies beneath, and this was no different.

Transparency can mean:

  • Decisions, and reasons for making those decisions, are talked about openly.
  • Being open and honest about bad news, a project going off track, or a timeline change, instead of covering these up.
  • Recognizing those who supported your effort, helped with research, or lent their eyes and opinions to make your work better.
  • Sharing your roadmap publicly, and sharing any changes publicly
  • Agreeing with a coworker or customer who has complaints about your product
  • Being forthcoming about product or team limitations

Of course, transparency can be taken too far. There are plenty of reasons for privacy – discussions with HR (including hiring, retention and losing employees), not telling employees about a big change (e.g. merger) until you know it’s going to happen so uncertainly doesn’t spread, and of course personal details such as government ID numbers, bank information for direct deposits, and authentication credentials.

Transparency is not simply a matter of making information available, though. Good transparency is not just a permissive shared folder or wiki; good transparency ensures that people are well-informed.

For example: let’s say the 50-page employee handbook changes. Transparency is on a spectrum, with one end being an oversimplified notification:

Please note, the employee handbook has changed for US employees.

This is the equivalent of making everything public and letting people sort out for themselves not only which information is significant, but what, if any, changes there have been to call awareness to. It’s not particularly useful for transparency – it’s opaque, like the top section of black paint on this book page:

a book page, with an opaque thick black strip at the top, the middle has no paint, and the bottom is semi-transparent through brown paint
Opaque at the top, semi-transparent at the bottom

Semi-transparency, such as the brown paint at the bottom of the page in the image above, might be to explain what has changed:

Please note, the employee handbook for US employees has been changed to reflect the following policy changes:

– “Hairstyle” and “hair texture” are now explicitly named as protected categories under discrimination, harrassment, and equal opportunity employment.

– There is a new section explaining the ban on firearms at the workplace.

Also known as, “Please don’t make me read 50 pages to figure out what changed!”

I pointed out that this might be semi-transparent, as it depends on how it’s done, and if it accurately portrays important changes and downplays unimportant changes. Of course, what is important can be different for different departments, teams and people at an organization, so this can be a bit trickly.

Similarly semi-transparent is highlight what changed – this method calls out exactly what changed, like a version history or file diff. If there are very few changes, this can illuminate what changed better than a description of the changes. However, if there are a lot of changes, a summary might be better.

What would the most useful form of transparency be? Well, it depends on the recipient as well. I would prefer a mixture of the three, if there are a lot of changes; if there are a few significant changes, highlighting would suffice; and if there are many repetitive changes, an announcement is fine – e.g. “all instances of dog have been changed to pet”.

I’m sure we all have stories of how too much transparency caused issues – not being able to find a document or conversation because all docs and conversations are public.

* I have an old copy of Head First HTML 5 Programming I use for painting on book pages

It’s the Little Things

More than once, I have said that one of my KPIs for my performance reviews should be “hours of meetings my manager [or coworker] no longer needs to go to.” Usually when I come onboard, I am helping ease a tough workload, and it makes me happy to be able to help.

There are many small ways I try to make the lives of my coworkers easier. Many of these items save more time for my coworker, than they take for me to do. And I believe they are good practice overall.

Send a Link

If I could wave a magic wand and have ONE of these items always happen, it would be this one: SEND A LINK.

So often, I get an email or chat message that refers to an existing document or presentation. If the message does not contain a link to the presentation…in the best case scenario, I type a key phrase into my address bar and my browser’s smart search finds it. In a slightly worse scenario, I cannot locate it easily and have to go digital spelunking.

In the worst case scenario, I have no idea which document or presentation is being referred to, and I have to either figure it out, or ask for a link.

If the message contains a link, I can click it and be there. This saves a lot of time because usually when you’re messaging about something, you are looking at it or recently have looked at it. The link is readily at hand, in your paste buffer, or in your clipboard history (if you’re on a system without a built-in clipboard history, find an app to get one, they’re super handy!).

Give Appropriate Context

Giving a link is a subset of giving appropriate context. This can be difficult to get right; however, there are some bits of context that are usually appropriate. In fact, the previous point – “Send a Link” – is about giving context!

Now, this is something I suffer from on both sides, as I am both an over-explainer and have a terrible memory, and often need a bit more context than people provide. Here’s an example:

Hi, here’s the file we talked about in the meeting we had.

I have a lot of meetings every day, and a terrible memory. I’m really good at writing down “Follow up: X promised me Y report”, and with this I might have to go to my notes if I didn’t remember. However this, only slightly longer, gives me the details I need:

Hi, here’s the report on lineage usage that we talked about in the KPI meeting we had.

Here’s another example. I was going to send our beta program lead a Slack message like this:

Hi! I’m ready to put information into the beta software now. Where do I start?

I realized that she runs lots of beta programs, so I should give her some context:

Hi! I’m ready to put information into PROGRAM for the NEW_TOOL technical lineage beta test. Where do I start?

This may seem trivial, but it can help people decide whether or not to handle your task immediately or wait until later. If I know what that report is, I know if it’s something I need to study, or can just glance at. If I’m doing a pass through my e-mail, and I can glance at the report to get what I need, I’m going to do that right away. Otherwise, if I know I need to study it, or I’m not sure, I’ll save it until after I’m done checking my email for the most important items.

In the most egregious case, I once saw a message where someone said

Look at the first three errors, you’ll find what you need.

This was from someone asking for help, and it’s quite likely they read the error messages and didn’t quite understand what was going on. And in fact, they asked for clarification. What they were looking for was something along the lines of:

It seems like a string was provided as a numerical input.

I suspect the messenger was trying to “teach how to fish”…This could have been done by saying something like

The first three errors are where the problem is – the first one points out the file that had the issue, the second points out the function, and the third, the line number of the problem. It seems like a string was provided as a numerical input there.

For Dates, Specify the Month, Date, AND Day of the Week

When talking about dates, ALWAYS give the day of the week as well as the month and number of the day. Here are some illustrative examples:

Are you free next Thursday for a meeting?

Today is Tuesday…are you asking if I am free to meet in 2 days, or in 9 days?

Are you free June 8th at 3 pm for a meeting?

I have no idea if I’m free on a particular day for a meeting without checking my calendar. This is perfectly reasonable, however, this is better:

Are you free Thursday June 8th at 3 pm for a meeting?

There are 2 reasons this is better. Firstly, I know I have a regular meeting at 3 pm on Thursdays. So instantly I know – I cannot make a Thursday 3 pm meeting.

Secondly – June 8th is Wednesday. June 9th is Thursday. Any date typos are going to be caught very easily if both the date and day of the week exist.

I have been involved in frustrating discussions where an organizer discussed dates at length only to realize in the end that they had typo’d the date originally. It’s so easy to type 8 when you meant 9; it’s much less frequent to type Wednesday when you mean Thursday.

It’s not just for scheduling (which some would argue can be done automatically by applications). Here’s another example:

“The conference is June 13-14th, 2022. Can you make it?”

Is that conference during the week or a weekend? This is something that’s ubiquitous and one of my pet peeves. Please don’t make me look at a calendar if I don’t have to!

Time Zones

Unless you are 100% sure everyone involved is in the same time zone, specify the time zone. Avoid abbreviations if possible – don’t make people look something up if they don’t have to. “3 pm” is not as specific as “3 PM CET”, but much more descriptive is “3 pm Central European Time”.

As well, try to avoid summer time/daylight saving time abbreviations. People often get these wrong. “3 pm EST” when it’s summer in the US looks silly. It likely won’t be confusing to anyone, but it could cause confusion when you’re on the cusp of a time change. “3 pm Eastern US time” is much more accurate.

Note that even though the official time zone is called “Eastern Time”, I put in US, because not everyone knows which east coast this is referring to.

What are your favorite things to help make life a bit easier, ensuring that people love working with you, and also getting information from others faster?

What Does Frictionless Onboarding Look Like?

Earlier this week I had a nearly frictionless onboarding experience, and I want to memorialize it in something easier to find than a tweet. Here’s how my user scenario played out.

I finished writing my presentation and wanted to put it online. Usually I copy it to a folder and link it to the presentations section of my CV and call it a day.

I’ve always wanted to embed slide decks in a post or on a webpage, so that people can see the slides and get what they need. Sometimes that information is the content on the slides, and sometimes that information is “is this the presentation I want to download”? I myself have downloaded a presentation, only to find out that it wasn’t what I expected. When the slides are online, I can skim them to verify if I should download it to go deeper.

So, I decided to see if there was an easy way for me to embed a PDF on a webpage. Here’s what I did:

  1. Do a web search for “embed PDF in website”
  2. Read some articles, learn that Adobe has something called the “PDF Embed API”
  3. Land on https://developer.adobe.com/document-services/apis/pdf-embed/
  4. Noted that it’s free and there’s a demo – click “Try the demo”
  5. Play around with the demo until I see a mode I like
  6. Click “Generate Code” and see fewer than a dozen lines of code, with easy ways to copy the code or get my own credentials.11 lines of HTML/javascript code to embed a PDF in a website
  7. I click “Get Credentials” and login to a new browser tab with my existing Adobe account. If I didn’t have one, I could create a free one right away.
  8. I am presented with a choice of which service to get credentials for – something that has more functionality and costs money, or the simple embedding service that’s free. I click the free one! 2 choices of credentials to get
  9. I only have to enter in 2 pieces of information – the domain name that will call the embedding API using this credential, and a descriptive name that I’ll use in the developer console if I need to distinguish between multiple credentials.A form with 2 fields to fill out, plus a checkbox to agree to the terms of service, and a button to submit the form. Additionally I can go straight to the console to manage existing credentials.
  10. I clicked “Create credentials” and got my credentials!
  11. I went back to the demo page – it was still there because “Get Credentials” opened up in a new browser tab. I clicked “Copy code”.
  12. I created a post in my blogging software and pasted in the example code.
  13. I read the example code to see if I could figure out what to change. I guessed that I had to enter my credentials in <YOUR CLIENT ID>, and change the url and file name of the PDF.
  14. I got extremely excited that it had only been about 10 minutes since I started step 1, and clicked the “preview” button to see if the PDF would render as a nicely embedded document.
  15. I ran into an issue where I own 3 domains – sheeri.com, sheeri.net and sheeri.org, and I used a different domain for the credentials than I’d used for the URL of the PDF. I fixed it, and shortly I had success and saw my slides beautifully embedded:

I personally was up and running in 30 minutes. HOWEVER – most users will not have the issue that I did due to my own non-standard setup, and be up and running in 10 minutes.

Self serve, 10 minutes to be up and running, without reading a single word of documentation. It’s the dream!

On Twitter, Ben asked me:

How does this compare to PDF.js (https://mozilla.github.io/pdf.js/)? Instinctively I’d suggest that replacing a proprietary external dependency with an open-source self-hosted dependency is a win, but I’ve never tried using Adobe’s PDF Embed API.

Ben Howe

I loved this question, especially his explanation of why he’d start with PDF.js. My replies were:

Adobe’s solution is very low code. I pasted a dozen lines of code, changed three parts, and it just works (filename, path, my api key). PDF.js starts by explaining the concept of promises to me….I don’t need a primer on that, I just need to display the thing.

Do you think you could go from finding the web page to having it working in 10 minutes? That’s the big difference I see. As the thread touted, it just worked. I’ve spent more time on this Twitter thread than I did getting it working.

I get that open-source self-hosted is best – but – I don’t want to compile something. PDF is a proprietary format INVENTED by Adobe. I feel Adobe has more resources and incentive to support PDF displaying than Mozilla (smart devs but not a ton of $, I worked there for 5 years).

Sheeri

So there you have it – a beautiful user experience, with the confidence that it will probably work for at least another 20 years given what Adobe has invested in the PDF format.

Artistic Wandering and Product Management

So much of what we hear, see, and make is shaped by always changing, multi-layered sets of ideas and ideals. Art helps me level up so many skills; among them, the ability to follow a wandering path, make connections, and build understanding.

I got back to playing with paints on a gelatin plate and made some art I like. My definition of art is extremely vague: “something created with intention and/or meaning.” Here’s an example:

paper on print rectangle art Abstract

This piece’s intention is technique-based – I was following a lesson to use a paintbrush to paint on the plate, then apply a printing technique.

I painted, then put netting between the paint and the paper.

In this way, I intentionally made a print that was different from painting directly on the paper.

What does this have to do with product management? I find that being creative and letting my mind wander helps me understand my thoughts, opens up my mind, and creates new neural pathways. Increasing my ability to understand thoughts and patterns is directly related to leveling up my Product Management skills.

It also feels good, scientifically.

Another PM superpower is listening. One way to develop listening skills is to listen to myself. What does my own inner monologue reveal about the connections I make? What do I do when something isn’t going as planned? Am I deeply invested in the direction I want the art to go in – why or why not? Am I satisfied if my art turns out different than I intended?

My starting limits were the size of the gelatin plate – 5″x7″ (approx 13 x 18cm) – and the paints I used. I was using a new set of paints that had 6 colors: yellow, red, magenta, blue, green and white. I decided to only use newsprint to print on – I have other materials but it had been months since I painted, so I wanted to keep it simple.

Here’s what I created, and my inner monologue. See how my thoughts translated into the artwork:

blue, red, yellow and green all interplay in this abstract art print

It’s Pride month, I’ll start with bi pride colors, blue on top.

White for the middle of the plate, magenta on the bottom, then I’ll combine them for purple in the middle.

Before I combine, this blue/white/magenta reminds me of the French flag. France . . . cheese! I’ll paint yellow triangles.

I used most of the colors but not green…the moon is made of green cheese…so some circles in green. And swirls.

OK, now do something with it that gives it a reason to be a print. Let’s run this car with textured tires over the plate. And now let’s apply this bubble wrap. OK, now to put the paper on and print it.

My intention – a bi pride flag – completely changed after I painted a little, and my brain made the connection to the French flag. The end result is something I happen to like to look at. Do I look at it and think of a French cheese, like Camembert? No.

This is practice – going through the process of creating one piece of art. It’s practice following the threads to the initial idea, to the final outcome. It builds up discovery skills. It helps see patterns and make connections – helpful when listening to understand customers, stakeholders and employees. It’s also helpful when creating or updating strategies, tactics and other plans.

So much of what we hear, see, and make is shaped by always changing, multi-layered sets of ideas and ideals. Art helps me level up so many skills; among them, the ability to follow a wandering path, make connections, and build understanding.

Getting to No: Tactics and Scripts to Stay Close to Your Path

Whether you’re trying to get alignment on goals, strategy, tactics; or stick with them during quarterly planning, daily standups, and when urgent items come up – you need to be able to recognize which ideas will lead you astray, and be able to resist them, while growing and maintaining good relationships.

This talk, first presented at Building Products That Matter, Together addresses the tough work of being able to execute your plans effectively. Attendees will learn how to figure out what to accept, delay and reject, as well as learn several scripts of how to reject and delay ideas without destroying trust and goodwill with stakeholders and customers.

Here are the slides – you can see them here and use the menu at the top right of the box (3 dots, “hamburger menu”) to download the slides if you prefer:

The video is available on ProductBoard’s website – it requires a login, but it’s completely free.

LDAP with auth_pam and Python to authenticate against MySQL

If that title looks familiar, it is because a few months ago I posted about LDAP with auth_pam and PHP to authenticate against MySQL.

The good news is that recompiling the connector for Python is a lot easier than for PHP. With PHP, the complexity was due to there being one monolithic package to recompile. The bad news is that there is a slight hitch with Python.

Skip down to the hitch and how to compile MySQLdb for use with auth_pam plugin.

As a quick reminder, here is a repeat of the background:

Background


There are two plugins that can be used. From the documentation, the two plugins are:

  • Full PAM plugin called auth_pam. This plugin uses dialog.so. It fully supports the PAM protocol with arbitrary communication between client and server.
  • Oracle-compatible PAM called auth_pam_compat. This plugin uses mysql_clear_password which is a part of Oracle MySQL client. It also has some limitations, such as, it supports only one password input. You must use -p option in order to pass the password to auth_pam_compat.

Percona’s MySQL client supports both plugins natively. That is, you can use auth_pam or auth_pam_compat and use the “mysql” tool (or “mysqldump”, or mysql_upgrade, etc.) and you are good to go. Given the choice, we would all use auth_pam, under which clients DO NOT use mysql_clear_password.

Not all clients support auth_pam, which is the main problem. Workarounds have called for using auth_pam_compat over SSL, which is a perfectly reasonable way to handle the risk of cleartext passwords – encrypt the connection.

However, what if you want to use auth_pam, and avoid cleartext passwords all together?

If you try to connect to MySQL using Python, you will receive this error: “Client does not support authentication protocol requested by server; consider upgrading MySQL client.”

Back in 2013, Percona posted about how to install and configure auth_pam and auth_pam_compat. The article explains how to recompile clients to get them to work:

The good news is that if the client uses libmysqlclient library to connect via MySQL, you can recompile the client’s source code to use the libmysqlclient library of Percona Server to make it compatible. This involves installing Percona Server development library, compiler tools, and development libraries followed by compiling and installing the client’s source code.

The Hitch


The hitch with Python is that there are a few different ways to connect to MySQL with Python. In fact, MySQL has written a Python connector, called mysql-connector-python. According to the documentation for mysql-connector-python:

It is written in pure Python and does not have any dependencies except for the Python Standard Library.

This means we cannot recompile mysql-connector-python to use the libmysqlclient library from Percona that supports auth_pam – because it does not use the libmysqlclient library.

However, mysql-connector-python is not the only way to connect MySQL to python. There is a mysqlclient-python package, which provides the MySQLdb module for connecting to MySQL. According to the documentation:

MySQLdb is a thin Python wrapper around _mysql

And the docs for the _mysql module say:

_mysql provides an interface which mostly implements the MySQL C API.

It is using the standard library, and we can recompile it. Here is a mirror of the Perl recompilation process for MySQL.

Recompiling MySQLdb to support auth_pam

Step 1

“Install Percona yum repository and Percona Server development library.” This is not a problem, do what you need to do to install Percona-Server-devel for your version.

Step 2

Install a package manager so you can build a package – optional, but useful, if you ever want to have this new client without having to recompile. As in the example, I chose the RPM package manager, so I installed rpm-build.

Step 3

“Download and install the source RPM for the client package.”

I did a web search for “mysqlclient-python source rpm” and found the rpmfind page containing many versions. If you click on the link in the “Package” column you will get to a page that has a link for the Source RPM. I chose the most recent (as of this writing) CentOS package.

So I downloaded the source RPM and installed it into the sources directory:

cd SRPMS
wget http://vault.centos.org/7.4.1708/os/Source/SPackages/MySQL-python-1.2.5-1.el7.src.rpm
cd ../SOURCES
rpm -Uvh MySQL-python-1.2.5-1.el7.src.rpm

This unpacks MySQL-python-1.2.5.zip and a patch file in the SOURCES directory and puts a spec file in the SPECS directory, so this is not as complicated as the PHP version.

Step 4

“Install compilers and dependencies”.
On my host I had no work to get any requirements installed (your mileage may vary – I had installed a lot of dependencies previously in my PHP test, and used the same machine). Make sure to include the Percona Server package for the /usr/lib64/mysql/plugin/dialog.so file:
yum install Percona-Server-server-55-5.5.55-rel38.8.el6.x86_64

Step 5

“Build the RPM file”.


rpmbuild -bb rpmbuild/SPECS/MySQL-python.spec

Then I installed my new package and tested it, and it worked!
# rpm -e MySQL-python
# rpm -Uvh /root/rpmbuild/RPMS/x86_64/MySQL-python-1.2.5-1.el6.x86_64.rpm
Preparing… ########################################### [100%]
1:MySQL-python ########################################### [100%]

I will not be continuing this experiment with any other clients (e.g. not going to try for Ruby) but I welcome others to do the same!

Lesson 09: Managing Users and Privileges in MySQL

Notes/errata/updates for Chapter 9:
See the official book errata at http://tahaghoghi.com/LearningMySQL/errata.php – Chapter 9 includes pages 297 – 350.

In the fourth paragraph of this chapter, starting with “Most applications don’t need superuser privileges for day-to-day activities” they give you some reasons why you want to create users without the SUPER privilege. There are better reasons than the book gives, which are at the MySQL Manual page for the SUPER privilege.

In the section “Creating and Using New Users” (p. 300) they say “There’s no limit on password length, but we recommend using eight or fewer characters because this avoids problems with system libraries on some platforms.” You should ignore this, this book was written in 2006 and modern system libraries can handle more than 8 characters in a password. Also ignore it when they say the same thing in the section “Understanding and Changing Passwords” (p. 324).

In the section “Creating a New Remote User” at the very end (p. 214), it talks about using % as a host wildcard character. I want to point out that if there are no ACL’s set for a given host, MySQL will reject ALL connections from that host – even “telnet host 3306” will fail. So if you avoid using %, you are slightly more secure.

In the “Anonymous Users” section (p. 315), one fact that is not mentioned is that for all users, including the anonymous user, any database named “test” or that starts with “test_” can be accessed and manipulated. So an anonymous user can create tables in the “test” database (or even “test_application”) and fill it full of data, causing a denial of service when the disk eventually fills up. This fact is mentioned later in the chapter in the “Default User Configuration” section under “Linux and Mac OS X”, but it should be known earlier.

The “mysqlaccess” utility described in the section of that name (p. 320) is usually not used. These days, folks prefer the pt-show-grants tool. Here is a blog post with some examples of pt-show-grants.

In the section on “Removing Users” (p. 324), it says that if all the privileges are revoked, and a user only has GRANT USAGE, “This means the user can still connect, but has no privileges when she does.” This is untrue, as mentioned before, everyone can access and manipulate databases starting with “test”.

The section “Managing Privileges with SQL” is deprecated (p. 339-346, up to and including “Activating Privileges”). It used to be, back when this was written, that few people used the GRANT statements and more people directly manipulated the tables. These days, it’s the other way around, and due to problems like SQL injection, there are safeguards in place – for example, if you change the host of a user with an ALTER TABLE on the mysql.user table, the user will have all privileges dropped. Just about the only thing direct querying is used for, is to find who has the Super_priv variable set to ‘Y’ in the user table.

Supplemental material: I have a video presentation on security which includes ACLs and there are accompanying PDF slides.

Topics covered:
Creating and dropping local and remote users
Different MySQL privileges
SUPER privilege
GRANT and REVOKE syntax
Hosts and wildcards
Anonymous and default users
Checking privileges
Password management
Basic user security
Resource limit controls

Reference/Quick Links for MySQL Marinate

MySQL 5.7 InnoDB Temporary Tablespace – but why?

So, recently we had a runaway query eat up all sorts of temporary table space on our machines. Several machines had several terabytes in their ibtmp1 file after this happened. So I set out to find out more about why the InnoDB temporary tablespace is used, why it is better than using regular files, which was what was used prior to MySQL 5.7, and how to make sure that runaway queries do not end up filling up disk space.

Unfortunately, the manual does not go into why ibtmp1 is better than one file per temporary query, which disappears once the query ends. There are a few sections to look into:

Temporary Table Undo Logs – has one paragraph that states that these are the undo logs for temporary tablespaces. Given that these are undo logs, my guess is that this makes MySQL more crash-safe. But that is just a guess.

There is also InnoDB Temporary Tablespace which is two paragraphs, with some more details, but again, no reasoning why.

And finally, the documentation for the innodb_temp_data_file_path system variable sheds a bit of light on the subject – It explains “Metadata about active InnoDB temporary tables is located in INFORMATION_SCHEMA.INNODB_TEMP_TABLE_INFO.”

There is a manual page on Innodb temp table info table as well, which shows both compressed and uncompressed tables – uncompressed tables live in the ibtmp tablespaces, and compressed temporary tables live in the .ibd tablespace of the compressed table in question – as shown in the “PER_TABLE_SPACE” and “IS_COMPRESSED” fields.

Sadly, the table does not give useful information such as which process ID or user is generating the data. And of course it is only the active temporary space usage at the time – if you have a large temporary tablespace but no active queries using the tablespace, INNODB_TEMP_TABLE_INFO is empty.

I can imagine a scenario with more than one long-running query using a lot of space in the temporary tablespace. But I do not see how the INNODB_TEMP_TABLE_INFO would help me determine any useful information as to which query I should kill. I guess it is useful to see if there is an active query currently using temporary tablespace, but when you have a large file with nothing in it, it is just that much more depressing.

# mysql INFORMATION_SCHEMA -p -e “SELECT * FROM INNODB_TEMP_TABLE_INFO”
Enter password:
# ls -rlth /var/lib/mysql/ibtmp1
-rw-r—– 1 mysql mysql 2.3T Oct 31 10:50 /var/lib/mysql/ibtmp1

Lesson 08: Doing More with MySQL

Notes/errata/updates for Chapter 8:
See the official book errata at http://tahaghoghi.com/LearningMySQL/errata.php – Chapter 8 includes pages 277 – 296.

Note: if trying to load the “academics.csv” file on Linux, you may want to set the owner and group to mysql first:
sudo chown mysql:mysql /tmp/academics.csv

In the section on “Creating Tables with Queries”, p. 286, it says “The LIKE syntax allows you to create a new table with exactly the same structure as another, including keys.” However, it only preserves UNIQUE, PRIMARY and regular indexes. It does not preserve FOREIGN keys.

In the section on “Replacing Data”, p 292 – 293, it mentions the REPLACE command, but not the INSERT…ON DUPLICATE KEY UPDATE syntax. You can read about that at the MySQL manual page.

Finally, for more information on EXPLAIN, check out this video presentation on how to use and interpret EXPLAIN. There are also PDF slides.

Topics covered:
Importing and Exporting Data
Replacing Data
CREATE TABLE…LIKE
EXPLAIN

Reference/Quick Links for MySQL Marinate

Lesson 07: Advanced MySQL Querying

Notes/errata/updates for Chapter 7:
See the official book errata at http://tahaghoghi.com/LearningMySQL/errata.php – Chapter 7 includes pages 223 – 275.

Supplemental blog post – ORDER BY NULL – read the blog post and the comments!

GROUP BY and HAVING examples – Supplemental blog post. The example of HAVING in the text shows a use case where HAVING is the same function as WHERE. This blog posts shows examples of HAVING that you cannot do any other way.

In the section called “The GROUP BY clause”, on pages 231-232, the book says:
“you can count any column in a group, and you’ll get the same answer, so COUNT(artist_name) is the same as COUNT(*) or COUNT(artist_id).” This is not 100% true; COUNT does not count NULL values, so if you had 10 rows and 1 artist_name was NULL, COUNT(artist_name) would return 9 instead of 10. COUNT(*) counts the number of rows and would always return 10, so COUNT(*) is preferable when you intend to count the number of rows.

Also in that section, on page 233 when they show you the example:
SELECT * FROM track GROUP BY artist_id;
– Note that they explain the result is not meaningful. In most other database systems, this query would not be allowed.

In the “Advanced Joins” section, specifically on page 238 at the bottom where they say “There’s no real advantage or disadvantage in using an ON or a WHERE clause; it’s just a matter of taste.” While that’s true for the MySQL parser, it’s much easier for humans to read, and see if you missed a join condition, if you put the join conditions in an ON clause.

In the section on Nested Queries, on page 251, it says “nested queries are hard to optimize, and so they’re almost always slower to run than the unnested alternative.” MySQL has gotten better and better at optimizing nested queries, so this statement isn’t necessarily true any more.

A “derived table”, is a nested query in the FROM Clause, as described in the section heading with that name (p. 262).

In the “Table Types” subsection (p. 267), it says that MyISAM is a good choice for storage engines, and that “you very rarely need to make any other choice in small-to medium-size applications”. However, it’s recommended to use InnoDB for better concurrency, transaction support and being safer from data corruption in a crash situation. Indeed, the default storage engine in more recent versions of MySQL is InnoDB.

In addition, the lingo has been changed since the book was written; we now use “storage engine” instead of “table type”. The examples that use CREATE TABLE or ALTER TABLE with TYPE may need to be changed to STORAGE ENGINE instead of TYPE.

Finally, you can skip the section on BDB since it has been deprecated (p. 274-5).

Topics covered:
Aliases
Join style
Joins (JOIN, INNER, COMMA, STRAIGHT, RIGHT, LEFT, NATURAL)
UNION and UNION ALL
Data aggregation (DISTINCT, GROUP BY, HAVING
Subqueries and Nested Queries (including ANY, SOME, ALL, IN, NOT IN, EXISTS, NOT EXISTS, correlated subqueries, derived tables, row subqueries)
User Variables
Transactions/locking
Table Types/Storage engines

Reference/Quick Links for MySQL Marinate