Matrix Multiplication


Here's an example of matrix multiplication that I find helpful to think about.

Say we have four people (Alice, Bryan, Carol, and Dawn), each of whom has a shopping list (a certain number of each of three items — Apples, Bread, and Corn — that they'd like to buy). We can represent all of these shopping lists as rows of a matrix, one per person, as shown on the top right.

Say each person is considering two different grocery stores, Cub and Shaw's, each of which has a different price for each different food item. We can represent the price lists as columns of another matrix, shown on the bottom right.

Remember that we always write the size of a matrix as “rows-by-columns”. So the shopping-list matrix is 4×3, and the store-price matrix is 3×2.

We want to find out the total cost of each person's shopping list at each store.

To find out the cost of Carol's shopping list at Shaw's, for example, we need to multiply the number of apples Carol wants by the price of apples at Shaw's ($2 \cdot 5$), plus the number of loaves of bread times the price for bread ($1 \cdot 5$), plus the number of ears of corn times the price for corn ($3 \cdot 3$). The total cost, 24, is what Carol would pay for her whole shopping list at Shaw's. (If you're familiar with vector arithmetic, you might recognize this as the dot product of Carol's shopping list with the price list at Shaw's.)

What matrix multiplication does is compute all of these sums-of-products in one go. When we arrange the shopping lists as the rows of the first matrix, and the price lists as the columns of the second, we can just multiply the matrices together. The entry at row $i$ and column $j$ in the result is the sum-of-products of the row $i$ from the first matrix (one particular person's shopping list) with column $j$ from the second matrix (one particular store's price list).

This explains why the “inner sizes” in a matrix multiplication — the number of columns in the left matrix and the number of rows in the right one — must match: because the column labels on the left are for food items, as are the row labels on the right. Another way to look at it is that every shopping list needs a quantity for each food item, and each store's price list needs a price for each food item.

It also explains why the “outer sizes” carry over to the result of multiplication: we have one shopping list per person, and one set of prices per store, so in the end we're left with a set of values that correspond to each (person, store) pair. We've “summarized over” that inner Food dimension.

To write out the definition of matrix multiplication in formal mathematical notation, say we have two matrices $A$ and $B$. $A$ is $n \times m$, and $B$ is $m \times p$. If we multiply them together — $AB = M$ — we get an $n \times p$ matrix $M$ whose entry in row $i$, column $j$ is:

$$ M_{i,j} = \sum_{k=1}^{m}A_{i,k}B_{k,j}. $$

All this says is that we move across row $i$ in matrix $A$, and down column $j$ in matrix $B$, multiplying corresponding values together as we go, and adding all of these products up. Study the formula and try to see the connection.

One last thing: remember that there is nothing in any way special about the choices of the variable names $i$, $j$, and $k$ in the formula above. $k$ is just the “loop variable” for the summation, and $i$ and $j$ name the position in matrix $M$ that we're trying to compute a value for.