Boot.Sys: MIPS: Millions of Instructions Per Second (Part 3)

In the last post, I covered branching logic and memory operations. In this post I will address the use of functions and proper register convention.

Functions

Java gives the programmer the luxury of methods, which, among other things, save you from copying the same code all over your program. Procedural languages such as C have no classes and instead use functions (also known as procedures). The difference between a method and a function usually comes down to a simple question: is it in a class? If so, it's a method. Otherwise, it's a function.

C compiles down to assembly code, and MIPS is one of the platforms that can be targeted. So MIPS is able to provide the same functionality, just in a different form. In MIPS, a function is an area in your code beginning with a label and ending with a jump back to the location where the function was called. A function may accept arguments, which act as inputs, and may return data as output.

Let's write a function that calculates the factorial of a number:
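The original listing is not reproduced here, so the line numbers mentioned in the text will not match what follows. As a stand-in, here is a minimal sketch of such a function, assuming a MARS or SPIM style simulator (syscall 1 prints an integer, syscall 10 exits). The loop shape, the label names, and the register choices ($t0 for the running product, $t1 for the counter) are my assumptions, picked to agree with the description in this post.

```mips
        .text
        .globl main
main:
        li    $a0, 5                # argument: n = 5
        jal   factorial             # call the function; $ra receives the return address
        move  $a0, $v0              # execution resumes here after the function returns
        li    $v0, 1                # syscall 1: print the integer in $a0
        syscall
        li    $v0, 10               # syscall 10: exit
        syscall

# factorial
# Input:  $a0 = n (non-negative)
# Output: $v0 = n!
factorial:
        li    $t0, 1                # running product
        li    $t1, 1                # counter i
fact_loop:
        bgt   $t1, $a0, fact_done   # stop once i > n
        multu $t0, $t1              # HI:LO = product * i (unsigned)
        mflo  $t0                   # keep the low 32 bits of the result
        addi  $t1, $t1, 1           # i++
        j     fact_loop
fact_done:
        move  $v0, $t0              # copy the result into the return register
        jr    $ra                   # jump back to the caller
```

With n = 5 this prints 120.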

We're calculating the factorial in a slightly different way here, more equivalent to the following Java code:

On lines 13 and 14 I have commented the inputs and outputs of the function and which registers they use. MIPS has four a registers ($a0 - $a3) which are meant to store the first four arguments to a function. Up to this point we have used them as inputs to syscalls, which are, in effect, functions. MIPS also has two v registers ($v0 and $v1) which are meant to store the return values of a function. That's right - there are two. You can return two values at the same time, one in $v0 and one in $v1. Of course, there are ways of returning more things and accepting more inputs - I'll address that in a minute.
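To make the two-return-value idea concrete, here is a minimal sketch of a function that hands back both a quotient and a remainder. The function name divmod is hypothetical and not part of the original post, and the syscall numbers assume MARS or SPIM.

```mips
        .text
        .globl main
main:
        li    $a0, 17           # first argument: dividend
        li    $a1, 5            # second argument: divisor
        jal   divmod
        move  $t0, $v0          # first return value: quotient
        move  $t1, $v1          # second return value: remainder

        move  $a0, $t0
        li    $v0, 1            # syscall 1: print integer (the quotient)
        syscall
        li    $a0, 32           # ASCII code for a space
        li    $v0, 11           # syscall 11: print character
        syscall
        move  $a0, $t1
        li    $v0, 1            # print the remainder
        syscall
        li    $v0, 10           # syscall 10: exit
        syscall

# divmod (hypothetical example function)
# Input:  $a0 = dividend, $a1 = divisor (must be non-zero)
# Output: $v0 = quotient, $v1 = remainder
divmod:
        divu  $a0, $a1          # LO = quotient, HI = remainder
        mflo  $v0
        mfhi  $v1
        jr    $ra
```

This prints 3 2, since 17 divided by 5 is 3 with a remainder of 2.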

A minor point, but you will notice that line 22 uses the multu instruction instead of a regular mult. This means multiply unsigned, and it treats the two input registers as unsigned integers rather than signed 2's complement integers. Many arithmetic and comparison instructions have an unsigned counterpart, distinguished by a u at the end of their abbreviated name.
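The difference only shows up when the top bit of an input is set. The following is a small sketch, assuming MARS or SPIM, that multiplies the same two bit patterns with mult and with multu and compares the upper half of each 64-bit result, which the processor leaves in the HI register (the lower half goes in LO).

```mips
        .text
        .globl main
main:
        li    $t0, -1           # bit pattern 0xFFFFFFFF
        li    $t1, 2

        mult  $t0, $t1          # signed:   -1 * 2 = -2
        mfhi  $t2               # HI = 0xFFFFFFFF (sign extension of -2)

        multu $t0, $t1          # unsigned: 4294967295 * 2 = 0x1FFFFFFFE
        mfhi  $t3               # HI = 0x00000001

        move  $a0, $t2
        li    $v0, 1            # syscall 1: print integer
        syscall
        li    $a0, 32           # ASCII code for a space
        li    $v0, 11           # syscall 11: print character
        syscall
        move  $a0, $t3
        li    $v0, 1
        syscall
        li    $v0, 10           # syscall 10: exit
        syscall
```

This prints -1 1. The low 32 bits in LO are identical in both cases, which is why the choice does not matter for small factorials.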

On lines 27 and 28, I copy the factorial into $v0 to return and use the jr instruction, which means jump register. This instruction jumps to the address held in a register, which can be any register but is usually $ra. There's a significance to this register.

From the entry point of the program at the top of the file, we call the function by using the jal instruction, which stands for jump and link. This is a single instruction that does two things. First, the address of the instruction following the jal (line 6 here) is copied into $ra - this is why $ra is significant. Then a jump is performed to the specified label. When the function returns, the program will continue from the instruction immediately following the jal.
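The link-then-jump behavior can be spelled out by hand. In this minimal sketch, la and j together do what a single jal would do. It is for illustration only, and in real code you would just write jal. The labels say_hi and after_call are hypothetical, and the syscall numbers assume MARS or SPIM.

```mips
        .text
        .globl main
main:
        la    $ra, after_call   # "link": remember where to come back to
        j     say_hi            # "jump": transfer control to the function
after_call:
        li    $v0, 10           # syscall 10: exit
        syscall

# say_hi (hypothetical example function): prints a single '!'
say_hi:
        li    $a0, 33           # ASCII code for '!'
        li    $v0, 11           # syscall 11: print character
        syscall
        jr    $ra               # returns to after_call
```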

Register Convention

So that's pretty much all functions are. The way they work is very simple, but their very existence opens up a certain can of worms. In Java when we call a method we expect all of our local variables to be left alone - we expect the method not to screw with them. The same thing needs to happen in MIPS, otherwise our program would have bugs everywhere. There's a problem though - what if I'm using $t0 for something but the function I want to call also uses $t0? Unless I save it somewhere, the variable I put in $t0 is going to be overwritten. It will effectively disappear.

To get around this problem, programming teams must have an agreed-upon convention that prevents data from being stepped on. We call this register convention. In using it, all programmers agree to use registers in a specific manner that is predictable, sort of like driving on public roads. By agreeing not to drive on the wrong side of the road, we avoid head-on collisions. Likewise, by agreeing to use registers in a certain way, we avoid bugs.

The convention covers four types of registers:

1. Argument (a) registers [$a0 - $a3]
2. Temporary (t) registers [$t0 - $t9]
3. Return (v) registers [$v0 - $v1]
4. Saved (s) registers [$s0 - $s7]

We have used all but the last of these, the saved registers, so far.

Argument registers are, as discussed, used to provide arguments to functions and syscalls. When they are handed to a function, that function is given ownership of them. That means that there is no guarantee that a function will return with the same value stored in $a0 as before it was called - it can overwrite all of them at will without saving them anywhere.

Temporary registers are not used to provide arguments in any circumstances, but like argument registers there is no guarantee that a function will not overwrite their contents. They are the most commonly used, hence the fact that there are 10 of them.

Return registers are used to store the output of a function, and also to tell the processor which syscall to use. Just like the previous two types, they can be overwritten by functions.

Saved registers are unlike the other three. They are used in the same way as temporary registers in that they store intermediate results of calculations, but a function must never return with their contents changed. If a function wants to use a saved register it must save its original value somewhere and load it back before returning to the caller. We say that it is the callee's responsibility to ensure continuity, that is, that saved registers remain unchanged across the call. For the other three types, it is the caller's responsibility to maintain continuity by saving any values it still needs before making the call.

You might be wondering where to save the registers. It is most common to save them on the stack, an area in main memory with a LIFO (last in, first out) growth policy. If you've taken a course on data structures you should know what a stack is and how it works. In the case of MIPS, the stack grows downward from a high address set when the program is loaded, and shrinks upward - that is, the "top" of the stack begins at a higher address and moves to a lower address as the stack gets bigger.

The top of the stack is pointed to by $sp (the stack pointer). This register must only be used for this purpose, otherwise your program will explode. When a function wants to allocate space on the stack, it subtracts the number of bytes it needs from $sp and then saves its data to that region. Before that function returns, it adds the same number of bytes back to $sp, putting the stack pointer back where it began.
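The allocate, save, restore, release sequence can be sketched in isolation. This is a minimal example assuming MARS or SPIM, and the values 11 and 22 are arbitrary.

```mips
        .text
        .globl main
main:
        li    $t0, 11
        li    $t1, 22

        addi  $sp, $sp, -8      # allocate 8 bytes: the stack grows downward
        sw    $t0, 4($sp)       # save $t0 at offset 4
        sw    $t1, 0($sp)       # save $t1 at offset 0 (the new top of the stack)

        li    $t0, 0            # overwrite both registers
        li    $t1, 0

        lw    $t1, 0($sp)       # restore $t1
        lw    $t0, 4($sp)       # restore $t0
        addi  $sp, $sp, 8       # release the 8 bytes: $sp is back where it began

        add   $a0, $t0, $t1     # 11 + 22
        li    $v0, 1            # syscall 1: print integer
        syscall
        li    $v0, 10           # syscall 10: exit
        syscall
```

This prints 33, showing that both values came back intact. Each word takes 4 bytes, which is why the offsets are 0 and 4.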

Enough talk, let's put all this into practice by modifying the factorial program.

The entry point of the program has two variables that it wants preserved across function calls, one in $s0 and one in $t0. Because $s0 is a saved register, it doesn't have to do anything to ensure that it remains the same after factorial returns. However, factorial might use $t0, so it has to save its contents to the stack. On line 7, it asks for 4 bytes by subtracting 4 from $sp, and then saves $t0 with an offset of 0. It calls the function on line 11, and then loads $t0 back from the stack on line 14. It subsequently resets the stack pointer by adding 4, the same amount that it previously subtracted.

Within the factorial function we've switched from using $t0 to using $s0. Even though we now know that it isn't using $t0 anymore, it is still convention to save $t0 in the entry point anyway. Think of it like this - the entry point and the factorial function are two separate parts of your program that might be modified by different people. If one person is working on factorial and you are working on the entry point, what if your teammate decides to use $t0 later on after all? You would have a bug in your program. By following convention by the book you avoid these situations.

Before factorial uses $s0, it asks for 4 bytes on the stack and saves the original value of $s0 there. It is then free to use $s0 as it pleases, as long as it loads the original value back. It does so at line 50, so it is practicing good register convention. It does not have to do this with $t1 or $a0, despite using both.
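The modified listing is not reproduced here, so the line numbers above refer to the original. The following is a minimal sketch that follows the description above, assuming MARS or SPIM. The values 100 and 7, the final sum, and the label names are my own additions so that the result can be checked.

```mips
        .text
        .globl main
main:
        li    $s0, 100              # value kept in a saved register
        li    $t0, 7                # value kept in a temporary register

        addi  $sp, $sp, -4          # caller: ask for 4 bytes
        sw    $t0, 0($sp)           # $t0 is caller-saved, so main saves it itself

        li    $a0, 5
        jal   factorial             # free to overwrite $t, $a and $v registers

        lw    $t0, 0($sp)           # load $t0 back
        addi  $sp, $sp, 4           # reset the stack pointer

        add   $a0, $s0, $t0         # 100 + 7: both values survived the call
        add   $a0, $a0, $v0         # plus 5! = 120
        li    $v0, 1                # syscall 1: print integer
        syscall
        li    $v0, 10               # syscall 10: exit
        syscall

# factorial
# Input:  $a0 = n (non-negative)
# Output: $v0 = n!
factorial:
        addi  $sp, $sp, -4          # callee: ask for 4 bytes
        sw    $s0, 0($sp)           # save the caller's $s0 before touching it

        li    $s0, 1                # running product now lives in $s0
        li    $t1, 1                # counter i (no need to save a $t register)
fact_loop:
        bgt   $t1, $a0, fact_done   # stop once i > n
        multu $s0, $t1
        mflo  $s0
        addi  $t1, $t1, 1
        j     fact_loop
fact_done:
        move  $v0, $s0              # return value
        lw    $s0, 0($sp)           # put the caller's $s0 back
        addi  $sp, $sp, 4           # release the 4 bytes
        jr    $ra
```

This prints 227 (100 + 7 + 120). If factorial skipped the save and restore of $s0, main would see 120 in $s0 instead of 100.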

Parent Functions

Sometimes a function will need to call another function to complete its task. I use the loose terminology parent function to refer to functions that call other functions. Functions that do not call other functions are sometimes called leaf functions. The factorial function we just wrote is an example of a leaf function.

Parent functions need to take one additional step before calling other functions. Because the return address is always stored in $ra, executing jal will overwrite this register. A parent function must save $ra to the stack and load it back right before returning if it wants to jump back to the right place. To show what I mean, take the function two_factorials, which adds the factorials of two input numbers together:

In addition to saving $s0 before using it, two_factorials saves the return address register ($ra) so that it can reload it later. This can be seen on lines 64 and 78. Note that because we need the space for 2 registers, we ask for 8 bytes on the stack.
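The original two_factorials listing is not reproduced here, so the line numbers above will not match. As a minimal sketch of the same idea, assuming MARS or SPIM: the stack layout, the way $s0 is reused, and the simple leaf version of factorial are my assumptions.

```mips
        .text
        .globl main
main:
        li    $a0, 3
        li    $a1, 4
        jal   two_factorials
        move  $a0, $v0
        li    $v0, 1                # syscall 1: print integer
        syscall
        li    $v0, 10               # syscall 10: exit
        syscall

# two_factorials
# Input:  $a0 = x, $a1 = y
# Output: $v0 = x! + y!
two_factorials:
        addi  $sp, $sp, -8          # room for two registers
        sw    $ra, 4($sp)           # the jal instructions below overwrite $ra
        sw    $s0, 0($sp)           # we are about to use $s0

        move  $s0, $a1              # keep y safe across the first call
        jal   factorial             # $v0 = x!
        move  $a0, $s0              # argument for the second call: y
        move  $s0, $v0              # keep x! safe across the second call
        jal   factorial             # $v0 = y!
        add   $v0, $v0, $s0         # x! + y!

        lw    $s0, 0($sp)           # restore the caller's $s0
        lw    $ra, 4($sp)           # restore our own return address
        addi  $sp, $sp, 8
        jr    $ra

# factorial (leaf function: calls nothing, so $ra needs no saving)
# Input:  $a0 = n (non-negative)
# Output: $v0 = n!
factorial:
        li    $t0, 1                # running product
        li    $t1, 1                # counter i
fact_loop:
        bgt   $t1, $a0, fact_done
        multu $t0, $t1
        mflo  $t0
        addi  $t1, $t1, 1
        j     fact_loop
fact_done:
        move  $v0, $t0
        jr    $ra
```

This prints 30 (3! + 4! = 6 + 24). Without the save and restore of $ra, the final jr $ra in two_factorials would jump back into two_factorials itself, just after the second jal, and the program would never return to main.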

To Be Continued

In the next part I will cover bitwise operations and files.

Friday, January 6, 2017
