Saturday, December 31, 2016

MIPS: Millions of Instructions Per Second (Part 1)

Programming in an assembly language is an arduous task, but it is definitely rewarding.  Knowing what is happening at the hardware level of a computer gives one a sense of the intricacy of history's most complicated machine.  It will also make you appreciate higher level languages and never want to live without them ever again.

Luckily you are reading this blog because you are being taught how to program in MIPS, one of the easier assembly languages to pick up.  MIPS is a RISC architecture, which stands for reduced instruction set computer.  Everything that you program in MIPS is some combination of very simple commands, such as "add these two numbers together" or "jump to this instruction."  More complex instruction sets exist - such as x86 - that include specialized extensions.  These extensions improve the speed and efficiency of a program, but require more knowledge about their applications.

 

How an Assembly Language Works

 

An assembly language is a specific type of programming language with minimal abstraction between the hardware and code.  It is defined by its instruction set - a list of commands to the computer and how they function.  Each instruction assembles to some number of machine instructions, which exist in a binary format that the computer can actually use to execute a program.  Typically there is a one-to-one mapping between an instruction and an associated machine instruction, but there exist pseudo-instructions which assemble to multiple machine instructions that must be executed consecutively.

Code written in an assembly language has to be transformed into machine code in order to be executed.  We call this transformation assembly, hence the name given to this category of programming languages.  Much, much more happens past this level, but this post will focus exclusively on the programming aspect of MIPS.  Typically an instruction set will have an assembler that is maintained by the company that manufactures the chips used by that instruction set.  However, instead of using this assembler we will be using an integrated development environment called MARS, which will simulate a MIPS processor.

 

MARS IDE

 

MARS is developed and maintained by Missouri State University and can be downloaded here.  It requires Java to run, so make sure you have that installed.  When you first open MARS you will be treated to a screen that looks something like this:


A quick rundown of what you're seeing: The area at the right shows the current contents of each register, and is very useful for debugging.  Registers are basically caches of very high-speed memory that are found directly within a MIPS processor (or any other type of processor).  All instructions that do any sort of calculation must do so on registers.  MIPS has 32 registers accessible to the programmer - each one functions exactly the same in hardware terms, but from a software standpoint have a specific purpose and should be used according to a standardized convention.  That's not important quite yet, but we'll get to it.

The area at the bottom is the console and functions basically the same way as in Eclipse or Netbeans.  There are two tabs - the first will tell you any problems the assembler has with your code, and the second is the standard output of your program.  Finally, the area in the middle is of course where you'll be writing your code.  Everything else is pretty self-explanatory - feel free to explore a bit more before we begin.

 

Your First Program

 

Create a new file by going to File > New.  Save your file and name it anything you want.

Getting right into it, there are two main areas of your program: the .text section, and the .data section.  The .text section is where your code will go, while the .data section is where you can declare variables stored in memory.  We'll just stick to the .text section for now, but go ahead and create both sections by writing .text at the top of your file, and .data some number of lines beneath, like this:


The MARS editor will highlight and recognize these as directives, special commands that the assembler will deal with when building your code.  In this case, the .text and .data directives indicate what sort of content the assembler should expect next.

Below the .text directive, write the following lines:


Both of these lines are instructions.  The first line is a load immediate.  All this does is set the value in the specified register, in this case $v0, to some number.  An immediate is some constant number written in code, as opposed to a variable stored in a register.  Here our immediate is 10.

The second line is exactly what it says it is: a syscall.  You can think of a syscall as a special function that performs some sort of action that is important to the system using the MIPS processor.  One example of a syscall is printing a number to the console, or in our case exiting the program.

You can go ahead and build the program by clicking the icon on the toolbar at the top.  This will automatically switch you to the execute tab and away from the editor.  The middle of MARS will look like this now:


Just like your program, the execute tab is split into a text segment and a data segment.  The text segment shows the raw instructions that your program assembled to, while the data segment gives you a view of every location in main memory.  In the text segment you can set breakpoints, which will pause your program while it is running so that you can see what state the machine is in.  This will be extremely useful when (not if) you run into a bug that you can't catch just by looking at the code.  Be prepared to use this feature.  A lot.

There's also the program arguments field.  If you've programmed in C or C++ before, this is the same thing as the command line.  If you haven't, don't worry - we'll cover what this does later.

Now go ahead and run your program by clicking the button on the toolbar.  Not much will happen, since all we've done is told the program to exit as soon as it starts.  You'll see the following printed out in the console:


Were we not to exit the program using the syscall, you would see this instead:


The reason there's this distinction between just letting the program reach the end and manually exiting is that when you start writing functions at the bottom of your file, MARS will run straight through them, probably causing errors.  Using the exit syscall prevents this from happening.

 

Hello World

 

Alright, now we're past all the boring stuff.  Let's write some actual code.  In your .data section, allocate a string variable hello_world:


Every variable in your data section has a label.  Our string has the label hello_world, and can be used in the text section by referring to that label.  Any time a variable is declared in the data section, it has to have a type directive.  An .asciiz is a null-terminated ASCII string.  A null-terminated string is any string that is stored with a null terminator as its last character.  A null terminator is just the value zero, also called null, and is used as a way of denoting where the string stops in memory.  ASCII is one way of encoding letters and symbols using binary numbers.  It was designed specifically for the English language, and is only capable of representing up to 256 different symbols.  Other encodings exist that can represent pretty much any symbol used to communicate, but they are vastly more complicated.

Like all other hello world programs, our goal is to print our string to the console.  This is rather simple:


This is yet another syscall.  The first thing this does is load address our string into the register $a0.  Our string exists in memory at a certain location - the address of our string is that location.  We often refer to addresses as pointers, because they point to something in memory.  Loading the immediate value 4 into $v0 specifies the type of syscall we want.  Syscall 4 prints a string to the console, located at the address in $a0.

Running this program, you will get the following output:


As expected, our string appears in the console.

 

Loops

 

Now let's do the same thing, but some variable number of times.


We start off by loading two values in two separate registers.  The meaning of the value stored in each register is noted by the comments, which begin with the pound character (#).  Line 6 is just the label loop.  Labels in the text segment can be referred to by instructions - this enables us to jump to some point in the code from some other point in the code, without having to specify a line number.

The next line is a branch on equal instruction.  This instruction compares two registers and jumps to the given label if they have the same value.  There are several different branching instructions that check for different logical relationships between two values.

Line 11 is an add immediate instruction.  This instruction takes three arguments: two registers and an immediate value.  The first register is the destination register - where the result is stored.  The second register is a variable, and the immediate is how much we're adding to that register.  In this case and in most cases, we are incrementing a register with this instruction - the destination and source register arguments are therefore the same, $t0.

Line 12 is a branch instruction.  A plain branch instruction takes no register arguments and always jumps to the label it's given.  It functions in exactly the same way as a jump instruction (j).  The difference between the two is unimportant for a programming guide.

This program is equivalent to the Java program below:

 

Factorial Program

 

Let's use loops to perform some sort of calculation.


Some new things: Line 8 has been changed to a bgt instruction, which means branch greater than.  This should be pretty self-explanatory.  Lines 9 and 10 work together to multiply $t2 by $t0 and load the result back into $t2.  The mult instruction is a bit different in how it works compared to addi.  There are two auxiliary registers, $lo and $hi, that together store a contiguous 64-bit result.  Each register in MIPS is 32 bits, but multiplication can result in a larger number pretty quickly, so twice as much space is set aside.  We're assuming that our number is small enough to fit in 32 bits, so we just load the lower half with the mflo instruction, which just copies from $lo into some other register.  Line 16 uses the move instruction, which just copies the value in the second register argument to the first register argument (from $t2 to $a0 in our case).

Compare to the Java equivalent:

 

Loop Style

 

Now that we've seen a couple example programs with loops, it might be a good time to talk about the way loops are normally formatted.  In MIPS assembly, everything is a while loop.  If you need to have a counter, you must initialize it before the loop begins.  At the very top of the loop, you have some label which indicates the beginning of the loop.  Then you have some condition (or set of conditions) that, when it becomes true, the loop is broken.  At the bottom of the loop you have a branch or jump instruction that goes to the top of the loop.  If you are emulating a for loop, then you increment your counter right before you branch to the top.  In other words:


This sort of pattern will show up frequently in your programs.  One extra thing to note - branch instructions accept either registers or immediates as their second argument.  So while in the previous programs I loaded a value into a register to compare to, I could just as easily have used an immediate value instead, as I have here.  This is something unique to branch instructions.

 

Non-immediate Instructions

 

We have been using addi to do all of our addition so far, but there are times when you might need to add two variables together.  Immediates cannot be changed at runtime, so the second variable has to be read from another register.  The add instruction does this, and is the instruction that addi is based on.  You can think of immediate instructions as slight alterations of non-immediate instructions.  The non-immediate instructions were designed first, and their immediate versions added as an additional consideration.



The add instruction stores the result of $t1 + $t2 in register $t0, the first argument.  The sub instruction does the same exact thing, except it calculates subtraction (and in this case is storing $t4 - $t5 in $t3).  Other instructions like this exist, which I will address in later installments.

 

To Be Continued

 

In the next part, we will cover if statements and memory operations.

2 comments:

  1. Nice intro article on MIPS. One caveat I have is the explanation of system call. Now it might be too complicated to really explain in a introduction to MIPS but the syscall instruction itself is defined by the MIPS ISA. The operations are typically set up by either bios or the os initialization. The code that makes up a system call implementation is assembly as well. The reason for having system calls typically boils down to permission and trust. A typical system call will allow the safe transfer from user mode to supervisor mode(not the same thing as sudo in Linux). This is a hardware abstraction used by most processors. In x86 (and amd64), there are 4 rings (ring 3=user, ring 0=supervisor). In most MIPS implementations I believe there is two modes of operation. The basic idea is that we don't want to trust any program to have complete control of everything(as in hardware and IO stuff), so we restrict certain instructions, maybe even some registers to only execute at a certain privilege level. So in practice, your os runs in supervisor mode, and after initialization it will change modes to userspace and launch your terminal or windowing environment.

    The first dozen or so system calls we see in mars are inspired by another simulator called spim. The rest were made up by the mars team or someone else. These actions are not tied to the instruction set itself though. They were probably designed to make it easier to use spim to teach students MIPS assembly.

    ReplyDelete
    Replies
    1. Interesting! I will change my bad generalization and just state that it is a special function.

      Also, just curious - did you come across this blog in the facebook group or by other means?

      Delete