Monday, January 30, 2017

Floating Point Binary

I've discussed how to use binary to represent integer numbers on a computer.  There are times, though, when we want to represent real numbers and not just integers.  Scientific computing in particular uses real numbers extensively.  The way to do this is to use floating point representation.

 

Floats and Doubles

 

You've probably used floats and doubles before in Java or some other language.  You also probably know about their quirks - that they can't represent all real numbers.  Pi, for instance, can only be approximated because its true value requires an infinite amount of space, which computers don't have.  You will also, on occasion, run into situations where the value is just slightly off.  For example, if you type 1.1 + 1.1 + 1.1 + 4.0 into Python, you'll get 7.300000000000001.  This is an artifact of the way real numbers are stored.

A float uses 32 bits to represent a real number.  It consists of 1 sign bit, 8 exponent bits, and 23 bits of mantissa or fraction.  The sign bit is self-explanatory - a zero means positive, a one means negative.  It is the leftmost bit just as in signed magnitude representation.  The exponent bits are stored in a format called excess-k.

Excess-k is yet another way to encode signed integers in binary.  The k is a parameter; in the case of 32-bit floats it is set to 127.  To encode an integer in excess-k, add k to its decimal value and then encode the result as an unsigned binary number.  For example, if we want an exponent of -10, we would add 127 to get 117 and encode that in binary (01110101).
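To make that concrete, here is a tiny Java sketch of the idea (the class and method names are mine, not part of any standard):

    public class ExcessK {
        // encode: add k, then write the result as unsigned binary
        static int encode(int value, int k) {
            return value + k;
        }

        public static void main(String[] args) {
            int stored = encode(-10, 127);                        // 117
            System.out.println(Integer.toBinaryString(stored));   // 1110101 (pad to 01110101 for 8 bits)
        }
    }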

The mantissa/fraction works a bit differently.  Instead of whole numbers, each bit represents a negative power of two.  The leftmost bit is worth 0.5 (2^-1), the next 0.25 (2^-2), and so on.  There is an implicit 1.0 added to this number, such that if the two leftmost bits are set the mantissa has a value of 1.75 (1.0 + 0.5 + 0.25).

In total, the exponent bits and mantissa work together to represent a real number.  The exponent is applied to 2 as its base (e.g. an exponent of 10 implies 2^10), and then the result is multiplied by the mantissa to recover the original decimal value, paying attention to the sign bit.
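Here is a hedged Java sketch that decodes a 32-bit pattern by hand, ignoring the special cases (denormals, infinities, NaN) that a full decoder would need.  Float.floatToIntBits is only used to grab a known-good bit pattern to test against:

    public class FloatDecode {
        static double decode(int bits) {
            int sign     = (bits >>> 31) & 0x1;
            int exponent = (bits >>> 23) & 0xFF;                 // stored in excess-127
            int mantissa = bits & 0x7FFFFF;                      // 23 fraction bits
            double frac  = 1.0 + mantissa / (double) (1 << 23);  // implicit 1.0
            return (sign == 0 ? 1 : -1) * frac * Math.pow(2, exponent - 127);
        }

        public static void main(String[] args) {
            int bits = Float.floatToIntBits(-6.25f);   // the machine's own encoding
            System.out.println(decode(bits));          // -6.25
        }
    }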

 

Conversions

 

To convert a binary number from excess-k to decimal, you can use the multiply method as normal, then subtract k:


For example, the pattern 00010010 represents 16 + 2 = 18 in 8-bit unsigned binary, and 18 - 127 = -109 in excess-127.  To go in the other direction, add k and use the division method for unsigned binary.

For the mantissa, the multiplication method still applies:


The only difference is that now we are working with fractions, and we add 1.0 to the result.  But to go from a decimal number to the mantissa, there is a slightly different method than the one we have used for everything else.


First we subtract 1.0 from the decimal number.  That will give us the fraction to be encoded in binary.  After that, we multiply by 2.0 for each bit.  If after multiplying the result is greater than or equal to 1.0, we set that bit to 1 and subtract 1.0.  Otherwise, we set that bit to 0.  The result is read from top to bottom, not bottom to top.
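As a sketch of that procedure in Java (the method name and the 23-bit width are just illustrative):

    public class MantissaBits {
        // convert a value in [1.0, 2.0) to its 23 mantissa bits
        static String mantissaBits(double value) {
            double frac = value - 1.0;              // strip the implicit 1.0
            StringBuilder bits = new StringBuilder();
            for (int i = 0; i < 23; i++) {
                frac *= 2.0;
                if (frac >= 1.0) {                  // bit is set: record it, subtract 1.0
                    bits.append('1');
                    frac -= 1.0;
                } else {
                    bits.append('0');
                }
            }
            return bits.toString();                 // read top to bottom
        }

        public static void main(String[] args) {
            System.out.println(mantissaBits(1.75)); // 11000000000000000000000
        }
    }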

Put these two things together and you understand how to encode/decode floating point binary.


A 64-bit floating point number (a double) works exactly the same way, except with a different number of bits for each field.  There is 1 sign bit, 11 exponent bits (encoded in excess-1023), and 52 mantissa bits.  Use the same methods to convert to and from decimal.

Saturday, January 28, 2017

Integer Number Systems

The most common way to write a number is in the decimal system.  In this system, ten distinct digits are used to represent value, which makes it a base-10 counting system.  You could write that same value, however, using a different number of symbols as your alphabet - e.g., you could write it in base-2, base-3, and so on.  This post will discuss ways to represent numbers, as well as how numbers are stored on a computer.

 

Counting Systems

 

Integer numbers are mainly used to describe the size of discrete sets.  If you have two apples, the size of the set of the apples that you have is two.  If you wanted to communicate the set's size, you could just point at the collection of apples.  Once you get past a certain size, however, it becomes difficult for humans to discern one count from another.  Imagine trying to determine if there were 100 items in a set instead of 101 just by looking at the set - you would have to systematically look at each item individually and mark the ones you've already counted, otherwise you would lose track pretty quickly.

Your set, just by existing, exhibits a base-1 counting system.  If your set consists of apples, you could think of an apple as a symbol representing itself.  Line up the apples next to each other, and you have a base-1 number.  If you wanted to write down the size of your set in base-1, though, it would be time consuming to draw an apple for every apple that you have.  You could instead represent each apple with a 0, and write a line of zeros.  But this, too, is more time consuming than it needs to be.

You could dramatically reduce the number of symbols you have to write by moving up to a base-2 counting system.  Instead of a zero for every apple, you represent the count using a string of ones and zeros.  To reduce it even further, move up to base-3.  Further, base-4.  Etc.  We use base-10 the most, supposedly because we have ten fingers.

There is a diminishing return to increasing the base in regards to the amount of space required to represent a value.  To represent about four billion values, you only need 32 digits in a base-2 number system - more than a hundred million times less space than base-1.  Base-3 can represent the same number of values with 21 digits, only about a third less space than base-2.  For this reason (and several others), classical computers - like the one you are using to read this post - use a base-2 counting system, or binary.  There is little need to increase the base, as the gains in doing so are vanishingly small.

Computer scientists often pick from a set of the same four counting systems to represent numbers: decimal, binary, hexadecimal, and octal.  Decimal is used because it is what most people read.  Binary, of course, because the computer uses it.  Hexadecimal (base-16) to write fewer digits, and octal (base-8) for the same reason.  You'll notice that hexadecimal and octal have bases that are powers of two - there's a reason for this, which I will explain.

 

Conversions

 

It is a common task to convert between one number system and another.  The method (algorithm!) that applies most generally is the multiply and divide method.  First convert the number to decimal by multiplying each digit by the appropriate power of its base, then repeatedly divide by the new base, taking the remainder each time, to form the new representation.

The process of multiplication into decimal can be seen below:


From right to left, multiply each digit by an increasing power of two, and then add the result.  Normally we write the base as a subscript to the number when not using decimal to avoid confusion when it is not explicitly stated what number system is being used.
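The same process as a small Java sketch (the names are mine; Character.digit handles letter digits for bases above 10):

    public class MultiplyMethod {
        // multiply each digit by an increasing power of the base, right to left
        static int toDecimal(String digits, int base) {
            int value = 0, power = 1;
            for (int i = digits.length() - 1; i >= 0; i--) {
                value += Character.digit(digits.charAt(i), base) * power;
                power *= base;
            }
            return value;
        }

        public static void main(String[] args) {
            System.out.println(toDecimal("10110", 2));   // 22
        }
    }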

Now that we have the decimal value, we can fully convert using the division algorithm:


Keep dividing by the base (in this case, two again) and writing down the remainder.  Once the quotient is zero, you're done.  The final value is read from bottom to top.
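And the division method, again as an illustrative sketch:

    public class DivisionMethod {
        // repeatedly divide by the base, collecting remainders
        static String fromDecimal(int value, int base) {
            if (value == 0) return "0";
            StringBuilder digits = new StringBuilder();
            while (value > 0) {
                digits.append(Character.forDigit(value % base, base));
                value /= base;
            }
            return digits.reverse().toString();   // remainders are read bottom to top
        }

        public static void main(String[] args) {
            System.out.println(fromDecimal(22, 2));    // 10110
            System.out.println(fromDecimal(190, 16));  // be
        }
    }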

This method is more time-consuming than it needs to be when converting between bases that are powers of two.  If I am converting from binary to hexadecimal or octal, I can group digits together in sets that form single digits in the new system:


When converting from binary to hexadecimal, we group four binary digits together.  We then translate each group via a "lookup table" (you could memorize this table with a little effort).  The same thing applies to octal, except we group three digits instead of four:


This is even easier to accomplish because there are fewer entries in the table.
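A sketch of the grouping trick in Java, assuming the bit string's length is a multiple of four (pad with leading zeros otherwise):

    public class GroupDigits {
        public static void main(String[] args) {
            String bits = "10111110";
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < bits.length(); i += 4) {   // one hex digit per four bits
                int group = Integer.parseInt(bits.substring(i, i + 4), 2);
                hex.append(Character.forDigit(group, 16));
            }
            System.out.println(hex);   // be
        }
    }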


Computers and Binary


So far we've dealt only with positive numbers.  But negative numbers need to be represented by a computer, too.  When only representing non-negative numbers, we use an unsigned binary system.  To represent negative numbers in addition to positive numbers, we have several choices.  We could treat one of the bits as the sign of the number - this is referred to as signed magnitude.  All but one of the bits form an unsigned binary number, while the leftmost bit encodes the sign (zero for positive, one for negative).


When converting a negative decimal number to signed magnitude, use the same method as with unsigned binary to start, then set the leftmost bit to one.  Since computers work with a fixed amount of space (in the case of MIPS, 32 bits for every register), we must sacrifice one bit to encode the sign (so that 31 bits remain for the magnitude).  If the number is not negative, then the leftmost bit is set to zero while the remaining bits represent the unsigned magnitude of the number.

There is one flaw with signed magnitude - there are two representations of zero.  There is a positive zero and a negative zero (+0, -0).  This is effectively meaningless, and there is really no reason we would want to have both a positive and negative zero.  As a response, computer scientists came up with two's complement representation.

But before we talk about that, we need to discuss one's complement, which is directly related.  Like signed magnitude, one's complement has two zeros.  Unlike signed magnitude, we don't simply treat the left bit as a sign bit.  Positive numbers are represented in the same way as in signed magnitude - the leftmost bit is zero, while the remaining bits encode the number in unsigned binary.  But for negative numbers, we take the positive equivalent and flip all the bits:


Here I have a fixed-size 8-bit register in which I represent my number.  To go from decimal to one's complement, we again compute the unsigned representation like before.  Since I'm using a fixed-size representation, I "append" additional zeros to the left of the number.  Because the number is negative, I then flip all the bits.  So 1100 gets turned into 00001100, which gets converted to 11110011.
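Here is the same conversion sketched in Java (an 8-bit width is assumed, matching the example):

    public class OnesComplement {
        // one's complement in 8 bits: flip the bits of the magnitude for negatives
        static int onesComplement(int n) {
            if (n >= 0) return n & 0xFF;
            return ~(-n) & 0xFF;
        }

        public static void main(String[] args) {
            System.out.println(Integer.toBinaryString(onesComplement(-12)));   // 11110011
        }
    }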

While we do this extra work with the lower bits to represent negative numbers, it is still trivial to tell whether a one's complement number is negative or not: look at the leftmost bit.  The leftmost bit tells you the same information in one's complement as it does in signed magnitude, even though it does not act alone in encoding negative numbers.

As mentioned, there are two representations of zero: with 8 bits, 00000000 and 11111111.  Two's complement finally addresses this by mapping negative numbers slightly differently - to get the negative representation of a number, add 1 to the one's complement representation.


Here we take the unsigned representation of 40 (101000), add zeros to get a fixed-size number (00101000), flip all the bits (11010111), then add 1 (11011000).  Like one's complement and signed magnitude, the leftmost bit can be relied upon to determine whether the number is negative or not.
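In Java, which uses two's complement natively, the flip-and-add-1 recipe can be spelled out by hand and checked against a simple mask (8-bit width assumed):

    public class TwosComplement {
        public static void main(String[] args) {
            int magnitude = 40;                     // 00101000
            int flipped   = ~magnitude & 0xFF;      // 11010111
            int result    = (flipped + 1) & 0xFF;   // 11011000
            System.out.println(Integer.toBinaryString(result));
            System.out.println(Integer.toBinaryString(-40 & 0xFF));   // same pattern
        }
    }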

Nearly every computer uses two's complement natively to represent signed integers.  This is certainly the case for MIPS processors.  Unsigned binary can be used to represent unsigned integers, just as long as special instructions are made available that treat their input as unsigned.

Monday, January 23, 2017

MIPS: Millions of Instructions Per Second (Part 5)

It's the last part!  There might be additional posts on tangential topics, but this is the last of the must-know tutorials.  In any case, if you've made it here you're probably pretty confident about programming in MIPS.

Previously I discussed bitwise operations and files.  In this chapter I will talk about recursion and the frame pointer.  You should be familiar with recursion in high level languages before continuing.  If you know what it is but still get tripped up here and there, then some of this post will act as a refresher.

 

Recursion

 

Recursion is just the act of defining a procedure in terms of that same procedure.  In both MIPS and Java, it is achieved by calling a function within its own definition.  A recursive function has two parts - a base case that returns without making a self-referential call, and an "else" section that does.  Without a properly defined base case, the recursive function will recurse forever (in practice, until it exhausts the stack).  This makes recursive functions different from regular functions in practice: a programmer is more likely to need the debugger when writing one.  For this reason I highly advise that you read Debugging in MARS first.

The automatic example used for recursion is the Fibonacci sequence.  Kind of cliché, I know, but it's what works.  Here is an implementation in Java:
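A sketch matching the description below:

    static int fibonacci(int n) {
        if (n == 0 || n == 1) {
            return 1;                                   // both base cases return the same thing
        }
        return fibonacci(n - 1) + fibonacci(n - 2);     // the sum of the last two terms
    }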


In this case we're using the classic interpretation of the sequence, which begins with 1 instead of 0.  We have two base cases defined - one if n is 0, and another if n is 1.  They both return the same thing, so we capture their behavior in a single if bracket.  For every other case, we return the sum of the last two terms.  We're assuming we get no negative numbers as input - if we did, the recursion would never reach a base case and the stack would eventually overflow.  But that's not too important, since there are no negative Fibonacci terms.
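In MIPS, a sketch consistent with the walkthrough below (the register choices are my own assumptions):

    fibonacci:
        slti  $t0, $a0, 2        # is n less than 2?
        bne   $t0, $zero, base
        j     else
    base:
        li    $v0, 1             # fibonacci(0) = fibonacci(1) = 1
        jr    $ra
    else:
        addi  $sp, $sp, -8
        sw    $ra, 4($sp)        # save the return address
        sw    $s0, 0($sp)        # save the caller's $s0
        move  $s0, $a0           # hold n for a moment
        addi  $a0, $a0, -1
        jal   fibonacci          # fibonacci(n - 1)
        addi  $a0, $s0, -2       # set up n - 2 while $s0 still holds n
        move  $s0, $v0           # $s0 now holds the first composite term
        jal   fibonacci          # fibonacci(n - 2)
        add   $v0, $v0, $s0      # sum the two terms
        lw    $s0, 0($sp)
        lw    $ra, 4($sp)
        addi  $sp, $sp, 8        # restore the stack
        jr    $ra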


The MIPS equivalent has three labels.  The beginning of the function, the base case, and the else case.  At the beginning of the function we check the value of the integer passed to the function to determine if we've hit the base case.  If so, all we have to do is return 1.  If we haven't, then we have to recursively call fibonacci twice to get the two composite terms.  It's only when the program hits the else case that I need to save anything onto the stack - this might not be true for all recursive functions, but it works here.

I must save $ra or I will get stuck in an infinite loop.  Every time fibonacci is called with jal, $ra is overwritten with the address of the instruction following that jal - an address inside fibonacci itself.  We have to hold on to the address that points back to main or we will lose it.  I also need a place to save the intermediate result of the first composite term (and the integer argument for a moment), so I reserve $s0.  Successive recursive calls will save my intermediate result for me because I have used correct register convention, saving $s0 before using it.  Of course, I load the old values back and restore the stack pointer before returning to the caller.

If you think about it, there is nothing special happening here that does not apply to other parent functions.  The only thing that is different is that I am "calling myself" instead of something else.  Register convention ensures that recursion does not introduce additional complications - if you use it correctly outside of recursive functions, you use it correctly inside recursive functions.

Frame Pointer

The frame pointer ($fp) is a special register used for pointing to different places on the stack.  The stack pointer always points to the top of the stack, so if you want to communicate the location of some other stack data to a subroutine (function), you need to pass that address in some register other than $sp.  Usually this will involve arguments - the frame pointer can be used to point to extra arguments that the caller placed on the stack.  Using an argument register to point to this region is wasteful, since there are only four of them.  Ideally all arguments would sit in registers for speed reasons, but register memory is expensive.  Having that extra room makes a difference.

If I have a function that takes 10 arguments, 4 of them can reside in registers while 6 of them will be on the stack.  The frame pointer will hold the address of the first of those 6:
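A caller-side sketch of what is described next (the function name ten_args and the temporary registers are hypothetical):

    addi $sp, $sp, -28     # 4 bytes for the old $fp plus 24 bytes for 6 arguments
    sw   $fp, 0($sp)       # save the old frame pointer
    addi $fp, $sp, 4       # $fp = address of the first stack argument
    sw   $t0, 0($fp)       # store the 5th argument
    sw   $t1, 4($fp)       # store the 6th argument
    # ... arguments 7 through 10 go at 8($fp) through 20($fp) ...
    jal  ten_args
    lw   $fp, 0($sp)       # restore the old frame pointer
    addi $sp, $sp, 28      # restore the stack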


On the caller side of things, the old frame pointer is first saved to the stack.  It is then modified to point to wherever on the stack I want it to - in this case with an offset of 4, because that is where my 6 arguments will go.  I store these arguments with $fp as my base address.  I then call the function that takes in these arguments, load the old value of the frame pointer back, and restore the stack.
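And a matching callee-side sketch, under the same assumptions:

    ten_args:
        lw   $t0, 0($fp)     # the 5th argument
        lw   $t1, 4($fp)     # the 6th argument
        # ... the rest sit at 8($fp) through 20($fp); then the body ...
        jr   $ra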


As the callee, I can load the 6 extra arguments directly into temporary registers using the frame pointer as the base.  I can then do as I please with that information, and return to the caller when I'm done.

Really, the frame pointer is not all that mysterious.  It is just an extra register that you can use to point to a region on the stack other than its top.

Conclusion

We're done!  Well, done with the important material anyway.  There are other interesting things you can do as a side project.  MIPS syscalls in MARS support MIDI, so you could use them to play music if you wanted to.  But that is all supplemental, and if you have understood the core material you can learn the extra stuff with little effort.  Hopefully this tutorial has been helpful.

Saturday, January 14, 2017

Debugging in MARS

This is a topic that is kind of difficult to effectively teach on paper, but I'm going to go ahead and try my best.  The ability to debug your code in MARS is perhaps its most important feature, and it would be a shame if I were not to include a post on this on my blog.  Further, it is pretty inescapable - you will have to use it at some point whether you like it or not, otherwise you'll get stuck.  If you are/were anything like me in my freshman to sophomore years, you'd groan any time you'd have to reach for the debugger - fortunately, this debugger is pretty well designed and its functionality maps intuitively to how MIPS is executed.

Let me set the scene first:



I have this short program with a bug in it.  You might spot it right away, but the goal here is not to simulate a realistic scenario - instead it is to walk you through the process of fixing it.  Assume I have no idea what the problem is, and I'm completely baffled as to why I have an infinite loop here.  I've read over my code about ten times, and I can't find anything wrong.  Time to bring out the debugger.

When you build your program, you are automatically taken to the execution tab.  You will see something like this in the text segment of the screen:


This is your final program as the assembler built it.  All pseudo-instructions have been translated to real instructions, and we're looking at an interpretation of the machine code that was generated.  This is what MARS will run when we hit the play button.

The leftmost column is the breakpoint column.  A breakpoint is sort of an intervention into the running of your program that causes the machine to pause so that you can look at its state.  You can tell MARS to place a breakpoint at any line - just click the box to enable a breakpoint at that location.

The second column is the address column.  Most of the time you can ignore this column completely, but there may be times that you want to compare the contents of the $ra register to where you expected the program to jump to.  It just states where the binary instruction is located in memory when the program is running.

The third column shows you how your instructions are represented in binary.  I can't think of a situation where this would be all that useful unless you were writing the MIPS virtual machine yourself, which you're probably not planning on doing if you're using MARS.

The basic column shows you the instruction in assembly form, rather than binary form.  This is how you tell what part of your code you're looking at, in conjunction with the unnamed rightmost column, which tells you the line number in your assembly file that the instruction comes from and any comments you may have added.

You may notice that the labels factorial and exit are entirely absent.  This is somewhat unfortunate because you will usually want to set a breakpoint immediately following those labels.  The solution is to put a recognizable comment next to the first instruction underneath the label - lines that only contain comments won't show up at all.  I can tell that line 8 is where the factorial loop begins, because that's the instruction that exits the loop and I always put that instruction at the top.  You may prefer some other style whereby the first instruction in your loop does not look like this, but you'll recognize where a section starts with practice.

Since I have an infinite loop, I'm going to go ahead and put a breakpoint right at the top of that loop:


I can now hit the play button to run the program.  Lo and behold, my program pauses at the top of the loop:


I can tell where the program has stopped because that line is highlighted in yellow.

Now that the program has paused itself, I have several options.  I can run the program line by line on my command, I can run it really slowly, or I can keep pressing play and running into my breakpoint and just observe the machine's state once every time the loop cycles.  For the first option, I can use two of the buttons on the toolbar:


The third button from the left will move the machine's state forward one instruction, and the fourth will move the machine's state backward one instruction (it will return to the previous state).  I can keep hitting this button until something looks off.

I could also play with the slider next to the toolbar:


By default this is set to the max setting, which just tells MARS to run the program as fast as possible (as fast as your computer can run it).  If I move the slider to the left, I can choose a specific number of instructions per second in the range of 1 - 30:


There's a bit of a quirk that can show up sometimes with this feature.  If you ever move the slider back to the right after changing its setting, MARS may end up running your program in slow motion anyway.  This is hardware dependent and happens only to some people.  If you run into this problem, just save your file and restart MARS.

For the third technique, all I have to do is keep pressing play and MARS will hit my breakpoint again, having gone through the loop one more time.  In this specific case, I prefer to use this tactic.

No matter which technique you decide is most user-friendly, you will have to say hello to the registers panel on the right side of your screen:


There are three tabs here but Coproc 1 and Coproc 0 are for advanced use only, so we'll stick to the Registers tab.  You have the whole family of registers at your disposal here.  In the case of this program, I was using $t0, $t1, and $t2, so I'll want to look at those.  Right now they contain the initial values that I gave them before the start of the loop, because MARS hasn't actually simulated one loop cycle yet.  If I hit the play button, MARS will hit my breakpoint again and I'll see this:


The register that was last written to is highlighted in green.  Notice that the values stored in these registers still haven't changed, despite having gone through one loop cycle.  That's not the expected behavior - I should see that the counter has been incremented, but it hasn't.  Hmm.

Oh!  I forgot to increment the counter!  Doy.


I added my increment at line 11, and the program is fixed!  Yay!

There is one more important feature that I didn't need to use, though.  My little program doesn't go to memory.  Sometimes I might want to look at memory, since the state of my program is dependent on it.  There is a screen for this underneath the breakpoint setting screen:


This shows you the entirety of main memory.  Every single byte.  The left column tells you the address in memory where each row begins.  From left to right, the columns next to it show the contents of consecutive words at increasing addresses.  Each column's label tells you that word's offset from the row's base address.  You will have to be familiar with endianness if you are looking for resolution at the byte level.  For integers in MIPS, the displayed word reads exactly as you would write the integer on paper, but in hex.  You can toggle the hex off if you'd like with the convenient checkbox at the bottom.

Also useful is the drop-down menu that will allow you to travel between different areas in memory.  Because main memory is so big, this allows you to skip the process of scrolling down thousands of times.  Most of the time we're interested in the contents in the .data section, which is the default location anyway.

Like the registers tab, the word that was last written to will be highlighted while debugging:



In this case I've just written the word 15 to address 0x10010000.

And that's it.  If you have any questions or are running into trouble in ways that this tutorial hasn't addressed, feel free to leave a comment.  I will be checking my blog often, and will update this post with attribution if you run into an interesting problem.

Friday, January 13, 2017

MIPS: Millions of Instructions Per Second (Part 4)

In the last post I talked about functions and register conventions in MIPS.  I will continue the topic of functions in the next and final part, but we'll take a break from that and discuss bitwise operations and files.

 

Bitwise Operations

 

In MIPS and in most processor architectures, memory is byte-addressed.  Every byte is given its own number as a reference for accessing it.  But a byte consists of 8 bits.  This means that none of those bits have an explicit address, even though we may sometimes want to look at one bit individually.  To get around this problem, we can use bitwise operations.

MIPS provides the instructions and, or, xor, and nor, and their immediate counterparts.  You should be familiar with how these operations work on single-bit inputs.  On 32-bit registers, these instructions run 32 operations in parallel, one on each of the bits in the register.  So with two inputs, the far right bit of the first input is matched with the far right bit of the second input and put through the gate to get the result for the far right bit of the output.

For example, if we were to AND two 8-bit inputs:
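Using a made-up pair of inputs, bit by bit:

        10110100
    AND 11100110
      = 10100100

Each output bit is 1 only where both input bits are 1.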


All bitwise operations work on this simple principle.  Certain uses of them come up a lot - let's go over those.

 

Isolating Bits With AND

 

This one is derived directly from the picture above.  There may be times when you want to observe a specific bit in a register, or a collection of bits - the simplest way to do this is to use andi with ones in the bit positions that you're looking at.  We will be using hex in our code - if you're not able to do the conversions in your head, I recommend practice but you may also use a calculator such as this one.

For example, if I want the lower 16 bits of a register, I can do this:
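(With $t0 standing in for whichever register holds the value:)

    andi $t0, $t0, 0xFFFF    # ones in the lower 16 bit positions, zeros above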


Anding with zeros sets the higher 16 bits to zero, while anding with ones ensures the lower 16 bits are kept the same.  This is equivalent to the following Java code:
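    t0 = t0 & 0xFFFF;    // t0 here is an int standing in for the register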


Usually this sort of thing is done when dealing with bitfields, where each bit is a distinct boolean representing some condition.  Getting a nonzero result means the bit you're looking for is set, which means the condition is true.

Setting Bits With OR

On the other hand, you might want to write certain bits instead of reading them.  You can use the ori instruction to do this:
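(Again with an assumed register:)

    ori $t0, $t0, 0x2    # 0x2 has a one only in the second bit from the right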


This will set the second bit from the right to one, and keep all other bits the same.  Usually this tactic is used in regards to bitfields as well, but in code that is setting conditions rather than checking for them.  This is equivalent to the following Java code:
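    t0 = t0 | 0x2;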

Canceling Bits With AND

This is the opposite of the above - instead of setting a bit to one, you might want to set it to zero and keep all other bits the same.  You use andi for this too:
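(One caveat worth knowing: andi zero-extends its 16-bit immediate, so this also clears the upper 16 bits of the register.)

    andi $t0, $t0, 0xFFFD    # 0xFFFD = 1111111111111101 in binary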


This sets the second bit from the right to zero, and is equivalent to:
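    t0 = t0 & 0xFFFD;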

Flipping Bits With XOR

Sometimes you just want to set certain bits to the opposite of what they are currently, regardless of what value they have.  For this, you can use xori with ones in the positions you want to flip:
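    xori $t0, $t0, 0x1000    # 0x1000 has a one only in the 13th bit from the right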


This will flip the 13th bit from the right, while keeping all others the same, and is equivalent to:
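    t0 = t0 ^ 0x1000;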


And that's really it for bitwise operations.  They're pretty simple as long as you're able to do the binary to hex conversion.  Now let's move on to files.

 

Files

Java has abstractions for file operations that MIPS does not.  Instead of an object, each open file is given a number that we call its file descriptor.  All operations on that file will refer to this number so that we know we're dealing with it and not some other file.  If you've dealt with files in C, it is the same system just without the high-level code to simplify things.

All operations are done with syscalls.  Below is a table of the syscalls we'll be using:
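In brief, as MARS defines them:

    13  open file        $a0 = filename address, $a1 = flags, $a2 = mode; returns the descriptor in $v0
    14  read from file   $a0 = descriptor, $a1 = buffer address, $a2 = maximum bytes; returns the count in $v0
    15  write to file    $a0 = descriptor, $a1 = buffer address, $a2 = byte count; returns the count in $v0
    16  close file       $a0 = descriptor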


Obviously the file has to be opened before we do anything with it.  Syscall 13 takes three arguments - the filename, a flag argument, and a mode.  Counter-intuitively, the "mode" in the traditional sense (read-only, write-only, etc) is sent through the flag argument, and mode is ignored entirely.  This usage is defined by the MARS environment, and may be different for other systems.  If we just want to read an existing file, we pass 0 for the flag.  For write-only mode, pass 1.  For write-only mode with automatic creation, pass 9.

The code below opens a file and prints its file descriptor:
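A sketch (the filename is an assumption):

    .data
    filename: .asciiz "file.txt"

    .text
    main:
        li   $v0, 13         # syscall 13: open file
        la   $a0, filename
        li   $a1, 0          # flag 0: read-only
        li   $a2, 0          # mode, ignored by MARS
        syscall
        move $a0, $v0        # the file descriptor
        li   $v0, 1          # syscall 1: print integer
        syscall
        li   $v0, 10         # syscall 10: exit
        syscall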


MARS will assign file descriptors starting from 3, because 0, 1, and 2 are reserved for stdin, stdout, and stderr.  If your file is found, then 3 should be printed to the console.  If it isn't found, -1 will be printed instead.  The working directory for MARS differs depending on the operating system you're running it on - on Windows and Mac OS X, it is the directory where the MARS jar is located.  On Linux, it is your home directory.

You can now use this file descriptor on the other three syscalls.  We can't write to the file because we've given 0 (read-only) for the flag argument, but we can read.  Syscall 14 accepts the file descriptor as its first argument, then the address of an input buffer to store the read data, then the number of bytes to read.  We need to set aside some space for the buffer in the .data section of our code in order to read:
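Something along these lines, assuming the descriptor was saved in $s0 after the open:

    .data
    buffer: .space 16        # a 16-byte input buffer

    .text
        # ... after the open syscall, descriptor in $s0 ...
        li   $v0, 14         # syscall 14: read from file
        move $a0, $s0        # the file descriptor
        la   $a1, buffer     # where the data goes
        li   $a2, 16         # read up to 16 bytes
        syscall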


$v0 will contain the number of characters read from the file.  If there are more than 16 left, it will read all 16 in this case.  If there are fewer, it will read however many bytes are left.  If it is already at the end of the file, it will set $v0 to 0.  If there is an error, it will set $v0 to -1 or some other negative value.

There is no distinction between binary mode and text mode here.  If the data in the file is stored in ASCII, then it will load the ASCII data as is.  If it's not, it will load whatever else is in there in binary form and store it as is.  If we are dealing with an ASCII file and we print out the buffer as if it were a string...
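    li   $v0, 4          # syscall 4: print string
    la   $a0, buffer
    syscall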


...we will see the first 16 characters of its contents (also note that I forgot to use syscall 10 - don't do that).

Syscall 15 for writing is the same thing, just in the reverse direction.  It will read your buffer and write it to the file.  You can use flag 1 to write/replace, or you can use flag 9 to create a new file and write to it:
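Continuing in the same style (the buffer contents match the result described below):

    .data
    msg: .asciiz "this_buffer"

    .text
        # ... after opening with flag 9, descriptor in $s0 ...
        li   $v0, 15         # syscall 15: write to file
        move $a0, $s0
        la   $a1, msg
        li   $a2, 11         # "this_buffer" is 11 bytes
        syscall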


Since I use Linux, the file showed up in my home directory and contained the string "this_buffer."

Finally, close the file using syscall 16:
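    # open as before ...
    li   $v0, 13
    la   $a0, filename
    li   $a1, 0
    li   $a2, 0
    syscall
    # ... and immediately close
    move $a0, $v0        # the file descriptor
    li   $v0, 16         # syscall 16: close file
    syscall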


This code just opens and immediately closes the file, but closing the file always works the same way.  Just put the file descriptor in $a0 and run the syscall.

To Be Continued

There's one last part!  Next time I'll be talking about recursion and the frame pointer.

Saturday, January 7, 2017

Big-O Notation

Welcome to the start of a series of posts on algorithms.  This series will be mostly theoretical and does not require the knowledge of a specific programming language, operating system, or whatever else is necessary for learning other areas of computer science.  A class on algorithms usually has the distinction of being exceptionally mathematical, so you will need to know basic calculus (derivatives / integrals) and some discrete mathematics (summations).  I will be using Python to write code excerpts, but all you will really need to recognize is the logic in them.

Many students consider theory their weak point, but I will be writing most of the material in plain English.  I will point out the meaning of every symbol and mathematical operation the first time I use them.  The point of this series is to make the topic more approachable and avoid the confusion of formality.

Without further ado, algorithms.

 

What is an Algorithm?

 

An algorithm is a technique to solve a problem that can be represented mathematically.  It is not necessarily executed on a computer - in fact, we execute many algorithms in our heads every day.  Remember back in elementary school when you learned how to add two numbers together?  That's an algorithm.  So is long division, or whatever technique you learned to subtract.  Most algorithms that humans execute are basic and involve few inputs, but algorithms can be used to solve much more complex problems with millions to billions (to trillions (to quadrillions, etc)) of inputs.

When designing an algorithm, there are two primary goals in mind - correctness and efficiency.  Correctness is the most important property of an algorithm.  If it isn't correct, then what good is it?  A wrong answer computed instantly is no better than a right answer computed in infinite time.  Of course, a right answer computed instantly is infinitely better than a right answer computed in infinite time.  Efficiency is the second most important goal, and it is where most of the effort in algorithm design goes.  An algorithm can be efficient with respect to multiple quantities, but most often with respect to time and space.

When talking about the efficiency of an algorithm, we say that the quantity we are analyzing changes with respect to the size of the input.  A sorting algorithm will take longer to complete on a large array than on a small array because of the increased number of comparisons it has to perform.  It also requires more space to store a larger array.  It is usually the case that the running time of an algorithm and the amount of space it uses are strictly increasing with bigger input sizes.  Because of the increasing demand for computing resources, computer scientists will go out of their way to get every drop of efficiency out of their algorithms, granted that efficiency remains relevant at scale.

 

Asymptotic Growth

 

It is entirely possible to analyze the precise number of operations an algorithm takes to complete, but often it is unnecessary to relay every detail to other computer scientists.  Instead, it is common practice to describe an algorithm using a small set of basic functions, each of which grows faster than the last.  There are multiple ways of doing this, but the most frequently used is Big O Notation.

Take some function, f(n), and some other function, g(n).  We say that f(n) = O(g(n)) if, for some constant c and some threshold n_0:

$$f(n) \le c * g(n) \: \forall \: n > n_0$$

What does this mean?  It means that f(n) can be classified as O(g(n)) if c * g(n) is bigger than or equal to f(n) past some n.  In other words, f(n) grows just as fast or more slowly than g(n).  To clarify, the upside down A is just a fancy way of saying "for all."  You might be thinking: Can't I just make c really big, causing g(n) to become bigger than f(n)?  Well, no.  If f(n) is growing faster than g(n), then f(n) will eventually overtake c * g(n) no matter how big c is - no choice of n_0 can rescue the inequality.

Take, for instance, the function f(n) = n + n^2.  Can we set g(n) = n, such that f(n) = O(g(n))?  If we set c to 1, then f(n) is only less than or equal to c * g(n) when n = 0.  Beyond that, f(n) is bigger.  Okay, how about setting c to 10?  f(n) catches up at n = 9 and is bigger from then on.  How about 100?  1000?  1,000,000?  There will always be some n past which f(n) overcomes c * g(n).  So we can't say that f(n) is O(g(n)).

Okay, well what about g(n) = 2n?  Nope, that won't work either.  That's the same thing as setting c to 2.  We have to multiply the original g(n) by something that grows with the size of the input, otherwise all we're doing is changing c.  We can multiply by any function of n that will enable g(n) to grow faster than f(n).  Since there's an n^2 in there, how about setting g(n) = n^2?

If we set c to 1, it is clear that f(n) will remain above g(n).  But if we set it to 2, then that's equivalent to comparing f(n) against n^2 + n^2.  One of those n^2 terms matches the n^2 in f(n), and the other overtakes the n term for all n ≥ 1, so f(n) = O(g(n)).  Finally.
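Written out, the proof is one line:

$$n + n^2 \le n^2 + n^2 = 2n^2 \: \forall \: n \ge 1$$

So c = 2 and n_0 = 1 witness the definition.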

So what's happening here?  In one simple figure, this:


Setting g(n) = n^2 and c = 1 produces a function that will always be less than f(n) by exactly n.  By setting c = 2, we produce a function that is always above f(n) past some point.  We can say that f(n) = O(n^2) - we don't have to put the 2 in there, because c is just used in the proof.  In fact, putting the 2 in there doesn't really say anything about asymptotic growth, because...


...f(n) and g(n) are practically the same thing anyway.  This is what Big O Notation is actually saying, for the most part.  We could have also chosen g(n) = n^3, or g(n) = n^4 - both are upper bounds on f(n).  It just so happens that we found a function that perfectly describes f(n), which is also considered an upper bound on f(n).  If a function perfectly describes the asymptotic growth of another function, then it is necessarily true that one is Big O of the other.

It is fairly easy to prove that f(n) is O(g(n)) when it is actually the case.  All you have to do is choose some c and some n_0, then show that the two functions cross (possibly at n = 0), such that c * g(n) becomes greater than f(n) with increasing n.  In fact, we could have avoided all this trouble and just picked out the biggest term in f(n), with any multiplied constant removed (in this case there wasn't one).  That's usually how computer scientists choose g(n), because it becomes second nature to point out the fastest growing term.

But what if we have to prove that f(n) is not O(g(n))?  When I set g(n) = n, I never actually proved that it didn't work, I just asserted that no constant would be acceptable.  In a formal setting (like a midterm), we have to show that there really is no c that will work.  There's one way of doing this that will work pretty much all the time and is convincing, but it involves L'Hôpital's rule.  If the following is true:

$$\lim_{n \rightarrow a} {f(n) \over g(n)} = {0 \over 0} \: \vee \: \lim_{n \rightarrow a} {f(n) \over g(n)} = {\pm \infty \over \pm \infty}$$

Then:

$$\lim_{n \rightarrow a} {f(n) \over g(n)} = \lim_{n \rightarrow a} {f'(n) \over g'(n)}$$

Which is to say, if the limit as n approaches a for both functions is zero or that same limit goes to positive or negative infinity for both functions, then you can divide one function by the other and take the derivative of both to get the same limit.  For analysis of algorithms, we're always interested in setting a to infinity.

What we can do is divide one function by the other and keep taking the derivative until all reference to n disappears in either the numerator or the denominator.  If this happens for f(n) first, then f(n) = O(g(n)).  If this happens for g(n) first, then f(n) =/= O(g(n)).  If both disappear at the same time, then f(n) = O(g(n)) as well.  If we try this technique with f(n) = n + n2 and g(n) = n, then:

$$\lim_{n \rightarrow \infty} {f(n) \over g(n)} = {n + n^2 \over n} = {1 + 2n \over 1}$$

Because f(n) dominates g(n), f(n) =/= O(g(n)).  But setting g(n) = n^2 will work:

$$\lim_{n \rightarrow \infty} {f(n) \over g(n)} = {n + n^2 \over n^2} = {1 + 2n \over 2n} = {2 \over 2}$$

Setting g(n) = n^3 will also work:

$$\lim_{n \rightarrow \infty} {f(n) \over g(n)} = {n + n^2 \over n^3} = {1 + 2n \over 3n^2} = {2 \over 6n}$$

You can use this method to prove either case, as long as the numerator and the denominator both still go to infinity (or both to zero) at every step - that is what makes L'Hôpital's rule applicable.  If they don't, then you'll have to use algebra to get around the problem.

 

Common Functions

 

You can use any g(n) that you want as an upper bound, but most of the time the appropriate g(n) will be one from a list of common functions with increasing growth rates:
  1. O(1), also called constant.
  2. O(log(n)), also called logarithmic.
  3. O(n), also called linear.
  4. O(n log(n)), also called linearithmic or loglinear.
  5. O(n^c), also called polynomial.  If c = 2, quadratic.  If c = 3, cubic.
  6. O(c^n), also called exponential.  The most common base is 2, but c can be anything.
  7. O(n!), also called factorial.
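One identity worth keeping in mind for the logarithmic entry - the change of base formula shows that any two bases differ only by a constant factor:

$$\log_a(n) = {\log_b(n) \over \log_b(a)} = {1 \over \log_b(a)} \cdot \log_b(n)$$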
It is standard to leave out the base when dealing with logarithmic terms, because changing the base only multiplies the function by a constant factor - it does not change the asymptotic growth rate.  For polynomial and exponential functions, on the other hand, the choice of c absolutely matters.  An exponential function with a base of 3 grows much faster than an exponential function with a base of 2.  We can see how each function dominates the other with a figure:


As we extend the y-axis to cover a bigger range, the smaller functions begin to disappear:


Keep going, and the only one that remains visible is the factorial function:


This is the reason that Big O notation is used and focused upon in the analysis of algorithms.  It is much more effective to increase efficiency by targeting the growth rate of the running time or space of an algorithm than to try to cut it down by some constant factor.  With large input, the gains of doing so completely disappear.  A decrease in running time from O(n^3) to O(n^2.93) is huge - a decrease in running time from 3n^3 to 2n^3 is minuscule.

 

Other Forms

 

So far we have focused on finding the upper bound of a function, but there are other statements we can make.  How about the lower bound?  We use Big Omega Notation to refer to the lower bound.  It is the same thing as Big O, just with a different symbol:

$$n^2 = \Omega(n)$$

The function g(n) = n is a lower bound on f(n) = n^2.  The definition of the lower bound is almost identical to that of the upper bound, just with the inequality flipped:

$$f(n) \ge c * g(n) \: \forall \: n > n_0$$

Instead of f(n) being underneath g(n) past some point, f(n) is above g(n) past some point.  Like upper bounds, f(n) and g(n) can have identical growth rates.  Proving a lower bound is done the same way as proving an upper bound.

If g(n) acts as both a lower bound and an upper bound on f(n) - that is, g(n) perfectly describes the asymptotic growth of f(n) - we use Big Theta Notation.

$$n^2 = \Theta(n^2)$$

All that needs to be done to prove Big Theta is to prove both Big O and Big Omega.  Big Theta notation makes the most valuable statement about the growth rate, because it is the most precise.  A very small function can be the lower bound of a very big function, and a very big function can be the upper bound of a very small function - but a very big function can't be both an upper bound and lower bound on a very small function and vice versa.

Lastly we have Little O and Little Omega.  The difference between Little O and Big O is that g(n) can't have an equal asymptotic behavior to f(n) - it must be larger.  Same thing with Little Omega - it must be smaller.  The definition of Little O is:

$$f(n) \le c * g(n) \: \forall \: n > n_0, \: \forall \: c > 0$$

Here we don't get to choose a constant to "boost" g(n); it must remain greater than f(n) past some point on its own, no matter what constant is chosen.  Likewise, for Little Omega:

$$f(n) \ge c * g(n) \: \forall \: n > n_0, \: \forall \: c > 0$$

Both of these are making stronger statements, because the set of functions that can be chosen for g(n) is smaller.  We can say that:

$$n^2 = o(n^3)$$

And:

$$n^3 = \omega(n^2)$$

Because n^3 grows faster than n^2, and n^2 grows slower than n^3.

 

To Be Continued

 

That's basically it for the formal definition of asymptotic notation.  In the next part, I will apply asymptotic analysis to sorting algorithms.

Friday, January 6, 2017

MIPS: Millions of Instructions Per Second (Part 3)

In the last post, I covered branching logic and memory operations.  In this post I will address the use of functions and proper register convention.

 

Functions

 

Java gives the programmer the luxury of methods, which mainly avoid the problem of copying the same code all over your program, among other things.  Procedural languages, such as C, have no classes and instead use functions (also known as procedures).  The difference between a method and a function usually comes down to a simple question: Is it in a class?  If so, it's a method.  Otherwise, it's a function.

C compiles down to assembly code, and MIPS is one of the platforms that can be targeted.  So MIPS is able to provide the same functionality, just in a different form.  In MIPS, a function is an area in your code beginning with a label and ending with a jump back to the location where the function was called.  A function may accept arguments, which act as inputs, and may return data as output.

Let's write a function that calculates the factorial of a number:


We're calculating the factorial in a slightly different way here, more equivalent to the following Java code:
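Roughly the following shape (a sketch - details like the loop direction are assumptions):

    static int factorial(int n) {
        int result = 1;
        for (int i = 2; i <= n; i++) {   // multiply the counter up to n
            result *= i;
        }
        return result;
    }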


On lines 13 and 14 I have commented the inputs and outputs of the function and which registers they use.  MIPS has four a registers, which are meant to store the first four arguments to a function.  Up to this point we have used them as inputs to syscalls, which are, in effect, functions.  MIPS also has two v registers, which are meant to store the return values of a function.  That's right - there are two.  You can return two values at the same time, one in $v0 and one in $v1.  Of course, there are ways of returning more things and accepting more inputs - I'll address that in a minute.

A minor point, but you will notice that line 22 uses the multu instruction instead of a regular mult.  This means multiply unsigned, and it treats the two input registers as unsigned integers rather than signed two's complement integers.  Most instructions have an unsigned counterpart, distinguished by a u at the end of their abbreviated name.


On lines 27 and 28, I copy the factorial into $v0 to return and use the jr instruction, which means jump register.  This instruction jumps to a variable address provided by any register, but usually $ra.  There's a significance to this register.


From the entry point of the program at the top of the file, we call the function by using the jal instruction, which stands for jump and link.  This single instruction does two things.  First, the address of the next instruction (line 6 here) is copied into $ra - this is why $ra is significant.  Then a jump is performed to the specified label.  When the function returns, the program will continue from that saved address.

 

Register Convention

 

So that's pretty much all functions are.  The way they work is very simple, but their very existence opens up a certain can of worms.  In Java when we call a method we expect all of our local variables to be left alone - we expect the method not to screw with them.  The same thing needs to happen in MIPS, otherwise our program would have bugs everywhere.  There's a problem though - what if I'm using $t0 for something but the function I want to call also uses $t0?  Unless I save it somewhere, the variable I put in $t0 is going to be overwritten.  It will effectively disappear.

To get around this problem, programming teams must have an agreed-upon convention that will avoid data from being stepped on.  We call this register convention.  In using it, all programmers agree to use registers in a specific manner that is predictable, sort of like driving on public roads.  By agreeing not to drive on the wrong side of the road, we avoid head-on collisions.  Likewise, by agreeing to use registers in a certain way, we avoid bugs.

There are four types of registers:

  1. Argument (a) registers [$a0 - $a3]
  2. Temporary (t) registers [$t0 - $t9]
  3. Return (v) registers [$v0 - $v1]
  4. Saved (s) registers [$s0 - $s7]

We have used all but #4 on this list so far.

Argument registers are, as discussed, used to provide arguments to functions and syscalls.  When they are handed to a function, that function is given ownership of them.  That means that there is no guarantee that a function will return with the same value stored in $a0 as before it was called - it can overwrite all of them at will without saving them anywhere.

Temporary registers are not used to provide arguments in any circumstances, but like argument registers there is no guarantee that a function will not overwrite their contents.  They are the most commonly used, hence the fact that there are 10 of them.

Return registers are used to store the output of a function, and also to tell the processor which syscall to use.  Just like the previous two types, they can be overwritten by functions.

Saved registers are unlike the other three.  They are used in the same way as temporary registers in that they store intermediate results of calculations, but they can never be overwritten by functions.  If a function wants to use a saved register it must save its original value somewhere and load it back before returning to the caller.  We say that it is the callee's responsibility to ensure continuity, that saved registers remain unchanged.  For the other three types, it is the caller's responsibility to maintain continuity.

You might be wondering where to save the registers.  It is most common to save them on the stack, an area in main memory with a LIFO (last in, first out) growth policy.  If you've taken a course on data structures you should know what a stack is and how it works.  In the case of MIPS, the stack grows downward from an address set at compile time, and shrinks upward - that is, the "top" of the stack begins at a higher address and moves to a lower address as the stack gets bigger.

The top of the stack is pointed to by $sp (the stack pointer).  This register must only be used for this purpose, otherwise your program will explode.  When a function wants to allocate space on the stack, it subtracts the number of bytes it needs from $sp and then saves its data to that region.  When that function returns, it adds the same number of bytes back to $sp, restoring the stack to where it began.
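The pattern itself is short.  A sketch, using $s0 and a single word of space:

    addi $sp, $sp, -4    # grow the stack by one word
    sw   $s0, 0($sp)     # save $s0 into that space
    # ... the body of the function, free to use $s0 ...
    lw   $s0, 0($sp)     # load the original value back
    addi $sp, $sp, 4     # shrink the stack back to where it began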

Enough talk, let's put all this into practice by modifying the factorial program.


The entry point of the program has two variables that it wants saved between functions calls, one in $s0 and one in $t0.  Because $s0 is a saved register, it doesn't have to do anything to ensure that it remains the same after factorial returns.  However, factorial might use $t0, so it has to save its contents to the stack.  On line 7, it asks for 4 bytes by subtracting 4 from $sp, and then saves $t0 with an offset of 0.  It calls the function on line 11, and then loads $t0 back from the stack on line 14.  It subsequently resets the stack pointer by adding 4, the same amount that it previously subtracted.


Within the factorial function we've switched from using $t0 to using $s0.  Even though we now know that it isn't using $t0 anymore, it is still convention to save $t0 in the entry point anyway.  Think of it like this - the entry point and the factorial function are two separate parts of your program that might be modified by different people.  If one person is working on factorial and you are working on the entry point, what if your teammate decides to use $t0 later on after all?  You would have a bug in your program.  By following convention by the book, you avoid these situations.

Before factorial uses $s0, it asks for 4 bytes on the stack and saves it there.  It is then free to use $s0 as it pleases, as long as it loads the original value back.  It does so at line 50, so it is practicing good register convention.  It does not have to do this with $t1 or $a0, despite using both.

 

Parent Functions

 

Sometimes a function will need to call another function to complete its task.  I use the loose terminology parent function to refer to functions that call other functions.  Functions that do not call other functions are sometimes called leaf functions.  The factorial function we just wrote is an example of a leaf function.

Parent functions need to take one additional step before calling other functions.  Because the return address is always stored in $ra, a call to jal will overwrite this register.  A parent function must save $ra to the stack and load it back right before returning if it wants to jump back to the right place.  To show what I mean, take the function two_factorials which adds the factorials of two input numbers together:


In addition to saving $s0 before using it, two_factorials saves the return register so that it can reload it later.  This can be seen on lines 64 and 78.  Note that because we need the space for 2 registers, we ask for 8 bytes on the stack.

 

To Be Continued

 

In the next part I will cover bitwise operations and files.