Saturday, January 28, 2017

Integer Number Systems

The most common way to write a number is in the decimal system.  In this system, there are ten different discrete digits used to represent value.  It is therefore a base-10 counting system, because the number of symbols used is ten.  You could write that same value, however, using a different number of symbols as your alphabet - e.g., you could write it in base-2, base-3, and so on.  This post will discuss ways in which to represent numbers, as well as how numbers are stored on a computer.

 

Counting Systems

 

Integer numbers are mainly used to describe the size of discrete sets.  If you have two apples, the size of the set of the apples that you have is two.  If you wanted to communicate the set's size, you could just point at the collection of apples.  Once you get past a certain size, however, it becomes difficult for humans to discern one count from another.  Imagine trying to determine if there were 100 items in a set instead of 101 just by looking at the set - you would have to systematically look at each item individually and mark the ones you've already counted, otherwise you would lose track pretty quickly.

Your set, just by existing, exhibits a base-1 counting system.  If your set consists of apples, you could think of an apple as a symbol representing itself.  Line up the apples next to each other, and you have a base-1 number.  If you wanted to write down the size of your set in base-1, though, it would be time consuming to draw an apple for every apple that you have.  You could instead represent each apple with a 0, and write a line of zeros.  But this, too, is more time consuming than it needs to be.

You could dramatically reduce the number of symbols you have to write by moving up to a base-2 counting system.  Instead of a zero for every apple, you represent the count using a string of ones and zeros.  To reduce it even further, move up to base-3.  Further, base-4.  Etc.  We use base-10 the most, supposedly because we have ten fingers.

There is a diminishing return to increasing the base in regards to the amount of space required to represent a value.  To represent up to four billion values, you only need 32 digits in a base-2 number system.  You can represent the same number of values in base-2 as base-1 with millions of times less space.  You can represent the same number of values in base-3 with 14 digits, only a little less than half the space as base-2.  For this reason (and several others), classical computers - like the one you are using to read this post - use a base-2 counting system, or binary.  There is little need to increase the base, as the gains in doing so are vanishingly small.

Computer scientists often pick from a set of the same four counting systems to represent numbers: decimal, binary, hexadecimal, and octal.  Decimal is used because it is what most people read.  Binary, of course, because the computer uses it.  Hexadecimal (base-16) to write fewer digits, and octal (base-8) for the same reason.  You'll notice that hexadecimal and octal have bases that are powers of two - there's a reason for this, which I will explain.

 

Conversions

 

It is a common task to have to convert between one number system and another.  The method (algorithm!) that applies the most is the multiply and divide method.  Here, you first convert the number to decimal by multiplying by powers of the base, then continuously dividing by the new base and taking the remainder each time to form the new representation.

The process of multiplication into decimal can be seen below:


From right to left, multiply each digit by an increasing power of two, and then add the result.  Normally we write the base as a subscript to the number when not using decimal to avoid confusion when it is not explicitly stated what number system is being used.

Now that we have the decimal value, we can fully convert using the division algorithm:


Keep dividing by the base (in this case, two again) and writing down the remainder.  Once the quotient is zero, you're done.  The final value is read from bottom to top.

This method is more time-consuming than it needs to be when converting between bases of powers of two.  If I am converting from binary to hexadecimal or octal, I can group digits together in sets that form new digits in the new system:


When converting from binary to hexadecimal, we group four binary digits together.  We then translate it via a "lookup table" (you could probably memorize this table with effort).  The same thing applies to octal, except we group three digits instead of four:


This is even easier to accomplish because there are fewer entries in the table.


Computers and Binary


So far we've dealt only with positive numbers.  But negative numbers need to be represented by a computer, too.  In only representing positive numbers, we use an unsigned binary system.  To represent negative numbers in addition to positive numbers, we have several choices.  We could treat one of the bits as the sign of the number - this is referred to as signed magnitude.  All but one of the bits form an unsigned binary number, while the leftmost bit encodes the sign (zero for positive, one for negative).


When taking a negative decimal number and converting it to a signed magnitude binary number, use the same method as with unsigned binary to start, then add a one to the left end of the number.  Since computers work with a fixed amount of space (in the case of MIPS, 32 bits for every register), we must sacrifice one bit to encode the sign (so that 31 bits remain for the magnitude).  If the number is not negative, then the leftmost bit is set to zero while the remaining bits represent the unsigned magnitude of the number.

There is one flaw with signed magnitude - there are two representations of zero.  There is a positive zero and a negative zero (+0, -0).  This is effectively meaningless, and there is really no reason we would want to have both a positive and negative zero.  As a response, computer scientists came up with two's complement representation.

But before we talk about that, we need to discuss one's complement, which is directly related.  Like signed magnitude, one's complement has two zeros.  Unlike signed magnitude, we don't simply treat the left bit as a sign bit.  Positive numbers are represented in the same way as in signed magnitude - the leftmost bit is zero, while the remaining bits encode the number in unsigned binary.  But for negative numbers, we take the positive equivalent and flip all the bits:


Here I have a fixed-size 8-bit register in which I represent my number.  To go from decimal to one's complement, we again compute the unsigned representation like before.  Since I'm using a fixed-size representation, I "append" additional zeros to the left of the number.  Because the number is negative, I then flip all the bits.  So 1100 gets turned into 00001100, which gets converted to 11110011.

While we do this extra work with the lower bits to represent negative numbers, it is still trivial to tell whether a one's complement number is negative or not: look at the leftmost bit.  The leftmost bit tells you the same information in one's complement as it does in signed magnitude, even though it does not act alone in encoding negative numbers.

As mentioned, there are two representations of zero: with 8 bits, 00000000 and 11111111.  Two's complement finally addresses this by mapping negative numbers slightly differently - to get the negative representation of a number, add 1 to the one's complement representation.


Here we take the unsigned representation of 40 (101000), add zeros to get a fixed-size number (00101000), flip all the bits (11010111), then add 1 (11011000).  Like one's complement and signed magnitude, the leftmost bit can be relied upon to determine whether the number is negative or not.

Nearly every computer uses two's complement natively to represent signed integers.  This is certainly the case for MIPS processors.  Unsigned binary can be used to represent unsigned integers, just as long as special instructions are made available that treat their input as unsigned.

No comments:

Post a Comment