Maths Ball

My son is learning his times tables so to help him out I decided to write a simple game in python. I call it Maths Ball. It’s a 2 player game where the idea is to score 3 goals to win.

The game starts with a ball heading towards player 1′s goal. A maths question appears at the top of the screen along with 4 possible answers. Player 1 needs to answer the question before the ball reaches their goal. If they choose an incorrect answer, they concede a goal. If they choose the correct answer, the ball starts heading towards player 2′s goal. The quicker they answer the question the less time the other player will have to respond. It’s very effective in helping kids learn their times tables because they get to answer a lot of questions in a small amount of time.

I’ve put the code for Maths Ball up on github. If you are a python programmer and want to help make it better I’d love you to help. I have the following ideas that would make the game better:

  • Add an option to play against the computer
  • Allow players to enter their names
  • Allow players to choose a level which controls the speed of the ball
  • Allow players to choose multiplication, division, subtraction or addition
  • Store more stats for each player like questions answered correctly and incorrectly
  • Improve the graphics for the splash screen (function splashScreen)
  • Improve the graphics for the winning screen (function winningScreen)
  • Improve the graphics for the final screen displaying statistics (function showStats)
  • Improve the graphics for the goal scored screen (function showGoal)

Hash Tables

What is a hash table?

In simple terms, a hash table is a data structure that can map keys to values. The blog Five Myths about Hash Tables has a simple explanation of how searching works with this data structure.

I’ve taken some code from The C Programming Language and created a program that uses a hash table to store key/value pairs and then to lookup a specific key and retrieve it’s value.

An important consideration for any implementation is the hash function to use. For this program I’ve used the one from the book which has this disclaimer; it’s not the best possible hash function but it is short and effective.

/* hash: form hash value for string s */
unsigned hash(char *s)
{
    unsigned hashval;
    for (hashval = 0; *s != '\0'; s++)
        hashval = *s + 31 * hashval;
    return hashval % HASHSIZE;
}

You can see that the modulus operator ensures that the function returns an integer between 0 and HASHSIZE-1. As you might imagine, certain keys will map to the same hash value. To solve this problem, we store a linked list of keys and values at each element in the array. This means that when installing a key/value pair or when searching for a particular key, we have to perform the hash, and then walk through the linked list until we either find what we are looking for or reach the end of the linked list. Here’s the code for the lookup function:

struct word *lookup(char *s) {
    struct word *nw;
    for (nw = wordtab[hash(s)]; nw != NULL; nw = nw->next) {
         if (strcmp(s, nw->name) == 0) {
             return nw; /* found */
         }
     }
     return NULL; /* not found */
}

The function to install is very similar. We first call the lookup function to see if the key is already in the hash table. If it is we update the value, if not we create a new entry. That pretty much sums up the basics of using a hash table. Next time you need to store some key/value mappings, consider whether a hash table would be appropriate.

 

 

Variable Byte Integer Encoding

Before proceeding any further, read Compression, Speed, and Search Engines. This explains what variable byte encoding is and why it’s useful.

The Code

Here’s one implementation of variable byte encoding by Hugh Williams. When I first looked at this a while ago I didn’t really understand what was going on. Now I’ve been writing some code again I’ve finally figured it out.

This version of variable byte encoding assumes that an integer is 4 bytes long. Since we are using 1 bit from each of the 4 bytes to represent whether this is the last byte for the integer, we lose 4 bits for actually storing the integer. To work out the largest number this code can store turns out to be pretty easy. Since we are using 28 bits (4 bytes * 7 bits) to store the integer, the largest number is (2^28)-1 = 268,435,455.

The process of writing out the encoded integer is broken up into 2 stages.

Stage 1 – Get the binary representation for each of the 4 bytes

for(x=0;x<4;x++)
{
    tmp = (number%128) << 1;
    bytearray[x] = tmp;
    number /= 128;
}

The first stage effectively gets the binary representation for each of the 4 bytes, one by one. As we are using 7 bits for storing the integer, the maximum number we can store in a single byte is 127. Hence we take the integer to be encoded and get the modulus 127. We also need to left shift the bits by 1 place to make room for the indicator bit. Note that in this first stage we don’t bother setting this indicator. Once we’ve stored this byte, we use integer division and divide by 128. This is because we want to store the binary representation of the number which when concatenated with the other bytes will effectively multiply the second byte by 128, the third byte by 16384 (128*128), and the fourth byte by 2097152(128*128*128). During this process if the integer division returns 0, then the remaining bytes are all set to 0.

Stage 2 – Write the integer to a file

for(x=3;x>0;x--)
{
    if (bytearray[x] != 0 || started == 1)
    {
        started = 1;
        bytearray[x] |= 0x1;
        fwrite(&bytearray[x], sizeof(char), 1, fp);
        charswritten++;
    }
}
bytearray[0] |= 0x0;
fwrite(&bytearray[0], sizeof(char), 1, fp);
charswritten++;

The second stage actually writes out the 4 bytes. Well in fact, and this is the point of the compression, if the number is small enough it won’t require 4 bytes. The first byte is the only one which is always required. As the number gets larger, more bytes are needed. With this scheme one byte can store numbers up to 127, 2 bytes up to 16,383, 3 bytes up to 2,097,151, and 4 bytes up to 268,435,455.

We first loop over the last 3 (most significant) bytes. The loop checks to see if each byte is 0 and we haven’t started writing the actual number yet. This ensures we don’t write any bytes that aren’t required but we also write a 0 if it is part of a larger number. Before we actually write the number we perform a bitwise or with 0×1 to set the indicator bit to 1, this is required so that the read function knows this is not the last byte of the number. Once this loop has completed we write out the mandatory first byte. This time we perform a bitwise or with the value of 0×0 to set the indicator bit to 0. This will be the last byte. So that’s it for the write part of the code.

Reading a Variable Byte Encoded Integer

Now that we understand how to write an encoded integer, the process of reading it is fairly straight forward.

int vbyteread(FILE *fp)
{
    char tmp = 0x1;
    int val = 0;

    while((tmp & 0x1) == 0x1)
    {
        if (fread(&tmp, sizeof(char), 1, fp) == 0)
        {
            if (feof(fp))
                return(-1);
            else
                return(0);
        }
        val = (val << 7) + ((tmp >> 1) & 127);
    }
    return(val);
}

 While we have another byte to read (the indicator bit is 1), which we can tell by peforming a bitwise and with 0×1 and checking it equals 0×1, we multiply the current value by 128 (left shift 7 bits), read the byte and right shift it by 1 to get rid of the indicator bit and then bitwise and it with 127 to ensure the eighth bit is set to 0, and then add this to value. This all gets accomplished with the single line:

val = (val << 7) + ((tmp >> 1) & 127);

Maximum Number Using This Code

Here’s a program using the variable byte integer encoding code that shows that the maximum number we can encode is 268,435,455. When we use a number bigger than this the values for the first bytes are still calculated, but any parts bigger are ignored. Hence 268,435,456 will be read back as 0.

Speed Test

When I first tested the speed of using the variable byte integer encoding it seemed to be slower than simply reading uncompressed integers. However if you don’t use enough data, the numbers can be unreliable. Once I tried using a much larger sample of data, the variable byte code proved to be significantly faster.

Here’s a program to write 10 files written with compression, and 10 without. You can compare the file sizes generated by this program and see that the compressed files are around 50% smaller than the uncompressed files. Out of interest, 1,638,400,000 integers are written over the 10 files. To me that’s a lot of integers!

Here’s a program that reads the 20 files and compares the times for reading and uncompressing the compressed files versus reading the uncompressed files. The results from my test were:

Uncompressed Files:     13 minutes 35 seconds
Compressed Files:       09 minutes 13 seconds

This shows that it’s much quicker to read the smaller encoded file and perform the decoding, than to simply read a larger uncompressed file.

Structures and Binary Trees

I know I go on about how good this book is, but if you are interested in programming and haven’t read The C Programming Language, get yourself a copy and go through it in detail. Take your time to really understand all the lessons it presents. Below I’ll discuss a program from this book that shows how to count the number of occurrences of each word in some input.

Counting Words

To count the number for words in some input, we need a convenient way of storing each word as we find it, so we can quickly access it again to increment the count the next time it is presented. One way is to use a data structure known as a binary tree. If you’ve not heard of a binary tree, it’s kind of like a family tree. We call each element in the tree a node. It has a root which is the starting point. With a binary tree, each node can only have 0, 1 or 2 children. For our program each node in the tree will contain:

  • a pointer to the word
  • a count indicating how many times we’ve seen the word
  • a pointer to the left child node
  • a pointer to the right child node

To allow us to access each word quickly, we maintain the tree so that the left subtree of a node contains only words that are less than the word stored by the current node, and the right subtree contains only words that are greater than the word in the current node. Thus to search for a particular word, we start at the root and check if this node contains the word we are looking for. If so great, we are done. If the word we are looking for is less than the word, we search the left subtree, if it’s greater, we search the right subtree. If we get to a node with no more children, then the word is not in our tree and we can add it as a child of the last node we searched.

We’ll use a C structure to represent each node in the tree.

struct wordnode {
    char *word;             /* points to the word */
    int count;              /* occurrences */
    struct wordnode *left;  /*left child */
    struct wordnode *right; /*right child */
};

With our structures in place, the main function of the code will be very short. Here’s an outline of the program.

While get-the-next-word
    add-word-to-tree
print out the tree

We’ll use a primitive function to read the standard input and return the next word. The beauty about using a function is we can always improve the code later on without affecting the rest of theprogram.

We’ve already discussed how to add a word to the tree. The next word is presented to the tree and tested against the root node. It  matches or is past to either the left or right subtree until a match is found or the bottom of the tree is reached where a new node is added. Note that to add a new node, we first need to allocate memory for that node using malloc.

It’s important to note that if our input comes already sorted, then this method will essentially be a linear search. Each new word will be compared against every other word already stored in the tree. If the input is random then only a few comparisons will be required for each new word.

Printing the Tree

The structure of a tree makes printing it out very easy. We write a function that takes a node and checks if it is NULL. If so that is the end of the function. If not we recursively call the function for the left subtree, print out the current word and count, and then recursively call the function for the right subtree.

void printwords(struct wordnode *wn) {
    if (wn != NULL) {
        printwords(wn->left);
        printf("%4d %s\n", wn->count, wn->word);
        printwords(wn->right);
    }
}

As an experiment, test out placing the printf statement before and after the recursive call to the printwords function and see what happens.

Grab the word counting program now and have a play!

 

Pointers, Function Parameters, and Arrays

If you are going to become a good C programmer, you need to understand how pointers, arrays and functions all work together. So again to improve my understanding I reached for my trusted book The C Programming Language.

What is a Pointer?

A pointer is itself a variable but instead of holding a value such as an integer, it holds the address of another variable. The * operator lets you access the variable the pointer points to. You also use a * to declare a pointer. Since a pointer points to a variable and different types of variables have different sizes, you need to declare a pointer to point at a specific type of variable. The & operator gives the address of a normal variable. This means that if we declare an integer i, and a pointer to an integer p, we can set p to point to i with the following code:

int i = 5;
int *p;
p = &i; /* Make p point to i */

Here’s a really short program using pointers that you can play around with and get used to the syntax.

Function Parameters

C passes arguments to functions by value, which means that a copy of the variable is passed rather than the variable itself. To see what I mean, take a look at this program to calculate 5 factorial. The variable x is passed to the function and behind the scenes a new copy is made. This copy is also referred to as x but it is a different x. The copy of x is decremented in the function but once we return to the main program, the variable x still retains it’s original value. Hence the printf statement produces “5 factorial = 120″.

If you actually want a function to modify a variable, you need to pass a pointer to that variable. That way the function makes a copy of the pointer, but this copy still points to the variable you want to modify. This swap routine shows how it can be done.

Arrays and Pointers

When you declare an array, for example int arr[10], a contiguous block of memory is set aside that can hold all 10 integers. You can then reference these integers usings arr[0] through to arr[9]. Now if you declare a pointer to an integer (int *p) then you can make this pointer point to any of the 10 integers declared by the array. p = &arr[0] makes p point to the first element in the array. Now that p points to the first element, you can change arr[0] by assigning a value to the pointer p. For example, p=10 will set arr[0] to 10. What is more interesting is that because the array is set up as a contiguous block of memory, you can actually use arithmetic on p to point to different elements of the array.  p+1 points to the next element after p. p+5 points to the 5th element past p. Thus if p points to arr[0], *(p+1) refers to the contents of arr[1] and p+1 is the address of arr[1].

When an array name is passed to a function, it is the location of the first element that is actually passed. This makes it possible to pass only part of an array to a function. If you have a function f, you could call f(&arr[2]) or f(a+2) and the part of the array starting at arr[2] would be passed to the function.

Multi-Dimensional Arrays

Take a look at the 2 following declarations:

int arr[10][20];
int *b[10];

The first declaration sets aside space for 200 integers. The second declaration allocates space for 10 pointers but they are not initialised. The big advantage of the second declaration is that each of the 10 pointers can hold different length arrays. This can be particularly useful for storing character strings of varying lengths.

 

 

Data Types in C

Basic Data Types

Interestingly, there are only a few basic data types in C.

  • char – a single byte, capable of holding one character in the local character set.
  • int – an integer whose size is machine dependent but 4 bytes on my local machine
  • float – single precision floating point
  • double – double precision floating point

Certain qualifiers can be applied to these basic types. For int, short and long can be applied. As you might expect from the names, short uses less space and long uses more. Again the exact size is machine dependent. On my machine a short int takes 2 bytes, int and long both use 4. You can use the sizeof function to tell the length. Here’s a quick program to check the sizes of the various integers on your machine. )

The qualifier signed or unsigned can be applied to char, short int, int, and long int. If a variable is declared as unsigned, it must be positive. If a character is 1 byte, then an unsigned character can have values between 0 and 255, and a signed character can have values between -128 and 127.

There is also a limits.h file defined by the ANSI standard which defines constants for the sizes of integral types. I’ve just updated the intsize.c program to also display these.

More on Characters

The value of a character is stored as the numeric value of the character in your machine’s character set. For example in ASCII the character ‘A’, is represented represented by the number 65. Compile and run this program to see what I’m talking about. Try setting c to different values and see what happens.

To further emphasise this point, lets take a look at a modified version of a program from The C Programming Language book to count the number of digits. In this digit counting program, an array of 10 integers is set up, and each time a digit between 0 and 9 is found, the relevant entry in the array is updated. Although this sounds quite simple, the implementation takes a bit of understanding.

while ((c = getchar()) != EOF)
    if (c >= '0' && c <= '9')
        ++ndigit[c-'0'];

We get a character from standard input and store it in the character variable c. If c is a number between 0 and 9 we increment the array element represented by c – ’0′. The expression c – ’0′ is evaluated by taking the integer value of c and subtracting the integer value of ’0′. This means that for the 10 digits we have the following expressions:

Where c = 0 -> c - '0' = 48 - 48 = 0
Where c = 1 -> c - '0' = 49 - 48 = 1
Where c = 2 -> c - '0' = 50 - 48 = 2
Where c = 3 -> c - '0' = 51 - 48 = 3
Where c = 4 -> c - '0' = 52 - 48 = 4
Where c = 5 -> c - '0' = 53 - 48 = 5
Where c = 6 -> c - '0' = 54 - 48 = 6
Where c = 7 -> c - '0' = 55 - 48 = 7
Where c = 8 -> c - '0' = 56 - 48 = 8
Where c = 9 -> c - '0' = 57 - 48 = 9

If you run this program against itself, you get “digits = 8 3 0 0 0 0 0 0 0 1″.

What About Strings?

If those are all the basic data types, how do you represent a word or a sentence in C? I’m glad you asked! In C a string is an array of characters. In order for programs to tell how long a string is, the last character in the array should be the null character ‘\0′. Hence in terms of storage, a string requires one more character than the length of the string. For example the string “back to programming” takes 20 bytes to store even though it is 19 characters long. Here’s a program showing a basic way of determining the length of a string. It takes a string (an array of characters) as an argument. It then looks at the first element in the array to see if it is the null character. If not it increments the count of characters and looks a the next character. Pretty simply huh? This is pretty much how the standard library function strlen works. Once you understand that a string is an array of characers, it makes working with strings straight forward.

 

Hello World

I recently read the blog post Compression, Speed and Search Engines. As I read the article, I struggled to understand some of the data structures mentioned and it made me think just how much I’d forgotten since I completed by degree in 1995. I opened up the code mentioned on variable byte encoding and couldn’t understand what it was doing. Something needed to be done!

My first attempt was to look at some old books I still had from University. First off the ranks, algorithms in C by Robert Sedgewick. Again this was too complicated right off the bat. I needed to go right back to the start. This lead me to The C Programming Language. I’d forgotten just how good this book is. Although it is only 295 pages long, it takes you from the basics right through to complicated programs. In this blog I’ll be talking about some of these lessons along with other interesting projects and programs that take my interest.

To get you started check out the very first program you need to write when learning a new language – hello world. To compile this and run it on my Linux system I had to use the following command.

gcc hello.c

This creates an executable file called a.out. By running ./a.out I get the following output:

Hello, world

Great, now we’re at a point where we can start to learn! If you were once a programmer and have forgotten too much, I’m hoping you’ll take this journey with me on getting back to programming.