Code repair 2: Fibonacci and standard deviations homework (part 1)

I saw this code posted over on codereview.se from someone looking for code review of their numerical code homework. Given that I particularly enjoy numerical code I bookmarked it with this series in mind. This piece of code seems perfect for the code review series because in the course of improving it a broad range of different concepts will be highlighted.

The poster appeared to be a new programmer enrolled in a c++ course. I think the attitude to learning more via review was really good, they attempted to write and compile their code first then asked for review after. (If the author of the code happens to read this I'd like to say "good job! keep learning"). There's still a lot to improve on here before it could be said to be of a high quality, so I'm going to outline how to improve this code step-by-step.

Unlike the problems with rand() article the code here is a bit too big to just tackle in an ad hoc fashion. I'm going to keep track of the code in Git and in case anyone's interested the repository is hosted up on my github page: code repair series repo

The specification:

  • Define an array of 100 sequential numbers of type float.
  • Define an array of 20 Fibonacci numbers of type int.
  • Use an overloaded function to compute the mean of the array containing 100 numbers.
  • Use an overloaded function to compute the standard deviation of an array containing 100 numbers.
  • Use an overloaded function to compute the mean of the array containing 20 Fibonacci numbers.
  • Use an overloaded function to compute the standard deviation of the array containing 20 Fibonacci numbers.

The code

#include "stdafx.h"
#include <iostream>
#include <cmath>
using namespace std;

const int SIZE_OF_GENERIC_ARRAY = 100;
const int SIZE_OF_FIBONACCI_ARRAY = 20;

double arithmeticAverage;
char sequenceType;

float Array[SIZE_OF_GENERIC_ARRAY];
int Fibonacci[SIZE_OF_FIBONACCI_ARRAY];

void fillGenericArray(int SIZE_OF_GENERIC_ARRAY);
void fillFibonacciArray(int SIZE_OF_FIBONACCI_ARRAY);

double mean(float[], int);
double mean(int[], int);

double standardDeviation(float[], int);
double standardDeviation(int[], int);

void outputMean();
void outputStandardDeviation();

int _tmain(int argc, _TCHAR* argv[])
{
   cout << "Would you like to generate a generic sequence or a Fibonacci sequence?"
       << endl
       << "\n"
       << "Type [G] + [ENTER] for a generic sequence, or;" << endl
       << "Type [F] + [ENTER] for a Fibonacci sequence: ";
   cin >> sequenceType;

   if (sequenceType == 'G' || sequenceType == 'g')
   {
       fillGenericArray(SIZE_OF_GENERIC_ARRAY);
       outputMean();
       outputStandardDeviation();
   }

   else if (sequenceType == 'F' || sequenceType == 'f')
   {
       fillFibonacciArray(SIZE_OF_FIBONACCI_ARRAY);
       outputMean();
       outputStandardDeviation();
   }

   else
       cout << "\n"
            << "Invalid input. Please type 'F' or 'G'. Thank you.";

   return 0;
}

void fillGenericArray(int SIZE_OF_GENERIC_ARRAY)
{
   int i = 0;

   Array[0] = { 1 };

   for (int i = 0; i < SIZE_OF_GENERIC_ARRAY; i++)
       Array[i] = i + 1;

   return;
}

void fillFibonacciArray(int SIZE_OF_FIBONACCI_ARRAY)
{
   int i;

   Array[0] = { 0 };
   Array[1] = { 1 };

   for (int i = 2; i < SIZE_OF_FIBONACCI_ARRAY; i++)
       Array[i] = Array[i - 1] + Array[i - 2];

   return;
}

double mean(float Array[], int SIZE_OF_GENERIC_ARRAY)
{
   double sumOfElements = 0;
   int i;

   for (i = 0; i < SIZE_OF_GENERIC_ARRAY; i++)
   {
       sumOfElements += Array[i];
   }
   arithmeticAverage = sumOfElements / i;
   return (arithmeticAverage);
}

double mean(int Array[], int SIZE_OF_FIBONACCI_ARRAY)
{
   double sumOfElements = 0;
   int i;

   for (i = 0; i < SIZE_OF_FIBONACCI_ARRAY; i++)
   {
       sumOfElements += Array[i];
   }
   arithmeticAverage = sumOfElements / i;
   return (arithmeticAverage);
}

double standardDeviation(float Array[], int SIZE_OF_GENERIC_ARRAY)
{
   double standardDeviation;
   double tempSum = 0;

   for (int i = 0; i < SIZE_OF_GENERIC_ARRAY; i++)
   {
       tempSum += pow((Array[i] - mean(Array, SIZE_OF_GENERIC_ARRAY)), 2);
   }
   standardDeviation = sqrt(tempSum / (SIZE_OF_GENERIC_ARRAY));
   return (standardDeviation);
}

double standardDeviation(int Array[], int SIZE_OF_FIBONACCI_ARRAY)
{
   double standardDeviation;
   double tempSum = 0;

   for (int i = 0; i < SIZE_OF_FIBONACCI_ARRAY; i++)
   {
       tempSum += pow((Array[i] - mean(Array, SIZE_OF_FIBONACCI_ARRAY)), 2);
   }
   standardDeviation = sqrt(tempSum / (SIZE_OF_FIBONACCI_ARRAY));
   return (standardDeviation);
}

void outputMean()
{
   cout << "\n";
   cout << "The mean is: " << mean(Array, SIZE_OF_GENERIC_ARRAY);
   cout << endl;
}

void outputStandardDeviation()
{
   cout << "\n";
   cout << "The standard deviation is: " << standardDeviation(Array,
    SIZE_OF_GENERIC_ARRAY);
   cout << endl;
}

According to the author this code snippet was supposed to be producing the expected results. Whenever I see c++ code I always like to be able to compile the code with maximum warnings enabled [2] and see if I get a error and warning free build. Not being able to build a project is a bad state of affairs. Bad enough that entire techniques have emerged with the express aim of reducing the time that code stays in a non-compilable state. [3]

Without feedback mechanisms to test that your code is correct you are just left guessing. A crucial part of any feedback system with c++ means compiling.

Before we go further, from experience I know that improving this code will take quite a few steps so I've decided to use Git to keep track of my changes to this code.

(master)$ git init .
(master)$ git add main.cpp
(master)$ git commit -m "Commit of original code"
[master 39bcb4a] Commit of original code
 1 file changed, 147 insertions(+)
 create mode 100644 02_fibonnaci_standard_deviation_homework/main.cpp

The first problem here is that I'm at home using gcc so this code won't even compile, which nicely seguaes into the first part of this tutorial...

The first thing to do here is to try to build the project, with all warnings enabled. Given that there is only one source file we could achieve this with:

g++ -Wall main.cpp -o Output

Using something like a makefile would be completely inappropriate here. [4] We could if we wanted make a small shell script to run the build command:

build.sh

#!/bin/sh
g++ -Wall main.cpp -o Output

then we just need to change the permissions on the file so we can execute our script and we are good to go:

chmod +x build.sh

First attempt at compilation:

(master)$ ./compile.sh
main.cpp:1:20: fatal error: stdafx.h: No such file or directory
compilation terminated.

I can't even compile because there's a dependency on a windows specific header file. Given that the code doesn't use anything at all that specifically requires windows removing these dependencies is the first step I'm going to take.

So we remove the #include "stdafx.h" and the main gets changed to the standard:

int main(int argc, char* argv[])

You really don't want to use TCHAR in new code, it's windows specific and even microsoft advises against it: http://msdn.microsoft.com/en-us/library/ff381407%28VS.85%29.aspx. [1] Now we can try compiling again:

(master)$ ./compile.sh
main.cpp: In function ‘void fillGenericArray(int)’:
main.cpp:60:19: warning: extended initializer lists only available with -std=c++11 or -std=gnu++11 [enabled by default]
main.cpp:58:8: warning: unused variable ‘i’ [-Wunused-variable]
main.cpp: In function ‘void fillFibonacciArray(int)’:
main.cpp:72:19: warning: extended initializer lists only available with -std=c++11 or -std=gnu++11 [enabled by default]
main.cpp:73:19: warning: extended initializer lists only available with -std=c++11 or -std=gnu++11 [enabled by default]
main.cpp:70:8: warning: unused variable ‘i’ [-Wunused-variable]

So we now have some code that compiles. Lets try running it with some real input:

(master)$ ./Output
Would you like to generate a generic sequence or a Fibonacci sequence?

Type [G] + [ENTER] for a generic sequence, or;
Type [F] + [ENTER] for a Fibonacci sequence: G

The mean is: 50.5

The standard deviation is: 28.8661
(master)$ ./Output
Would you like to generate a generic sequence or a Fibonacci sequence?

Type [G] + [ENTER] for a generic sequence, or;
Type [F] + [ENTER] for a Fibonacci sequence: F

The mean is: 109.45

The standard deviation is: 520.447

If you are a very astute reader you will probably notice a problem here, I'll get back to that later on. We can see that this doesn't crash when we run it which is a good sign. Now that we are at the point where we can successfully compile and run it is a good time to commit changes to the Git repository:

(master)$ git add compile.sh
(master)$ git commit -m "Created a small shell script to help compile with gcc"
[master 6793dcf] Created a small shell script to help compile with gcc
 1 file changed, 2 insertions(+)
 create mode 100755 02_fibonnaci_standard_deviation_homework/compile.sh

(master)$ git add main.cpp
(master)$ git commit -m "Removed windows specific cruft"
[master 70ac562] Removed windows specific cruft
 1 file changed, 1 insertion(+), 2 deletions(-)

The compiler warnings are a concern so let's work through each of those. The first compiler warning indicates to us that we are attempting to use a c++11 language feature without using the flag to explicitly enable c++11. this is slightly concerning because we aren't needing to use any c++11 specific language features in this program. Line 60 is the first that this occurs:

line 60:

Array[0] = { 1 };

This is using the c++11 style brace initialization syntax. We don't really need that here so let's change all these to simple assignments:

Array[0] = 1;

That leaves the warnings about the unused variables. Lets look at the block of code that line 85 is in:

void fillGenericArray(int SIZE_OF_GENERIC_ARRAY)
{
    int i = 0;

    Array[0] = 1;

    for (int i = 0; i < SIZE_OF_GENERIC_ARRAY; i++)
        Array[i] = i + 1;

    return;
}

We can see that there's declarations of i here, int i = 0. That's a bug so lets get rid of the un-needed first declaration:

void fillGenericArray(int SIZE_OF_GENERIC_ARRAY)
{
    Array[0] = 1;

    for (int i = 0; i < SIZE_OF_GENERIC_ARRAY; i++)
        Array[i] = i + 1;

    return;
}

The other warning at line 70 was essentially identical. Fixing these we can compile again:

(master)$ ./compile.sh
(master)$

Now we have no warnings. The code executes without crashing at runtime. Time to make another commit to git.

(master)$ git add main.cpp
(master)$ git commit -m "Fixed up compiler warnings"
[master 1821b18] Fixed up compiler warnings
 1 file changed, 3 insertions(+), 7 deletions(-)

We are still very much in the phase of trying to make preliminary improvements. There's a few things that stand out here.

#include <iostream>
#include <cmath>
using namespace std;

Pulling everything from the standard namespace into scope is always a bad sign. You don't want to do this because it brings a lot of things into the namespace. [5] Any time you write something that clashes with a name in the standard namespace is a potential problem. The whole issue is completely avoided by not doing this. So the next step is to remove this line and put in the std:: where required, such as std::cout.

The main loop now looks like this:

int main(int argc, char* argv[])
{
   std::cout << "Would you like to generate a generic sequence or a Fibonacci sequence?"
       << std::endl
       << "\n"
       << "Type [G] + [ENTER] for a generic sequence, or;" << std::endl
       << "Type [F] + [ENTER] for a Fibonacci sequence: ";
   std::cin >> sequenceType;

   if (sequenceType == 'G' || sequenceType == 'g')
   {
       fillGenericArray(SIZE_OF_GENERIC_ARRAY);
       outputMean();
       outputStandardDeviation();
   }

   else if (sequenceType == 'F' || sequenceType == 'f')
   {
       fillFibonacciArray(SIZE_OF_FIBONACCI_ARRAY);
       outputMean();
       outputStandardDeviation();
   }

   else
       std::cout << "\n"
            << "Invalid input. Please type 'F' or 'G'. Thank you.";

   return 0;
}

This succefully compiles and runs, time to make another git commit.

(master)$ git add main.cpp
(master)$ git commit -m "Removed using namespace std"

Now that we are getting past some of the more mechanical cleanups we can get stuck into some more substantial changes. Perhaps one of the biggest issues with this code is the use of mutable global state. Mutable global state is something that you really want to avoid as much as possible because it makes it harder to reason about what your code does. When variables are in global scope anything could end up changing them and you now have to analyze your entire program whenever you are wanting to figure out what is changing that code. This is the concept of encapsulation, what you really want is to have things be changed from as few places as possible. That way your code stays much more maintainable because it's easier to understand what is going on.

The following 2 global variables are particularly egregious.

double arithmeticAverage;
char sequenceType;

We have a variable for an average that is completely taken taken out of context of the set of items that the average is taken from. Additionally we have a variable to handle the input that's not in the same place where the input handling code is. Once the size of the projects grow these types of variables start making it much harder to keep maintainable code.

So lets get these variables in the right places. The easiest one is the sequenceType variable. We just need to move that into the same place as the input handling code which is in the main loop:

int main(int argc, char* argv[])
{
   char sequenceType;
   std::cout << "Would you like to generate a generic sequence or a Fibonacci sequence?"
       << std::endl
       << "\n"
       << "Type [G] + [ENTER] for a generic sequence, or;" << std::endl
       << "Type [F] + [ENTER] for a Fibonacci sequence: ";
   std::cin >> sequenceType;

   if (sequenceType == 'G' || sequenceType == 'g')
   {
       fillGenericArray(SIZE_OF_GENERIC_ARRAY);
       outputMean();
       outputStandardDeviation();
   }

   else if (sequenceType == 'F' || sequenceType == 'f')
   {
       fillFibonacciArray(SIZE_OF_FIBONACCI_ARRAY);
       outputMean();
       outputStandardDeviation();
   }

   else
       std::cout << "\n"
            << "Invalid input. Please type 'F' or 'G'. Thank you.";

   return 0;
}

Given that we just changed some mutable state it makes sense to search through the code to see where that variable was used. Luckily for us it is only used here.

Time to compile and run the code. No warnings and the code runs.

(master)$ git add main.cpp
(master)$ git commit -m "Took the sequenceType variable out of global scope"
[master a45fcef] Took the sequenceType variable out of global scope
 1 file changed, 1 insertion(+), 1 deletion(-)

Next up is the arithmeticAverage code. This is harder to change because it is used in a few more places. This is really where global mutable state starts to really bite. If this was production code I was refactoring I would most certainly be writing as many characterisation tests and unit tests as possible. However given that I'm writing a blog post and not wanting to put too much time into this I'm just going to compare the outputs found from running the program with the outputs from previous program runs that are included earlier.

Searching through the program I found that it's only used in the following locations:

double arithmeticAverage;

double mean(float Array[], int SIZE_OF_GENERIC_ARRAY)
{
   double sumOfElements = 0;
   int i;

   for (i = 0; i < SIZE_OF_GENERIC_ARRAY; i++)
   {
       sumOfElements += Array[i];
   }
   arithmeticAverage = sumOfElements / i;
   return (arithmeticAverage);
}

double mean(int Array[], int SIZE_OF_FIBONACCI_ARRAY)
{
   double sumOfElements = 0;
   int i;

   for (i = 0; i < SIZE_OF_FIBONACCI_ARRAY; i++)
   {
       sumOfElements += Array[i];
   }
   arithmeticAverage = sumOfElements / i;
   return (arithmeticAverage);
}

Essentially this is a completely bogus global variable because we never read it from anywhere. We can get rid of it entirely by removing the declaration and changing the functions:

double mean(float Array[], int SIZE_OF_GENERIC_ARRAY)
{
   double sumOfElements = 0;
   int i;

   for (i = 0; i < SIZE_OF_GENERIC_ARRAY; i++)
   {
       sumOfElements += Array[i];
   }
   return sumOfElements / i;
}

double mean(int Array[], int SIZE_OF_FIBONACCI_ARRAY)
{
   double sumOfElements = 0;
   int i;

   for (i = 0; i < SIZE_OF_FIBONACCI_ARRAY; i++)
   {
       sumOfElements += Array[i];
   }
   return sumOfElements / i;
}

Compiling produces no errors or warnings. Running produces the same results as before. Time to commit the changes.

(master)$ git add main.cpp
(master)$ git commit -m "Removed arithmeticAverage global variable that was not being used"
[master a816942] Removed arithmeticAverage global variable that was not being used
 1 file changed, 2 insertions(+), 6 deletions(-)

We still have 2 globals left with the arrays. Getting these out of the global scope is the next goal. Next up are some of the naming conventions. I especially dislike the name Array as used in the global variable:

float Array[SIZE_OF_GENERIC_ARRAY];
int Fibonacci[SIZE_OF_FIBONACCI_ARRAY];

The main problem is that it doesn't explain what the variable actually is. The name only tells you about the type of container and you could have found that information anyway by looking through the code (or in your IDE). In the future you might choose a different container to use and then the variable name makes no sense whatsoever. Always try to use names that explain what the variable is referring to.

The naming issues gets worse too because later in the code there is the following function:

double mean(float Array[], int SIZE_OF_GENERIC_ARRAY)
{
   double sumOfElements = 0;
   int i;

   for (i = 0; i < SIZE_OF_GENERIC_ARRAY; i++)
   {
       sumOfElements += Array[i];
   }
   return sumOfElements / i;
}

Now the same name in the global state is being used for the first parameter being passed in to this function. From a readablity point of view this is not a good thing. C++ starts by looking for names in the most local scope and starts working outwards [6], the fact that this function works as desired is lucky (or perhaps unlucky if a bug is hidden by the code appearing to work). Don't leave it up to luck though, change the names!

Let's get the array's out of global scope then once that is done rename them. Because these ate global scope variables we have to search through all the code to find where these arrays are used.

Searching for Fibonnaci shows that it's not actually used anywhere. Dead code like this is always to be avoided so let's take a note of this in case it hints at the intentions of the author at some later point and remove it from the code. [7]

Compiling the code gives no errors or warnings and the output from running the program is the same as before. Time to commit changes:

(master)$ git add main.cpp
(master)$ git commit -m "Removed Fibonnaci global array as it was not being used"
[master c838610] Removed Fibonnaci global array as it was not being used
 1 file changed, 1 deletion(-)

Now we have to deal with Array this will take quite a few changes:

First lets move the declaration into main. This means we must now pass the array as a parameter to the various functions that use it.

main looks like this after the changes:

int main(int argc, char* argv[])
{
   char sequenceType;
   float Array[SIZE_OF_GENERIC_ARRAY];

   std::cout << "Would you like to generate a generic sequence or a Fibonacci sequence?"
       << std::endl
       << "\n"
       << "Type [G] + [ENTER] for a generic sequence, or;" << std::endl
       << "Type [F] + [ENTER] for a Fibonacci sequence: ";
   std::cin >> sequenceType;

   if (sequenceType == 'G' || sequenceType == 'g')
   {
       fillGenericArray(Array, SIZE_OF_GENERIC_ARRAY);
       outputMean();
       outputStandardDeviation();
   }

   else if (sequenceType == 'F' || sequenceType == 'f')
   {
       fillFibonacciArray(Array, SIZE_OF_FIBONACCI_ARRAY);
       outputMean();
       outputStandardDeviation();
   }

   else
       std::cout << "\n"
            << "Invalid input. Please type 'F' or 'G'. Thank you.";

   return 0;
}

This in turn necessitates changing the fillGenericArray and fillFibonacciArray functions.

Lets start with the function declarations:

Previously these were:

void fillGenericArray(int SIZE_OF_GENERIC_ARRAY);
void fillFibonacciArray(int SIZE_OF_FIBONACCI_ARRAY);

The syntax here is wrong for a default size and we are always passing a value so SIZE_OF_GENERIC_ARRAY is not needed here. Also for lengths of arrays we really want to be using size_t and not int. This might seem like a nitpick but it really isn't.

void fillGenericArray(float[], size_t);
void fillFibonacciArray(float[], size_t);

Now on to the functions themselves.

void fillGenericArray(float array_to_fill[], size_t array_size)
{
   for (size_t i = 0; i < array_size; i++){
       array_to_fill[i] = i + 1;
   }
}

void fillFibonacciArray(float array_to_fill[], size_t array_size)
{
   array_to_fill[0] = 0;
   array_to_fill[1] = 1;

   for (size_t i = 2; i < array_size; i++){
       array_to_fill[i] = array_to_fill[i - 1] + array_to_fill[i - 2];
   }
}

After making these changes we can try compiling:

(master)$ ./compile.sh
main.cpp: In function ‘void outputMean()’:
main.cpp:122:41: error: ‘Array’ was not declared in this scope
main.cpp: In function ‘void outputStandardDeviation()’:
main.cpp:129:68: error: ‘Array’ was not declared in this scope

So we see that the global variable was being used in the print functions. We will now need to outputMean and outputStandardDeviation as well.

Lets start with the function declarations:

Previously these were:

void outputMean();
void outputStandardDeviation();

Now we need to pass in the array so these will have to change to:

void outputMean(float[], size_t);
void outputStandardDeviation(float[], size_t);

The functions change to the following:

void outputMean(float array_to_display[], size_t array_size)
{
   std::cout << "\n";
   std::cout << "The mean is: " << mean(array_to_display, array_size);
   std::cout << std::endl;
}

void outputStandardDeviation(float array_to_display[], size_t array_size)
{
   std::cout << "\n";
   std::cout << "The standard deviation is: " << standardDeviation(array_to_display, array_size);
   std::cout << std::endl;
}

Now we need to change the call sites:

if (sequenceType == 'G' || sequenceType == 'g')
{
    fillGenericArray(Array, SIZE_OF_GENERIC_ARRAY);
    outputMean(Array, SIZE_OF_GENERIC_ARRAY);
    outputStandardDeviation(Array, SIZE_OF_GENERIC_ARRAY);
}

else if (sequenceType == 'F' || sequenceType == 'f')
{
    fillFibonacciArray(Array, SIZE_OF_FIBONACCI_ARRAY);
    outputMean(Array, SIZE_OF_FIBONACCI_ARRAY);
    outputStandardDeviation(Array, SIZE_OF_FIBONACCI_ARRAY);
}

Let's compile and run this:

(master)$ ./compile.sh
(master)$ ./Output
Would you like to generate a generic sequence or a Fibonacci sequence?

Type [G] + [ENTER] for a generic sequence, or;
Type [F] + [ENTER] for a Fibonacci sequence: G

The mean is: 50.5

The standard deviation is: 28.8661
(master)$ ./Output
Would you like to generate a generic sequence or a Fibonacci sequence?

Type [G] + [ENTER] for a generic sequence, or;
Type [F] + [ENTER] for a Fibonacci sequence: F

The mean is: 547.25

The standard deviation is: 1055.81

So we get a compile with no warnings or errors, the first result is the same as before but the second result is now different.

I'm going to make a commit here as I'm about to go off on a bit of a tangent and I don't want to forget where I was at with my code. Remember commit often!

(master)$ git add main.cpp
(master)$ git commit -m "Moved the Array from global scope and into main. This required changing the functions that were directly accessing the global array to instead take the array as a parameter"
[master 0466b4c] Moved the Array from global scope and into main. This required changing the functions that were directly accessing the global array to instead take the array as a parameter
 1 file changed, 26 insertions(+), 31 deletions(-)

Intermezzo1: unit testing

We just changed the code and we found that the output changed. Tracking down why this happened is important, but before we do take a minute to reflect that we noticed something changed with our output. Whenever you are dealing with code in production it's especially important to notice any outwardly facing changes. A particularly insidious problem can arise if users are actually depending on your code being in a broken state.

Because we have a project specification it's important for us to be able to create some tests that will verify that our program logic actually works as we intend it to.

In this case when we find a bug like this it's a fantastic opportunity to make a unit test so that we don't have future regressions in the code.

Postmortem: what happened the mean and standard deviation for fibonacci?

Previously when we were calculating the mean and the standard deviation for the fibonnaci filled array we weren't getting the correct result because the wrong function was being called. Let's look at the case for calculating the mean, the standard deviation is similar.

As you will note from before the Fibonnaci array was never used and we removed it. This meant that the sequence of the first 20 fibbonaci numbers were always being stored in Array which was an array of 100 int This meant whenever we called mean c++ would look through the functions with the name mean to find which was the most suitable:

double mean(float[], int);
double mean(int[], int);

Because we only ever used an array of type int the c++ compiler would send that call to the function that took an array of int as a parameter. So it would call the following function:

double mean(int Array[], int SIZE_OF_FIBONACCI_ARRAY)
{
   double sumOfElements = 0;
   int i;

   for (i = 0; i < SIZE_OF_FIBONACCI_ARRAY; i++)
   {
       sumOfElements += Array[i];
   }
   return sumOfElements / i;
}

Now previously this was only called in the outputMean function with mean(Array, SIZE_OF_GENERIC_ARRAY);

So the parameter SIZE_OF_FIBONACCI_ARRAY here was actually containing the value 100 as we passed in SIZE_OF_GENERIC_ARRAY in to the function when we called it. This is a prime example of a bug that was caused by bad naming conventions. On a quick glance at the code you could be excused for thinking that the value of SIZE_OF_FIBONACCI_ARRAY would be equal to the value defined by const int SIZE_OF_FIBONACCI_ARRAY that was defined at the global scope. But in this particular case the values were not the same and hence we got a somewhat hard to find bug. I know that the first time I looked at the code I completely missed this bug, it was only after examining it closely afterwards that the intent was revealed. This whole problem might have been avoided completely by a better choice of variable names as the bug might have stood out more in the first place. At the very least you should have a naming convention that mandates you avoid name collisions in the same namespace.

So now we have located the part of the code responsible for the bug we can ask why we got the specific results we got. gcc by default initializes the arrays that are created above main with 0's. Because we were always using Array which was 100 elements long we ended up with the first 20 fibonacci numbers followed by 80 zeros. The code that calculates the mean should only be taking into account the first 20 elements of that array but courtesy of the bug it was taking all 100 elements. The addition of an extra 80 elements with the value 0 should decrease the mean by a factor of 5.

So if we take a look at the results:

$\frac{old result}{new result} = \frac{109.45}{547.25} = 0.2$

We see that the old results were only 20% as big as they were supposed to be. This matches up with what we would expect from this particular buggy implementation.

Cleaning up the code

Now that we have identified the incorrect function overloading and the problems it caused lets try to fix the issues.

The specification states that we are supposed to be creating 2 arrays that contain sequences of integers. However earlier on we got rid of the Fibonacci array because we weren't using it.

Now that we are actually passing around arrays to the functions it will be a lot easier to put an additional array back in. While I'm at it I'm going to change the name Array to SequentialNumbers.

We need to change main as follows:

int main(int argc, char* argv[])
{
   char sequenceType;
   float SequentialNumbers[SIZE_OF_GENERIC_ARRAY];
   int FibonacciNumbers[SIZE_OF_FIBONACCI_ARRAY];
   std::cout << "Would you like to generate a generic sequence or a Fibonacci sequence?"
       << std::endl
       << "\n"
       << "Type [G] + [ENTER] for a generic sequence, or;" << std::endl
       << "Type [F] + [ENTER] for a Fibonacci sequence: ";
   std::cin >> sequenceType;

   if (sequenceType == 'G' || sequenceType == 'g')
   {
       fillGenericArray(SequentialNumbers, SIZE_OF_GENERIC_ARRAY);
       outputMean(SequentialNumbers, SIZE_OF_GENERIC_ARRAY);
       outputStandardDeviation(SequentialNumbers, SIZE_OF_GENERIC_ARRAY);
   }

   else if (sequenceType == 'F' || sequenceType == 'f')
   {
       fillFibonacciArray(FibonacciNumbers, SIZE_OF_FIBONACCI_ARRAY);
       outputMean(FibonacciNumbers, SIZE_OF_FIBONACCI_ARRAY);
       outputStandardDeviation(FibonacciNumbers, SIZE_OF_FIBONACCI_ARRAY);
   }

   else
       std::cout << "\n"
            << "Invalid input. Please type 'F' or 'G'. Thank you.";

   return 0;
}

and we have to change fillFibonacciArray to take an array of integers:

void fillFibonacciArray(int[], size_t);

void fillFibonacciArray(float array_to_fill[], size_t array_size)
{
   array_to_fill[0] = 0;
   array_to_fill[1] = 1;

   for (size_t i = 2; i < array_size; i++){
       array_to_fill[i] = array_to_fill[i - 1] + array_to_fill[i - 2];
   }
}

And we need to create some extra display functions that take an array of integers:

void outputMean(int[], size_t);
void outputStandardDeviation(int[], size_t);

void outputMean(int array_to_display[], size_t array_size)
{
   std::cout << "\n";
   std::cout << "The mean is: " << mean(array_to_display, array_size);
   std::cout << std::endl;
}

void outputStandardDeviation(int array_to_display[], size_t array_size)
{
   std::cout << "\n";
   std::cout << "The standard deviation is: " << standardDeviation(array_to_display, array_size);
   std::cout << std::endl;
}

Now at this point we can compile the code and run it:

(master)$ ./compile.sh
(master)$ ./Output
Would you like to generate a generic sequence or a Fibonacci sequence?

Type [G] + [ENTER] for a generic sequence, or;
Type [F] + [ENTER] for a Fibonacci sequence: G

The mean is: 50.5

The standard deviation is: 28.8661
(master)$ ./Output
Would you like to generate a generic sequence or a Fibonacci sequence?

Type [G] + [ENTER] for a generic sequence, or;
Type [F] + [ENTER] for a Fibonacci sequence: F

The mean is: 547.25

The standard deviation is: 1055.81

At this point a git commit is certainly in order.

(master)$ git add main.cpp
(master)$ git commit -m "Added in an array of ints containing fibonacci numbers to meet the project spec. Created additional functions to handle the output for int type arrays"
[master 6401927] Added in an array of ints containing fibonacci numbers to meet the project spec. Created additional functions to handle the output for int type arrays
 1 file changed, 29 insertions(+), 10 deletions(-)

There were a few other things I mentioned earlier that I think warrant fixing. For example using size_t for array indices, improving names, removing some redundant variables and a few other minor cleanups. This is what the code looks like after doing those fixes:

#include <iostream>
#include <cmath>

const size_t SIZE_OF_GENERIC_ARRAY = 100;
const size_t SIZE_OF_FIBONACCI_ARRAY = 20;

void fillSequentialIntegersArray(float[], size_t);
void fillFibonacciArray(int[], size_t);

double mean(float[], size_t);
double mean(int[], size_t);

double standardDeviation(float[], size_t);
double standardDeviation(int[], size_t);

void outputMean(float[], size_t);
void outputMean(int[], size_t);

void outputStandardDeviation(float[], size_t);
void outputStandardDeviation(int[], size_t);

int main(int argc, char* argv[])
{
   char sequenceType;
   float SequentialNumbers[SIZE_OF_GENERIC_ARRAY];
   int FibonacciNumbers[SIZE_OF_FIBONACCI_ARRAY];
   std::cout << "Would you like to generate a generic sequence or a Fibonacci sequence?"
       << std::endl
       << "\n"
       << "Type [G] + [ENTER] for a generic sequence, or;" << std::endl
       << "Type [F] + [ENTER] for a Fibonacci sequence: ";
   std::cin >> sequenceType;

   if (sequenceType == 'G' || sequenceType == 'g')
   {
       fillSequentialIntegersArray(SequentialNumbers, SIZE_OF_GENERIC_ARRAY);
       outputMean(SequentialNumbers, SIZE_OF_GENERIC_ARRAY);
       outputStandardDeviation(SequentialNumbers, SIZE_OF_GENERIC_ARRAY);
   }
   else if (sequenceType == 'F' || sequenceType == 'f')
   {
       fillFibonacciArray(FibonacciNumbers, SIZE_OF_FIBONACCI_ARRAY);
       outputMean(FibonacciNumbers, SIZE_OF_FIBONACCI_ARRAY);
       outputStandardDeviation(FibonacciNumbers, SIZE_OF_FIBONACCI_ARRAY);
   }
   else
   {
       std::cout << "\n"
            << "Invalid input. Please type 'F' or 'G'. Thank you.";
   }
   return 0;
}

void fillSequentialIntegersArray(float array_to_fill[], size_t array_size)
{
   for (size_t i = 0; i < array_size; i++){
       array_to_fill[i] = i + 1;
   }
}

void fillFibonacciArray(int array_to_fill[], size_t array_size)
{
   array_to_fill[0] = 0;
   array_to_fill[1] = 1;

   for (size_t i = 2; i < array_size; i++){
       array_to_fill[i] = array_to_fill[i - 1] + array_to_fill[i - 2];
   }
}

double mean(float Array[], size_t array_size)
{
   double sumOfElements = 0;

   for (size_t i = 0; i < array_size; i++)
   {
       sumOfElements += Array[i];
   }
   return sumOfElements / array_size;
}

double mean(int Array[], size_t array_size)
{
   double sumOfElements = 0;

   for (size_t i = 0; i < array_size; i++)
   {
       sumOfElements += Array[i];
   }
   return sumOfElements / array_size;
}

double standardDeviation(float Array[], size_t array_size)
{
   double tempSum = 0;

   for (size_t i = 0; i < array_size; i++)
   {
       tempSum += pow((Array[i] - mean(Array, array_size)), 2);
   }
   return sqrt(tempSum / array_size);
}

double standardDeviation(int Array[], size_t array_size)
{
   double tempSum = 0;

   for (size_t i = 0; i < array_size; i++)
   {
       tempSum += pow((Array[i] - mean(Array, array_size)), 2);
   }
   return sqrt(tempSum / array_size);
}

void outputMean(float array_to_display[], size_t array_size)
{
   std::cout << "\n";
   std::cout << "The mean is: " << mean(array_to_display, array_size);
   std::cout << std::endl;
}

void outputMean(int array_to_display[], size_t array_size)
{
   std::cout << "\n";
   std::cout << "The mean is: " << mean(array_to_display, array_size);
   std::cout << std::endl;
}

void outputStandardDeviation(float array_to_display[], size_t array_size)
{
   std::cout << "\n";
   std::cout << "The standard deviation is: " << standardDeviation(array_to_display, array_size);
   std::cout << std::endl;
}

void outputStandardDeviation(int array_to_display[], size_t array_size)
{
   std::cout << "\n";
   std::cout << "The standard deviation is: " << standardDeviation(array_to_display, array_size);
   std::cout << std::endl;
}

Code successfully compiles and runs.

Time to commit to the git repository:

(master)$ git add main.cpp
(master)$ git commit -m "Cleaned up many aspects of the code. Brought the code to a state where the project specification is satisfied."
[master e5f2f97] Cleaned up many aspects of the code. Brought the code to a state where the project specification is satisfied.
 1 file changed, 31 insertions(+), 40 deletions(-)

wrapping this up

At this point I think for the purposes of fixing up the assignment we would be close to done. If I was running a class I think it would be unreasonable to expect much more than this from a complete c++ beginner.

However there's still a substantial gap left between this code and what I would consider to be top notch c++ code. See part 2 of this article for how we can improve this code even further.

[1]Unfortunately I see a number of c++ courses that appear to be teaching people some really crappy things like using TCHAR. I think that introducing these platform specific depreciated things should be avoided, although there is a chance there is some justification for this I'm not aware of. I have a lot to say about teaching c++ and I'll make a post about it when I get some time.
[2]Given the nature of c++ it is best to take compiler warnings very seriously. If I encounter any warnings I immediately try to find out why. This is especially so with code from less experienced c++ programmers as the warnings are often very good indicators of bugs.
[3]Techniques such as continuous integration have emerged because of the immense problems that arise when you continue to work on non-compiling or non functioning code for any extended length of time. A project I was working on got crippled because of a problems with integrating a large number of changes. Not being able to build the project for a month quite literally derailed that entire project.
[4]It's entirely possible that it's always a bad choice to use a makefile for c++ projects, better build tools exist these days. I'm not quite willing to say that using makefiles is a pareto suboptimal choice but I wouldn't be at all surprised if that was indeed the case.
[5]Unfortunately #include <cmath> still polutes the global namesapce in many c++ implementations. Many implementations still bring these functions into the global namespace along with the std namespace. So many implementations were non conforming that the standard committee unfortunately ended up allowing the #include <c___> libraries to put things in the global namespace. See the defect report for more information: http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#456
[6]

c++ name resolution rules are a bit complex. The lookup is done in this order:

  1. If the name is preceeded by :: then qualified name lookup is used. This means that the location that is specified before the :: is searched first. For example if we had abc::xyz then abc and only abc is searched for xyz.
  2. If the name is used in a function call then argument dependent lookup is used. This is complicated... For example you might have something like foo(bar). The namespace bar was declared in is used to search for foo.
  3. If non of the above apply then unqualified lookup is used. The compiler starts at the most local scope and works outwards until the name is found.

See http://en.cppreference.com/w/cpp/language/lookup for more information. If you need more precise information that this consult the c++ standard.

[7]Generally speaking the sooner you find bugs the easier they are to fix. One of the most insidious things about dead code paths is that you can get problems that pop up much later on when things change and the previously dead code ends up getting called. These bugs can be extremely damaging or expensive to fix in some projects and as a result many coding standards dictate that dead code must always be removed as soon as it is found.

blogroll

social