Hello guys,
I am now in my last phase of the bachelor thesis which includes performance tests for report generation. I already knew how to do this in general, but some pitfalls were new to me. This is what I got after research. The paper will give you some advice how to do performance measurement for your algorithms.
1. Remove console outputs, turn off logging
Once you are sure your code works correcty you should turn off everything that eats time, but doesn't belong to your algorithm. In most cases this will be prints to the console. You don't believe this takes so long? I give you an example. This simple Java program generates a given number of primes and puts them into an array. There is one print to the console for testing purposes. Run this program once with the System.out.println and once without.
public class PrimeGenerator {
public static void main(String[] args) {
long start = System.nanoTime();
new PrimeGenerator().generatePrimesArray(100000);
long end = System.nanoTime();
System.out.println("time in milliseconds: " + (end - start) / 1000000);
}
private int[] generatePrimesArray(int number) {
int[] primes = new int[number];
for (int primesFound = 0, i = 0; primesFound < number; i++) {
if (isPrime(i)) {
System.out.println((primesFound + 1) + ": " + i); // delete this line for the second try
primes[primesFound] = i;
primesFound++;
}
}
return primes;
}
private boolean isPrime(int n) {
if (n < 2) {
return false;
}
if (n == 2) {
return true;
}
for (int i = 2; i < Math.sqrt(n); i++) {
if (n % i == 0) {
return false;
}
}
return true;
}
}
My results:
With System.out.println: 1071 ms
Without System.out.println: 425 ms
The advantage of logging over simple console prints is that you can determine the importance of your outputs. This is done via a loglevel. In case you use logging, you may turn the loglevel to ERROR or SEVERE. It is still nice to know, if something went wrong (your measuring will not work in that case anyways). If your code runs without errors there will be no output to log files or the console which is the goal. If you want outputs that don't affect the measurement, because they are right after or right before the measuring, just use the error stream.
2. Be careful with every test that takes only a few milliseconds
Most of the time-measuring functions use the system clock. Depending on your OS you will only get an accuracy of 10 to 15 ms. Everything that lasts a few milliseconds can't be measured that way.
Here are some options to solve the problem:
1. Make your task bigger:
I.e. if you measure the time for reading a wordlist, you should make the list much bigger. If you have a prime generator, generate more primes.
2. Use a timer that has a better resolution. Some languages provide this. I.e. for Java you should use System.nanoTime() instead of System.currentTimeMillis(). Beware that nanoTime() gives you a result in nanoseconds, but the accuracy won't be this good. It depends on your system and OS.
3. There are also ways to change the update time of your OS, so it will be less than 10 or 15 ms. You may get an accuracy of 1 ms by doing this. But take into account that the call to get the time also takes time itself.
4. Repeat your task.
You should measure this way (pseudocode):
start = getTime();
for(i = 0; i < TIMES; i++){
myTask();
}
end = getTime();
result = (end - start) / TIMES;
This also avoids spending too much time on the getTime() calls.
3. Measure more than once, divide measurement into several sessions
Don't measure your algorithm only once. You should measure several times and take the average of that as result. In addition you should repeat the measurement some time later. Especially when your algorithm doesn't take very long. Every process running in the background may have an effect on your results. If you do several sessions of testing you may see if something really went wrong the last try (have you forgot to shut down some programs?).
4. Always use the same machine and OS to compare results
That should be self-explanatory.
5. Shut down every program that isn't needed
That should be self-explanatory.
6. Delete temporary files for every turn
If your program produces temporary files that it uses further for its tasks, you should of course delete them in every turn. Example: The reporting module I work on creates a temporary report document. This document already contains the data that is needed and is only there to produce several file formats (PDF, DOC, HTML, ...) out of it. If I run the same task several times, the report file creation will rather take this temporary document than produce a new one, which means I can't take into account how long it needs to get the data out of the database. Therefore I need to clear the temporary file directory for each turn.
You may not run into this problem that often, if you use your own programs, because you know how they work. But if you use any frameworks or libraries you might forget that they can produce temporary files.
7. Don't measure the first executions
Your very first execution of your algorithm will always take longer then the executions afterwards. The main reason is your cache that will be filled with everything your program needs. Thus the algorithm will get some speed after the first run. If you know how. you may empty the cache every turn. The other option is that you just do some executions before you measure the time. You can test how many executions you need to do before by looking at the first results.
8. Don't measure the time your environment needs to start up
Java example: The JVM needs some time to start up. So if you use an external program for measurement, make sure you don't measure this part.
Deque