Author Topic: Performance Measurement  (Read 5210 times)


Offline Deque

  • P.I.N.N.
  • Global Moderator
  • Overlord
  • *
  • Posts: 1203
  • Cookies: 518
  • Programmer, Malware Analyst
Performance Measurement
« on: September 23, 2011, 07:06:21 pm »
Hello guys,

I am now in the last phase of my bachelor thesis, which includes performance tests for report generation. I already knew how to do this in general, but some pitfalls were new to me. This is what I gathered after some research. This post gives you some advice on how to do performance measurements for your own algorithms.

1. Remove console outputs, turn off logging

Once you are sure your code works correctly, you should turn off everything that eats time but doesn't belong to your algorithm. In most cases this will be prints to the console. Don't believe that this takes long? Here is an example. This simple Java program generates a given number of primes and puts them into an array. There is one print to the console for testing purposes. Run the program once with the System.out.println and once without.

Code:
public class PrimeGenerator {

    public static void main(String[] args) {
        long start = System.nanoTime();
        new PrimeGenerator().generatePrimesArray(100000);
        long end = System.nanoTime();
        System.out.println("time in milliseconds: " + (end - start) / 1000000);
    }

    private int[] generatePrimesArray(int number) {
        int[] primes = new int[number];
        for (int primesFound = 0, i = 0; primesFound < number; i++) {
            if (isPrime(i)) {
                System.out.println((primesFound + 1) + ": " + i); // delete this line for the second try
                primes[primesFound] = i;
                primesFound++;
            }
        }
        return primes;
    }

    private boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        if (n == 2) {
            return true;
        }
        for (int i = 2; i <= Math.sqrt(n); i++) { // check all divisors up to and including sqrt(n)
            if (n % i == 0) {
                return false;
            }
        }
        return true;
    }

}

My results:
With System.out.println: 1071 ms
Without System.out.println: 425 ms

The advantage of logging over plain console prints is that you can assign an importance to each output via a log level. If you use logging, turn the log level up to ERROR or SEVERE for the measurement. It is still good to know if something went wrong (your measurement will be worthless in that case anyway), but as long as the code runs without errors there will be no output to log files or the console, which is the goal. If you want outputs that don't affect the measurement because they happen right before or right after the measured section, just write them to the error stream.
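A minimal sketch of the log level idea with java.util.logging; the class name and messages are just placeholders, and other frameworks such as log4j have an equivalent way to raise the level to ERROR:

Code:
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggingExample {

    private static final Logger LOG = Logger.getLogger(LoggingExample.class.getName());

    public static void main(String[] args) {
        // raise the level so only SEVERE messages reach the console during measurement
        LOG.setLevel(Level.SEVERE);

        LOG.info("this would normally be printed, but is suppressed now");
        LOG.severe("errors still show up, so a broken run is not silently averaged in");
    }
}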

2. Be careful with every test that takes only a few milliseconds

Most time-measuring functions use the system clock. Depending on your OS, you will only get an accuracy of 10 to 15 ms. Anything that only lasts a few milliseconds can't be measured reliably that way.

Here are some options to solve the problem:

1. Make your task bigger:
E.g. if you measure the time for reading a wordlist, make the list much bigger. If you have a prime generator, generate more primes.

2. Use a timer that has a better resolution. Some languages provide this. E.g. in Java you should use System.nanoTime() instead of System.currentTimeMillis(). Beware that nanoTime() returns a value in nanoseconds, but its actual accuracy is nowhere near that good; it depends on your hardware and OS.

3. There are also ways to lower the timer update interval of your OS below 10 or 15 ms; you may get an accuracy of about 1 ms this way. But take into account that the call to get the time also takes time itself.

4. Repeat your task and divide the total time by the number of repetitions.
You should measure like this (pseudocode); a runnable Java version follows below:

Code:
start = getTime();
for(i = 0; i < TIMES; i++){
   myTask();
}
end = getTime();
result = (end - start) / TIMES;

This also keeps the overhead of the getTime() calls negligible compared to the total measured time.
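A self-contained Java version of this pattern. The method myTask() is only a stand-in for whatever you really want to measure, and the repetition count is an arbitrary illustrative choice:

Code:
public class RepeatedTimer {

    private static final int TIMES = 100; // illustrative choice, pick what fits your task

    public static void main(String[] args) {
        long start = System.nanoTime();
        for (int i = 0; i < TIMES; i++) {
            myTask();
        }
        long end = System.nanoTime();
        long averagePerRunMicros = (end - start) / TIMES / 1000;
        System.out.println("average time per run in microseconds: " + averagePerRunMicros);
    }

    // stand-in for the code under test
    private static void myTask() {
        long sum = 0;
        for (int i = 0; i < 1000000; i++) {
            sum += i;
        }
        if (sum == 42) { // keeps the compiler from treating the loop as dead code
            System.out.println(sum);
        }
    }
}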

3. Measure more than once, divide measurement into several sessions

Don't measure your algorithm only once. Measure several times and take the average as your result. In addition, repeat the measurement some time later, especially when your algorithm doesn't run very long. Every process running in the background may affect your results. If you do several sessions of testing, you can see whether something went wrong in a previous session (did you forget to shut down some programs?). A sketch of one such session follows below.
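A minimal sketch of one measurement session, assuming generatePrimesArray from the first listing has been made package-private (drop the private keyword) so another class can call it; the number of runs is an arbitrary choice:

Code:
public class MeasurementSession {

    public static void main(String[] args) {
        int runs = 10; // arbitrary, use what gives you stable numbers
        long[] timesMs = new long[runs];
        PrimeGenerator generator = new PrimeGenerator();

        for (int run = 0; run < runs; run++) {
            long start = System.nanoTime();
            generator.generatePrimesArray(100000);
            long end = System.nanoTime();
            timesMs[run] = (end - start) / 1000000;
        }

        long sum = 0;
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long t : timesMs) {
            sum += t;
            min = Math.min(min, t);
            max = Math.max(max, t);
        }
        // a large gap between min and max hints at background processes interfering
        System.out.println("avg: " + (sum / runs) + " ms, min: " + min + " ms, max: " + max + " ms");
    }
}

Repeat the whole session at a later point in time and compare the averages.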

4. Always use the same machine and OS to compare results

That should be self-explanatory.

5. Shut down every program that isn't needed

That should be self-explanatory.

6. Delete temporary files for every turn

If your program produces temporary files that it reuses for later tasks, you should of course delete them before every run. Example: the reporting module I work on creates a temporary report document. This document already contains the data that is needed and only exists to produce the various file formats (PDF, DOC, HTML, ...) from it. If I run the same task several times, the report creation will reuse this temporary document instead of producing a new one, which means the time it takes to get the data out of the database is no longer included in the measurement. Therefore I have to clear the temporary file directory before each run.

You may not run into this problem that often with your own programs, because you know how they work. But if you use frameworks or libraries, you might forget that they can produce temporary files. A small cleanup helper like the sketch below can be run before every measurement.
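This sketch assumes the temporary files land in a single flat directory; the path "tmp-reports" is only a placeholder:

Code:
import java.io.File;

public class TempDirCleaner {

    // deletes all files in the given directory before a measurement run;
    // assumes a flat directory without subdirectories
    static void clearTempDir(File dir) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // directory does not exist or is not readable
        }
        for (File entry : entries) {
            if (entry.isFile() && !entry.delete()) {
                System.err.println("could not delete " + entry);
            }
        }
    }

    public static void main(String[] args) {
        clearTempDir(new File("tmp-reports")); // placeholder path
        // ... run and measure the task here
    }
}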

7. Don't measure the first executions

The very first execution of your algorithm will always take longer than the executions afterwards. The main reason is caching: the caches get filled with everything your program needs, so the algorithm gains some speed after the first run. If you know how, you can empty the caches before every run. The other option is to simply do some executions before you start measuring. You can find out how many warm-up runs you need by looking at your first results; a sketch follows below.
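In Java the just-in-time compiler adds to this effect, because frequently executed code is only compiled after it has run a number of times. A minimal warm-up sketch, again assuming generatePrimesArray from the first listing has been made accessible to other classes; the number of warm-up runs is an arbitrary choice:

Code:
public class WarmedUpMeasurement {

    public static void main(String[] args) {
        PrimeGenerator generator = new PrimeGenerator();

        // untimed warm-up runs so the caches (and the JIT) have settled;
        // 5 is arbitrary, check your first results to pick a sensible number
        for (int i = 0; i < 5; i++) {
            generator.generatePrimesArray(100000);
        }

        long start = System.nanoTime();
        generator.generatePrimesArray(100000);
        long end = System.nanoTime();
        System.out.println("time in milliseconds: " + (end - start) / 1000000);
    }
}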

8. Don't measure the time your environment needs to start up

Java example: The JVM needs some time to start up. So if you use an external program for measurement, make sure you don't measure this part.

Deque
« Last Edit: September 23, 2011, 07:07:03 pm by Deque »

Offline gh0st

  • Sir
  • ***
  • Posts: 575
  • Cookies: 8
  • #DEDSec
Re: Performance Measurement
« Reply #1 on: September 23, 2011, 11:48:56 pm »
Again my friend Deque here with his awesome skills. What is this about? Calculating the time that a program takes to show an output? Are you showing that one kind of coding is faster than another?

xor

  • Guest
Re: Performance Measurement
« Reply #2 on: September 24, 2011, 07:07:44 am »
Measuring performance is a key aspect of improving a program, gh0st. You can see from Deque's example that adding a couple of println calls, one of them inside a loop, adds a significant amount of execution time to your program.

Imagine, for example, that you were writing code for an Android phone or an iPhone. While the processors in these devices are getting quicker, some developers are careless about their code and don't care about performance. You could write really responsive, snappy applications just by doing some simple performance tuning on your application.

Offline Deque

  • P.I.N.N.
  • Global Moderator
  • Overlord
  • *
  • Posts: 1203
  • Cookies: 518
  • Programmer, Malware Analyst
Re: Performance Measurement
« Reply #3 on: September 24, 2011, 11:36:41 am »
What is this about? Calculating the time that a program takes to show an output? Are you showing that one kind of coding is faster than another?
xor explained it very well. In most cases you do it to find out whether a speed improvement of your program is necessary and where (which parts of your program eat the most time?). Afterwards you need to test whether your changes really helped or made the performance even worse.

In the case of my bachelor thesis I also did the measuring to present some results of my work. Customers can be informed about how long each task takes, can determine how much faster they are with the program than without, and can decide whether it is worth buying. So it is also a kind of documentation.

One example: you can test whether one algorithm is faster than another for your input sizes. You might have heard of time complexity, especially for standard algorithms like sorting. Wikipedia provides a big table for that: http://en.wikipedia.org/wiki/Sorting_algorithm
When people say that quicksort is faster than bubble sort, they usually talk about time complexity. But that doesn't mean that one algorithm is faster in all cases. It always depends on the size of your input. Algorithms that are considered slow are sometimes pretty fast on small inputs. So if you know your program has to sort very often, and always on small arrays, you might choose an algorithm that is good with small arrays instead of the one with the best time complexity. Measuring the time is the best way to decide which one to take.
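A sketch of how such a comparison could look, timing a simple insertion sort against the library's Arrays.sort on many small arrays; the array length, the number of arrays, and the fixed seed are arbitrary illustrative choices, and the warm-up from tip 7 is left out for brevity:

Code:
import java.util.Arrays;
import java.util.Random;

public class SmallArraySortTest {

    public static void main(String[] args) {
        int count = 200000; // how many small arrays to sort
        int length = 16;    // "small" is an arbitrary choice here
        int[][] a = randomArrays(count, length);
        int[][] b = deepCopy(a); // both candidates must sort the same data

        long start = System.nanoTime();
        for (int[] array : a) {
            insertionSort(array);
        }
        long mid = System.nanoTime();
        for (int[] array : b) {
            Arrays.sort(array);
        }
        long end = System.nanoTime();

        System.out.println("insertion sort: " + (mid - start) / 1000000 + " ms");
        System.out.println("Arrays.sort:    " + (end - mid) / 1000000 + " ms");
    }

    private static void insertionSort(int[] array) {
        for (int i = 1; i < array.length; i++) {
            int value = array[i];
            int j = i - 1;
            while (j >= 0 && array[j] > value) {
                array[j + 1] = array[j];
                j--;
            }
            array[j + 1] = value;
        }
    }

    private static int[][] randomArrays(int count, int length) {
        Random random = new Random(42); // fixed seed so repeated runs see the same data
        int[][] arrays = new int[count][length];
        for (int[] array : arrays) {
            for (int i = 0; i < length; i++) {
                array[i] = random.nextInt();
            }
        }
        return arrays;
    }

    private static int[][] deepCopy(int[][] arrays) {
        int[][] copy = new int[arrays.length][];
        for (int i = 0; i < arrays.length; i++) {
            copy[i] = arrays[i].clone();
        }
        return copy;
    }
}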
« Last Edit: September 24, 2011, 11:50:51 am by Deque »