Author Topic: [Tutorial] Sorting IP Addresses, Listing Duplicates with Collection Techniques  (Read 1313 times)

0 Members and 1 Guest are viewing this topic.

Offline Psycho_Coder

  • Knight
  • **
  • Posts: 166
  • Cookies: 84
  • Programmer, Forensic Analyst
    • View Profile
    • Code Hackers Blog
I was lurking around the MooseSaloon and found a thread which says that you have a list of IP addresses and it is required to sort them and then count the duplicates if any. Its a simple problem and is interesting because similar type of question I have found at several places where people try to use sort method of Collections or Arrays class to sort, but fails. Here's an explanation why :-

http://java67.blogspot.in/2012/10/how-to-sort-object-in-java-comparator-comparable-example.html

Quoting from the site :

Quote
...... sorting objects. compareTo() method is used to provide Object's natural order sorting and compare() method is used to sort Object with any arbitrary field. Almost all value classes in Java library e.g. String, Integer, Double, BigDecimal implement compareTo() to specify there natural sorting order. While overriding compareTo method String is sorted lexicographically and Integers are sorted numerically. .......

Note :- I assume that the reader has some knowledge of Java and knows how to use Collections if not well versed with the concepts. This thread focuses primarily with utilities of the Collection API

Okay, so lets get started then. So we are gonna sort some IP addresses with a Custom Comparator.

First we will create an  ArrayList and add some IPs.

Code: (java) [Select]
List<String> list = new ArrayList<>();

        list.add("123.4.245.23");
        list.add("123.4.245.23");
        list.add("32.183.93.40");
        list.add("104.244.4.1");
        list.add("104.30.244.2");
        list.add("1.198.3.93");
        list.add("32.183.93.40");
        list.add("104.244.4.1");
        list.add("104.30.244.2");
        list.add("104.244.4.1");
        list.add("104.30.244.2");

Now we implement the Comparator. I have implemented the comparator as Anonymous Inner class but you can create a new class and implement the Comparator inteface and pass the instance of the class to Collections.sort() method as for custom sorting.

Java 7+ Style

Code: (java) [Select]
Collections.sort(list, new Comparator<String>() {

            @Override
            public int compare(String t, String t1) {
                return t.hashCode() - t1.hashCode();               
            }
        });

hashCode() is a method with every class in Java provides explicitly or implicitly, which digests the data stored in an instance of the class into a single hash value (a 32-bit signed integer). hashCode() for a particular object is same and so we can use it for  the comparison.

There are others ways to do the same thing, for example you can convert the IP to Long and then perform the Comparison and later convert the Long to String again.

The above code is for people who use Java 7 but if you are using Java 8 then instead of using the new List.sort() should be preferred over Collections.sort().

Java 8+ Style

Code: (java) [Select]
list.sort((Object t, Object t1) -> {
            return t.toString().hashCode() - t1.toString().hashCode();
        });



Why Should I prefer the List.sort() over Collections.sort() in Java 8 ?

Well simple read the following article and the implementation details I gave within the code tags below.

http://ankitsambyal.blogspot.in/2014/03/difference-between-listsort-and.html

Code: [Select]
Implementation Requirements:

The default implementation obtains an array containing all elements in this list, sorts the array, and iterates over this list resetting each element from the corresponding position in the array. (This avoids the n2 log(n) performance that would result from attempting to sort a linked list in place.)

Implementation Note:

This implementation is a stable, adaptive, iterative mergesort that requires far fewer than n lg(n) comparisons when the input array is partially sorted, while offering the performance of a traditional mergesort when the input array is randomly ordered. If the input array is nearly sorted, the implementation requires approximately n comparisons. Temporary storage requirements vary from a small constant for nearly sorted input arrays to n/2 object references for randomly ordered input arrays.
The implementation takes equal advantage of ascending and descending order in its input array, and can take advantage of ascending and descending order in different parts of the same input array. It is well-suited to merging two or more sorted arrays: simply concatenate the arrays and sort the resulting array.
The implementation was adapted from Tim Peters's list sort for Python ( TimSort). It uses techniques from Peter McIlroy's "Optimistic Sorting and Information Theoretic Complexity", in Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp 467-474, January 1993.



Lets do the final task of printing the elements. I expect most people to know how to use Iterators or for each loop (Java 5 style) so I will be lazy and use the Java 8 style.

Code: [Select]
list.stream().forEach(System.out::println);

Output

Code: [Select]
1.198.3.93
32.183.93.40
32.183.93.40
104.30.244.2
104.30.244.2
104.30.244.2
104.244.4.1
104.244.4.1
104.244.4.1
123.4.245.23
123.4.245.23

Okay we are done. But before ending this thread I would like to discuss about counting the duplicates and store them in a Map.

We will store the IPs and their repetition counts as Key-value pair in Map<K, V>

The following snippet does the thing I mentioned above :-

(You can use Collections.frequecy() to count the duplicates too)

Code: (java) [Select]
/**
         * Create a Map to Store the IP and the Counts
         */
        Map<String, Integer> map = new HashMap<>();
       
        /**
         * Count the Duplicates and store them in Map
         */
        list.stream().forEach((temp) -> {
            Integer count = map.get(temp);
            map.put(temp, (count == null) ? 1 : count + 1);
        });
       
        /**
         * Print the Map
         */
        map.entrySet().stream().forEach((entry) -> {
            System.out.println(entry.getKey() + " - " + entry.getValue());
        });

Output:-

Code: [Select]
104.30.244.2 - 3
1.198.3.93 - 1
32.183.93.40 - 2
104.244.4.1 - 3
123.4.245.23 - 2

Simple, Isn't it ? Hell yeah! Collections + Java 8 = Full Awesomeness :D hahaha

But note that in the above code the IPs are unordered, thats because HashMap is implemented as a hash table, and there is no ordering on keys or values. If you want to maintain the ordering of the IPs then use TreeMap.

Here's a snippet which uses TreeMap. Basically creates a new TreeMap and puts the map we created earlier into the new created TreeMap which then sorts the map by its keys. But don't forget that natural sorting won't be effective so we will have to use our custom Comparator we created earlier.

Here's the snippet :-

Code: (java) [Select]
        /**
         * Create a Map to Store the IP and the Counts
         */
        Map<String, Integer> map = new HashMap<>();
       
        /**
         * Count the Duplicates and store them in Map
         */
        list.stream().forEach((temp) -> {
            Integer count = map.get(temp);
            map.put(temp, (count == null) ? 1 : count + 1);
        });
       
        Map<String, Integer> treeMap = new TreeMap<>((Object t, Object t1) -> {
            return t.toString().hashCode() - t1.toString().hashCode();
        });

        treeMap.putAll(map);
       
        /**
         * Print the Map
         */
        treeMap.entrySet().stream().forEach((entry) -> {
            System.out.println(entry.getKey() + " - " + entry.getValue());
        });

Output

Code: [Select]
1.198.3.93 - 1
32.183.93.40 - 2
104.30.244.2 - 3
104.244.4.1 - 3
123.4.245.23 - 2

As you can observe the sorting order is maintained.

I hope you like this simple tutorial which demonstrates some little Collections techniques, and concepts. Most of the things I have discussed can be done in other ways as well. Like for duplicates you can write your own method, I used hashCode() as a measure of Comparing the IPs but you can use other ways too. This tutorial will also provide you a basic techniques of Java 8 streams.


Now comes Homework :D

Write a Snippet to store the duplicates in a Map<K, V> where the map is ordered by duplicate count with corresponding IPs as Values.

Note that there may be ips with same duplicate counts and in such cases your code shouldn't exclude them. You are free to use any Generics types as Value but Key should be Integer that is :-

Map <Integer, AnyGenericTypeYouLike>



I hope some people will post solutions since its very easy and also that people will find this thread useful in honing their Java Skills. Feedback is most welcomed.


Thank you,
Sincerely,
Psycho_Coder.
"Don't do anything by half. If you love someone, love them with all your soul. When you hate someone, hate them until it hurts."--- Henry Rollins