R@M3$H.NBlog

Determine and display all anagrams in a string array

20 January, 2014 - 2 min read

Given an array of strings (string[] inputStrings). Devise a way and write the code to determine and display all the anagrams in the array.

Strategy:

With any interview question one should get the specifics of the problem and the input data before designing the solution. In this particular case, we should ask about the approximate number of strings in the array, average length of each string, memory constraints, CPU usage constraints, etc. Each factor can influence the design and selection of algorithm and the data structures used.
The solution I am presenting here optimizes for speed at the cost of extra memory. The basic idea is to come up with simple hash or signature for each string in the input string array. This signature is nothing but the characters in the string sorted alphabetically. So, the signature for "cat" will be "act" and for "tac" it will be "act" too. As you can see the signature is the same for 2 strings that are anagrams. We can use this property to identify and collect the anagrams in the inputString array together. We can use a hash table for fast insert and retrieve these anagrams, but will cost us extra memory.
Note: There are other solutions that don't use a hashtable, but involves CPU cost is higher and involves sorting of the signature. I will post this alternative solution soon. If you have a better method, please write a comment.
Pseudo-Code:
Loop for each string (S) in the array:

  1. Get signature (K) for S (sort the characters of S to get its signature (K))
  2. Insert S in hash table with signature K as the key and append S to the array list contained in the value of the hash table key.
  3. If K already exists, don't add a new key (since it will cause collision), but just append S to the end of the list in that hash table slot.
  4. Goto step 1 with next available string.
  5. In the end just print all the keys in the hash table along with the list of strings in each slot.