Friday, November 14, 2014

Pallindromes in C++

Sometimes you come across puzzles or interview questions along the lines of "Find the longest palindrome in a string." To be honest, I've never looked for palindromes in my work, but I think it's a good exercise nonetheless. For one thing, the "best practices" strategies can be applied to other problems. For another, the search for symmetric subsets in datasets of all types definitely does come up.

So, we might start this exercise by first writing a function to classify a string as a palindrome or not. A nice way to do this is to start with a pointer at either end of the string, and iterate them inwards, checking the lower and upper characters against each other at each stage.  For example, if the first and last characters are equal, we iterate inwards. If not, we can return "false" immmediately. The following snippet shows this function.


 bool is_pallindrome(string input){  
      int lower=0;  
      int upper=input.length()-1;  
      while (upper>lower){  
           if (input[upper]!=input[lower]) //If we find differing characters, not a pallindrome  
                return false;  
           upper--;  
           lower++;  
      }  
      return true;  
 }  

So now, to find the longest palindrome in a string, we could just iterate over the string, looking at all substrings, and calling is_pallindrome() [notice my typo in the function name]. That sounds pretty straight-forward, but searching all substrings becomes O(N^2), kind of cumbersome.

A nice way to use the is_pallindrome() function is to think about strings that are centered at index i. So, we have a pointer that searches left from i, and a pointer that searches right from i, and we stop iterating when input[lower]!=input[upper]. This is very similar to the is_pallindrome() function.

There's another approach that's more efficient than the above. Sometimes a good way to think about such a problem is to find the desired subset that ends at index i. If I'm searching for a palindrome that ends at index i, I can store the longest palindrome that ended at index i-1. Then I simply have to look at the first index of that palindrome, and the character that precedes it in the input string.

The storage/caching can be implemented with a hashmap, so we store the longest palindrome that ends at each index, and then we can look it up at the next step. This means, each additional step only features two more comparisons. The following snippet implements this approach:

 string longest_pallindrome_map(string input){  
      map<int, string> pal; //hashmap from index to longest pallindrome that *ends* at index  
      int max_length=0; int max_index=0;  
      for (int i=0; i<input.size(); i++){  
           int prev_index=i-pal[i-1].length()-1;            
           if (pal[i-1][0]==input[i]) //Looking for consecutive duplicate letters  
                pal[i]=pal[i-1]+input[i];            
           else if (input[i]==input[prev_index])  
                pal[i]=input[prev_index]+pal[i-1]+input[i]; //Looking at the character prior to previous pallindrome  
           else  
                pal[i]=string(input.begin()+i, input.begin()+i+1); //Longest pallindrome is current character  
   
           if (pal[i].length()>max_length){  
                max_index=i;  
                max_length=pal[i].length();            
           }            
      }  
      return pal[max_index];  
 }  

This is actually quite a bit more efficient in the searching, more like pseudo-linear instead of quadratic search. [Again notice my mispelling of palindrome.]  The only thing is, we are storing a bunch of strings in this hashmap. We don't actually need to do this.

Instead, we can just store the indices or pointers that begin the palindromes that ends at each index in the input string. The following function does this.

 string longest_pallindrome(string input){       
      map<int, int> pal; //hashmap from current index, to the begin-index of the longest pallindrome that *ends* here  
      int max_length=0;  
      int max_ind=0;  
      for (int i=0; i<input.size(); i++){                           
           int prev_index=pal[i-1];                                
           if (input[i]==input[prev_index]){ //Looking for consecutive duplicate characters  
                pal[i]=prev_index;  
           }  
           else if (input[i]==input[prev_index-1]) //Looking at the index prior to the last pallindrome  
                pal[i]=prev_index-1;  
           else  
                pal[i]=i; //Longest pallindrome is simply the current character  
           if ((i-pal[i])>max_length){ //Finding the longest one along the way  
                max_ind=i;  
                max_length=i-pal[i];  
           }                           
      }            
      return string(input.begin()+pal[max_ind], input.begin()+max_ind+1);  
 }  

So now we have, via dynamic programming, a pseudo-linear search, in that we basically have a linear search over the input string, but we have to store O(N) indices.

Did I really need the hashmap? Not necessarily; I could have used a regular old array, and had some code to check_bounds. On the other hand, the hashmap approach is generalizable to other problems, where the key values might not be simple ordinal numbers.

No comments:

Post a Comment