Sayanan Sivaraman, PhD

Saturday, May 23, 2015

El Nino Finally Arrives

El Nino climate events are associated with low atmospheric pressure, resulting in cooler, wetter weather in California. Last autumn, with climate scientists predicting an upcoming El Nino event, there was hope that it would coincide with California's winter, which is the rainy season, leading to healthy rainfall and mitigating the 4-year-old drought.

Unfortunately, El Nino never arrived during California's rainy season. It has very recently arrived, as we can verify at NOAA's website here. This El Nino event is still associated with cooler, wetter weather, relative to a typical California spring and summer. However, at this time of year, there aren't many major wet storms in the Northern Pacific to route to California. As such, we'll have a chilly, drizzly spring and summer, but no impact in the way of drought relief.

I lived in San Diego during one of the recent El Nino summers. It never really got warm. It was a chilly, unsatisfying summer. Perhaps this summer's weather will be similar.

Sunday, May 3, 2015

The Housing Crunch Continues

In the SF Bay Area, the housing crunch, complete with spikes in rent and home prices, continues. I've maintained the whole time, that the housing prices are not even justified for high-paid tech workers. Most people who work as engineers in this area cannot afford to purchase a house, and even finding reasonable rent is difficult.

The local news featured this article recently. It's anecdotal, but the premise is that even google software engineers have a difficult time finding housing in Mountain View.

Monday, February 9, 2015

Enjoying Judo

At this point, I've been training judo for about a month, and I'm really enjoying it. It's pretty amazing how the art exploits balance and leverage. The most elegant judo throws and sweeps use the opponent's weight and balance against him. It's an art that accommodates different body types, heights, weights, etc. In fact, being short with a low center of mass can be quite an asset; shorter opponents are often harder to throw, but have an easier time throwing their taller opponents.

I enjoy the variety that judo presents as well. There are throws, sweeps, takedowns, and ground-based grappling. I've been attending trainings at Cahill Judo academy in San Bruno. The chief instructor there, Willy Cahill, has coached judo for decades, including work with the US Olympic team.

Monday, January 26, 2015

Find the first substring in C++

Years ago, I interviewed for a summer internship, and I got a coding question along the lines of: "Implement a function that finds a substring within a string. If that substring is found, return the index of the found substring."

I fumbled through the question. Eventually I got a working solution, but it was a clumsy process getting there. Why was such a simple question so difficult to implement on the spot?

In hindsight, I think this is a good interview question for a few reasons. The problem is deceivingly simple; just about anybody is capable of doing this correctly, but it's difficult to do it well and efficiently, especially on the spot. Because the problem is so simple, the focus of evaluation lies in your style and thought process. Do you clutter your code with unnecessary variables? Is your code verbose or terse? How many functions do you compose, and why? Is your style sloppy or bug-prone? I didn't get that particular internship, because my code was definitely sloppy and bug-prone.

I think there's one more aspect of this question that causes applicants to freeze: strings. In C++ and Java, people tend to use the std::string/String classes, which have lots of utility functions built in. In day-to-day programming, you would rarely iterate over a string manually, and you might also avoid using pointers of type char*.

In this particular example, I think it's actually easier to use pointers than integer indices; this way you have fewer local variables to keep track of. To some extent, having fewer variables means fewer opportunities to introduce a bug.

In my version, I like splitting it up into two functions. The primary function simply iterates over the outer string, and searches each substring for the sought string. Heres' the code:

bool matched(char* sought, char* sent){
    //utility function
    while (*sought!='\0'){
        if (*sent=='\0') //sought is longer than sent
            return false;
        if (*sent!=*sought) //The characters don't match
            return false;
        sent++; sought++; //Check next character
    }
    return true;
}

int substring(char* sought, char* sentence){
    //primary function
    char* iter=sentence; //iterate over string
    while (*iter!='\0'){        
        if (matched(sought, iter)){
                return iter-sentence;
        }        
        iter++;
    }
    return -1;
}

As you can see, the pointers can reduce clutter in the code, because they already are indices in an array of characters.

Thursday, January 15, 2015

Matplotlib Animations Using Data queried from a SQL Table

In prior examples, I've provided a couple of snippets for writing videos using Matplotlib. In this example, instead of plotting data stored in files, I'm using data stored in a SQL table. In this particular instance, I'm using MySQL for convenience.

These days, your choice of database is about as controversial as discussing politics and religion; it's the third rail of...data? You can choose the best database, and table structure to suit your particular needs. MySQL serves just fine as an example. In any case, SQL syntax is mostly standard across databases. and the Python database interfaces are also quite standard; they mainly consist of instantiating a connection and cursor, and executing SQL commands.

Ok, so now to the code. In prior examples, we needed to know the number of frames to plot, when we initiate the matplotlib animation. In my case, I've structured the data so a given row in the table corresponds to one data frame, and to one plotted frame in the Matplotlib animation. So, I have a couple of utility functions, one to get the number of frames we'll plot, and the other to query a particular table entry, given the desired row [frame] and column.

 import sys  
 import os  
 import time  
 import MySQLdb as mdb  
 from TableDef import columns #Custom definition of the table structure  
   
 def count(table, serverIP, dbUser, password, database):  
   try:  
     con = mdb.connect(serverIP, dbUser, password,database)        
     with con:  
       cur = con.cursor()  
       com="SET sql_mode=\'ANSI_QUOTES\'"  
       cur.execute(com)  
       row_exists=0  
       com="SELECT COUNT(*) FROM \"%s\""%table  
       cur.execute(com)  
       rows=cur.fetchall()  
       count=rows[0][0]        
         
   except mdb.Error, e:   
       print "Error %d: %s" % (e.args[0],e.args[1])  
       sys.exit(1)    
   finally:      
       if con:    
         con.close()   
   return count  
   
 def getdataframe(table,column,frame,serverIP, user, password, database):      
     
   if column not in columns:  
     return  
   if frame>count(table):  
     return  
   try:  
     con = mdb.connect(serverIP, dbUser, password,database);        
     with con:  
       cur = con.cursor()  
       com="SET sql_mode=\'ANSI_QUOTES\'"  
       cur.execute(com)  
       row_exists=0  
       com="SELECT %s "%column  
       com=com+"FROM \"%s\" "%table  
       com=com+"WHERE Frame=%s"%str(frame)        
       cur.execute(com)  
       rows=cur.fetchall()  
       data=rows[0][0]        
         
   except mdb.Error, e:   
       print "Error %d: %s" % (e.args[0],e.args[1])  
       sys.exit(1)    
   finally:      
       if con:    
         con.close()   
           
   return data

Note that I have a custom Python class called TableDef, where I've declared various parameters relevant to the table and database. This part depends on your particular application, how you structure your data, etc. I could have wrapped these table functions in a class if I wanted a more OO feel.

Now, when I go my Matplotlib animation, I can use these utility functions to get the total number of frames, and to query individual data frames.

 import sys  
 import os  
 import json  
 import numpy as np  
 import matplotlib  
 import matplotlib.pyplot as plt  
 import matplotlib.animation as animation  
 from file_utils import * #table from folder is defined here  
 import table_read as Table  
 from TableDef import columns #Custom definition of the table structure  
 from TableDef import DatabaseCredentials as DB #Custom definition of the table structure  
   
 def init():  
   plotline.set_data([], [])  
   dplotline.set_data([],[])  
   return plotline,dplotline,  
   
 def animate(i):          
   global buffersize    
   if (i%100==0):  
     print i        
   line= Table.getdataframe(table,columns[1],i,DB.serverIP, DB.user, DB.password, DB.name)    
   
   try:  
     jsondata=json.loads(line)    
   except ValueError:  
     print "Invalid JSON"  
     print line  
     f.close()  
     os.system("pause")      
     
   pos=jsondata["Position"]       
   vel=jsondata["Velocity"]  
   
   pos_buf.append(pos)  
   vel_buf.append(vel)  
     
   while len(pos_buf)>buffersize:  
     pos_buf.pop(0)  
   while len(vel_buf)>buffersize:  
     vel_buf.pop(0)  
     
   x=np.linspace(-len(pos_buf)/framerate,0,len(pos_buf))  
   dx=np.linspace(-len(vel_buf)/framerate,0,len(vel_buf))        
     
   plotline.set_data(x,headpose_buf)  
   dplotline.set_data(dx,headvel_buf)    
   return plotline, dplotline,  
   
 table=""   
 framerate=20.0  
 buffersize=5*20  
 vel_buf=[]  
 pos_buf=[]  
   
 fig, ax = plt.subplots()  
 ax.set_xlim(-5.0,2.0)  
 ax.set_ylim(-180,180)  
 my_dpi=96  
 fig.set_size_inches(640/my_dpi, 360/my_dpi)    
 plotline, =ax.plot([],[],lw=2,color='b')  
 dplotline, =ax.plot([],[], lw=2, color='r')  
   
 if __name__=='__main__':  
   if len(sys.argv)<2:  
     print "Usage:\nplot_pos_vel.py folder"  
     exit()  
     
   folder= sys.argv[1]       
   outfile=folder+"/pos_vel.avi"  
   if (os.path.isfile(outfile)):  
     exit()   
   table=table_from_folder(folder)  #Custom function that maps folders to their table representation, user defined  
   line_count=Table.count(table)    
     
   Writer = animation.writers['ffmpeg']  
   writer = Writer(fps=25, metadata=dict(artist='Sayanan'), extra_args=['-vcodec','libx264'])      
   anim = animation.FuncAnimation(fig, animate,init_func=init, frames=line_count-1,blit=True, interval=1, repeat=False)      
   anim.save(outfile,writer=writer)  
     
   #plt.show()

Within the entries in the table, I've further structured my data as json for position and velocity. Again, this is really a design choice, and depending on your database, json encoding might be native.

Other than that, this is the same Matplotlib example as before, but using a database for the data.

Saturday, December 27, 2014

Branching out in Martial Arts

I've been training in Shotokan karate on-and-off for the past 15 years or so. The original Okinawan curriculum, dating back to Itosu in the late 19th century, consisted exclusively of katas. Hence, the katase must be taken as the core representation of the art form and combat system.

The modern curriculum tends to focus on sharp strikes, punches, and kicks. This is great, but the modern curriculum tends to under-emphasize the joint-locks, grappling, throws, and takedowns found throughout the katas. In fact, the kihon [basics] and kumite [sparring] curriculum tends to look quite different from the katas. One interpretation of this discrepancy, is that the katas are antiquated, mysterious, or impractical. My preferred interpretation is that we're missing something in our basics and sparring training. The grappling/throwing aspect is often under-emphasized.

I'm not the originator of this interpretation. Established practitioners and writers like Bruce Clayton, Iain Abernathy, and Kousaku Yokota have been popularizing this under-emphasized portion of the curriculum for many years. The under-emphasis comes from an over-emphasis in training in a way that complies with tournament rules. Throwing an opponent will not gain points under standard karate tournament rules, but punching or kicking him will.

Anyway, from time-to-time, I've started leading classes at my local shotokan dojo. I make sure to emphasize the grappling and throwing, and make sure the students practice these in pairs, as well as practicing how to fall correctly. I'm also taking a beginner's course in Judo, to get a more systematic treatment of throwing.

At the end of the day, there is nothing mystical about martial arts technique. It's just physics and anatomy. This is why there is great commonality in punching technique across art-forms, from boxing to karate to muay thai.

Friday, November 14, 2014

Pallindromes in C++

Sometimes you come across puzzles or interview questions along the lines of "Find the longest palindrome in a string." To be honest, I've never looked for palindromes in my work, but I think it's a good exercise nonetheless. For one thing, the "best practices" strategies can be applied to other problems. For another, the search for symmetric subsets in datasets of all types definitely does come up.

So, we might start this exercise by first writing a function to classify a string as a palindrome or not. A nice way to do this is to start with a pointer at either end of the string, and iterate them inwards, checking the lower and upper characters against each other at each stage. For example, if the first and last characters are equal, we iterate inwards. If not, we can return "false" immmediately. The following snippet shows this function.

 bool is_pallindrome(string input){  
      int lower=0;  
      int upper=input.length()-1;  
      while (upper>lower){  
           if (input[upper]!=input[lower]) //If we find differing characters, not a pallindrome  
                return false;  
           upper--;  
           lower++;  
      }  
      return true;  
 }

So now, to find the longest palindrome in a string, we could just iterate over the string, looking at all substrings, and calling is_pallindrome() [notice my typo in the function name]. That sounds pretty straight-forward, but searching all substrings becomes O(N^2), kind of cumbersome.

A nice way to use the is_pallindrome() function is to think about strings that are centered at index i. So, we have a pointer that searches left from i, and a pointer that searches right from i, and we stop iterating when input[lower]!=input[upper]. This is very similar to the is_pallindrome() function.

There's another approach that's more efficient than the above. Sometimes a good way to think about such a problem is to find the desired subset that ends at index i. If I'm searching for a palindrome that ends at index i, I can store the longest palindrome that ended at index i-1. Then I simply have to look at the first index of that palindrome, and the character that precedes it in the input string.

The storage/caching can be implemented with a hashmap, so we store the longest palindrome that ends at each index, and then we can look it up at the next step. This means, each additional step only features two more comparisons. The following snippet implements this approach:

 string longest_pallindrome_map(string input){  
      map<int, string> pal; //hashmap from index to longest pallindrome that *ends* at index  
      int max_length=0; int max_index=0;  
      for (int i=0; i<input.size(); i++){  
           int prev_index=i-pal[i-1].length()-1;            
           if (pal[i-1][0]==input[i]) //Looking for consecutive duplicate letters  
                pal[i]=pal[i-1]+input[i];            
           else if (input[i]==input[prev_index])  
                pal[i]=input[prev_index]+pal[i-1]+input[i]; //Looking at the character prior to previous pallindrome  
           else  
                pal[i]=string(input.begin()+i, input.begin()+i+1); //Longest pallindrome is current character  
   
           if (pal[i].length()>max_length){  
                max_index=i;  
                max_length=pal[i].length();            
           }            
      }  
      return pal[max_index];  
 }

This is actually quite a bit more efficient in the searching, more like pseudo-linear instead of quadratic search. [Again notice my mispelling of palindrome.] The only thing is, we are storing a bunch of strings in this hashmap. We don't actually need to do this.

Instead, we can just store the indices or pointers that begin the palindromes that ends at each index in the input string. The following function does this.

 string longest_pallindrome(string input){       
      map<int, int> pal; //hashmap from current index, to the begin-index of the longest pallindrome that *ends* here  
      int max_length=0;  
      int max_ind=0;  
      for (int i=0; i<input.size(); i++){                           
           int prev_index=pal[i-1];                                
           if (input[i]==input[prev_index]){ //Looking for consecutive duplicate characters  
                pal[i]=prev_index;  
           }  
           else if (input[i]==input[prev_index-1]) //Looking at the index prior to the last pallindrome  
                pal[i]=prev_index-1;  
           else  
                pal[i]=i; //Longest pallindrome is simply the current character  
           if ((i-pal[i])>max_length){ //Finding the longest one along the way  
                max_ind=i;  
                max_length=i-pal[i];  
           }                           
      }            
      return string(input.begin()+pal[max_ind], input.begin()+max_ind+1);  
 }

So now we have, via dynamic programming, a pseudo-linear search, in that we basically have a linear search over the input string, but we have to store O(N) indices.

Did I really need the hashmap? Not necessarily; I could have used a regular old array, and had some code to check_bounds. On the other hand, the hashmap approach is generalizable to other problems, where the key values might not be simple ordinal numbers.

Pages