Monday, January 26, 2015

Find the first substring in C++

Years ago, I interviewed for a summer internship, and I got a coding question along the lines of: "Implement a function that finds a substring within a string. If that substring is found, return the index of the found substring."

I fumbled through the question. Eventually I got a working solution, but it was a clumsy process getting there. Why was such a simple question so difficult to implement on the spot?

In hindsight, I think this is a good interview question for a few reasons. The problem is deceivingly simple; just about anybody is capable of doing this correctly, but it's difficult to do it well and efficiently, especially on the spot. Because the problem is so simple, the focus of evaluation lies in your style and thought process. Do you clutter your code with unnecessary variables? Is your code verbose or terse? How many functions do you compose, and why? Is your style sloppy or bug-prone? I didn't get that particular internship, because my code was definitely sloppy and bug-prone.

I think there's one more aspect of this question that causes applicants to freeze: strings. In C++ and Java, people tend to use the std::string/String classes, which have lots of utility functions built in. In day-to-day programming, you would rarely iterate over a string manually, and you might also avoid using pointers of type char*.

In this particular example, I think it's actually easier to use pointers than integer indices; this way you have fewer local variables to keep track of. To some extent, having fewer variables means fewer opportunities to introduce a bug.

In my version, I like splitting it up into two functions. The primary function simply iterates over the outer string, and searches each substring for the sought string. Heres' the code:

bool matched(char* sought, char* sent){
    //utility function
    while (*sought!='\0'){
        if (*sent=='\0') //sought is longer than sent
            return false;
        if (*sent!=*sought) //The characters don't match
            return false;
        sent++; sought++; //Check next character
    }
    return true;
}

int substring(char* sought, char* sentence){
    //primary function
    char* iter=sentence; //iterate over string
    while (*iter!='\0'){        
        if (matched(sought, iter)){
                return iter-sentence;
        }        
        iter++;
    }
    return -1;
}

As you can see, the pointers can reduce clutter in the code, because they already are indices in an array of characters.

Thursday, January 15, 2015

Matplotlib Animations Using Data queried from a SQL Table

In prior examples, I've provided a couple of snippets for writing videos using Matplotlib. In this example, instead of plotting data stored in files, I'm using data stored in a SQL table. In this particular instance, I'm using MySQL for convenience.

These days, your choice of database is about as controversial as discussing politics and religion; it's the third rail of...data? You can choose the best database, and table structure to suit your particular needs. MySQL serves just fine as an example. In any case, SQL syntax is mostly standard across databases. and the Python database interfaces are also quite standard; they mainly consist of instantiating a connection and cursor, and executing SQL commands.

Ok, so now to the code. In prior examples, we needed to know the number of frames to plot, when we initiate the matplotlib animation. In my case, I've structured the data so a given row in the table corresponds to one data frame, and to one plotted frame in the Matplotlib animation. So, I have a couple of utility functions, one to get the number of frames we'll plot, and the other to query a particular table entry, given the desired row [frame] and column.


 import sys  
 import os  
 import time  
 import MySQLdb as mdb  
 from TableDef import columns #Custom definition of the table structure  
   
 def count(table, serverIP, dbUser, password, database):  
   try:  
     con = mdb.connect(serverIP, dbUser, password,database)        
     with con:  
       cur = con.cursor()  
       com="SET sql_mode=\'ANSI_QUOTES\'"  
       cur.execute(com)  
       row_exists=0  
       com="SELECT COUNT(*) FROM \"%s\""%table  
       cur.execute(com)  
       rows=cur.fetchall()  
       count=rows[0][0]        
         
   except mdb.Error, e:   
       print "Error %d: %s" % (e.args[0],e.args[1])  
       sys.exit(1)    
   finally:      
       if con:    
         con.close()   
   return count  
   
 def getdataframe(table,column,frame,serverIP, user, password, database):      
     
   if column not in columns:  
     return  
   if frame>count(table):  
     return  
   try:  
     con = mdb.connect(serverIP, dbUser, password,database);        
     with con:  
       cur = con.cursor()  
       com="SET sql_mode=\'ANSI_QUOTES\'"  
       cur.execute(com)  
       row_exists=0  
       com="SELECT %s "%column  
       com=com+"FROM \"%s\" "%table  
       com=com+"WHERE Frame=%s"%str(frame)        
       cur.execute(com)  
       rows=cur.fetchall()  
       data=rows[0][0]        
         
   except mdb.Error, e:   
       print "Error %d: %s" % (e.args[0],e.args[1])  
       sys.exit(1)    
   finally:      
       if con:    
         con.close()   
           
   return data    

Note that I have a custom Python class called TableDef, where I've declared various parameters relevant to the table and database. This part depends on your particular application, how you structure your data, etc. I could have wrapped these table functions in a class if I wanted a more OO feel.

Now, when I go my Matplotlib animation, I can use these utility functions to get the total number of frames, and to query individual data frames.


 import sys  
 import os  
 import json  
 import numpy as np  
 import matplotlib  
 import matplotlib.pyplot as plt  
 import matplotlib.animation as animation  
 from file_utils import * #table from folder is defined here  
 import table_read as Table  
 from TableDef import columns #Custom definition of the table structure  
 from TableDef import DatabaseCredentials as DB #Custom definition of the table structure  
   
 def init():  
   plotline.set_data([], [])  
   dplotline.set_data([],[])  
   return plotline,dplotline,  
   
 def animate(i):          
   global buffersize    
   if (i%100==0):  
     print i        
   line= Table.getdataframe(table,columns[1],i,DB.serverIP, DB.user, DB.password, DB.name)    
   
   try:  
     jsondata=json.loads(line)    
   except ValueError:  
     print "Invalid JSON"  
     print line  
     f.close()  
     os.system("pause")      
     
   pos=jsondata["Position"]       
   vel=jsondata["Velocity"]  
   
   pos_buf.append(pos)  
   vel_buf.append(vel)  
     
   while len(pos_buf)>buffersize:  
     pos_buf.pop(0)  
   while len(vel_buf)>buffersize:  
     vel_buf.pop(0)  
     
   x=np.linspace(-len(pos_buf)/framerate,0,len(pos_buf))  
   dx=np.linspace(-len(vel_buf)/framerate,0,len(vel_buf))        
     
   plotline.set_data(x,headpose_buf)  
   dplotline.set_data(dx,headvel_buf)    
   return plotline, dplotline,  
   
 table=""   
 framerate=20.0  
 buffersize=5*20  
 vel_buf=[]  
 pos_buf=[]  
   
 fig, ax = plt.subplots()  
 ax.set_xlim(-5.0,2.0)  
 ax.set_ylim(-180,180)  
 my_dpi=96  
 fig.set_size_inches(640/my_dpi, 360/my_dpi)    
 plotline, =ax.plot([],[],lw=2,color='b')  
 dplotline, =ax.plot([],[], lw=2, color='r')  
   
 if __name__=='__main__':  
   if len(sys.argv)<2:  
     print "Usage:\nplot_pos_vel.py folder"  
     exit()  
     
   folder= sys.argv[1]       
   outfile=folder+"/pos_vel.avi"  
   if (os.path.isfile(outfile)):  
     exit()   
   table=table_from_folder(folder)  #Custom function that maps folders to their table representation, user defined  
   line_count=Table.count(table)    
     
   Writer = animation.writers['ffmpeg']  
   writer = Writer(fps=25, metadata=dict(artist='Sayanan'), extra_args=['-vcodec','libx264'])      
   anim = animation.FuncAnimation(fig, animate,init_func=init, frames=line_count-1,blit=True, interval=1, repeat=False)      
   anim.save(outfile,writer=writer)  
     
   #plt.show()  
     

Within the entries in the table, I've further structured my data as json for position and velocity. Again, this is really a design choice, and depending on your database, json encoding might be native.

Other than that, this is the same Matplotlib example as before, but using a database for the data.