El Nino climate events are associated with low atmospheric pressure, resulting in cooler, wetter weather in California. Last autumn, with climate scientists predicting an upcoming El Nino event, there was hope that it would coincide with California's winter, which is the rainy season, leading to healthy rainfall and mitigating the 4-year-old drought.
Unfortunately, El Nino never arrived during California's rainy season. It has very recently arrived, as we can verify at NOAA's website here. This El Nino event is still associated with cooler, wetter weather, relative to a typical California spring and summer. However, at this time of year, there aren't many major wet storms in the Northern Pacific to route to California. As such, we'll have a chilly, drizzly spring and summer, but no impact in the way of drought relief.
I lived in San Diego during one of the recent El Nino summers. It never really got warm. It was a chilly, unsatisfying summer. Perhaps this summer's weather will be similar.
Saturday, May 23, 2015
Sunday, May 3, 2015
The Housing Crunch Continues
In the SF Bay Area, the housing crunch, complete with spikes in rent and home prices, continues. I've maintained the whole time, that the housing prices are not even justified for high-paid tech workers. Most people who work as engineers in this area cannot afford to purchase a house, and even finding reasonable rent is difficult.
The local news featured this article recently. It's anecdotal, but the premise is that even google software engineers have a difficult time finding housing in Mountain View.
The local news featured this article recently. It's anecdotal, but the premise is that even google software engineers have a difficult time finding housing in Mountain View.
Monday, February 9, 2015
Enjoying Judo
At this point, I've been training judo for about a month, and I'm really enjoying it. It's pretty amazing how the art exploits balance and leverage. The most elegant judo throws and sweeps use the opponent's weight and balance against him. It's an art that accommodates different body types, heights, weights, etc. In fact, being short with a low center of mass can be quite an asset; shorter opponents are often harder to throw, but have an easier time throwing their taller opponents.
I enjoy the variety that judo presents as well. There are throws, sweeps, takedowns, and ground-based grappling. I've been attending trainings at Cahill Judo academy in San Bruno. The chief instructor there, Willy Cahill, has coached judo for decades, including work with the US Olympic team.
I enjoy the variety that judo presents as well. There are throws, sweeps, takedowns, and ground-based grappling. I've been attending trainings at Cahill Judo academy in San Bruno. The chief instructor there, Willy Cahill, has coached judo for decades, including work with the US Olympic team.
Monday, January 26, 2015
Find the first substring in C++
Years ago, I interviewed for a summer internship, and I got a coding question along the lines of: "Implement a function that finds a substring within a string. If that substring is found, return the index of the found substring."
I fumbled through the question. Eventually I got a working solution, but it was a clumsy process getting there. Why was such a simple question so difficult to implement on the spot?
In hindsight, I think this is a good interview question for a few reasons. The problem is deceivingly simple; just about anybody is capable of doing this correctly, but it's difficult to do it well and efficiently, especially on the spot. Because the problem is so simple, the focus of evaluation lies in your style and thought process. Do you clutter your code with unnecessary variables? Is your code verbose or terse? How many functions do you compose, and why? Is your style sloppy or bug-prone? I didn't get that particular internship, because my code was definitely sloppy and bug-prone.
I think there's one more aspect of this question that causes applicants to freeze: strings. In C++ and Java, people tend to use the std::string/String classes, which have lots of utility functions built in. In day-to-day programming, you would rarely iterate over a string manually, and you might also avoid using pointers of type char*.
In this particular example, I think it's actually easier to use pointers than integer indices; this way you have fewer local variables to keep track of. To some extent, having fewer variables means fewer opportunities to introduce a bug.
In my version, I like splitting it up into two functions. The primary function simply iterates over the outer string, and searches each substring for the sought string. Heres' the code:
As you can see, the pointers can reduce clutter in the code, because they already are indices in an array of characters.
I fumbled through the question. Eventually I got a working solution, but it was a clumsy process getting there. Why was such a simple question so difficult to implement on the spot?
In hindsight, I think this is a good interview question for a few reasons. The problem is deceivingly simple; just about anybody is capable of doing this correctly, but it's difficult to do it well and efficiently, especially on the spot. Because the problem is so simple, the focus of evaluation lies in your style and thought process. Do you clutter your code with unnecessary variables? Is your code verbose or terse? How many functions do you compose, and why? Is your style sloppy or bug-prone? I didn't get that particular internship, because my code was definitely sloppy and bug-prone.
I think there's one more aspect of this question that causes applicants to freeze: strings. In C++ and Java, people tend to use the std::string/String classes, which have lots of utility functions built in. In day-to-day programming, you would rarely iterate over a string manually, and you might also avoid using pointers of type char*.
In this particular example, I think it's actually easier to use pointers than integer indices; this way you have fewer local variables to keep track of. To some extent, having fewer variables means fewer opportunities to introduce a bug.
In my version, I like splitting it up into two functions. The primary function simply iterates over the outer string, and searches each substring for the sought string. Heres' the code:
bool matched(char* sought, char* sent){
//utility function
while (*sought!='\0'){
if (*sent=='\0') //sought is longer than sent
return false;
if (*sent!=*sought) //The characters don't match
return false;
sent++; sought++; //Check next character
}
return true;
}
int substring(char* sought, char* sentence){
//primary function
char* iter=sentence; //iterate over string
while (*iter!='\0'){
if (matched(sought, iter)){
return iter-sentence;
}
iter++;
}
return -1;
}
As you can see, the pointers can reduce clutter in the code, because they already are indices in an array of characters.
Thursday, January 15, 2015
Matplotlib Animations Using Data queried from a SQL Table
In prior examples, I've provided a couple of snippets for writing videos using Matplotlib. In this example, instead of plotting data stored in files, I'm using data stored in a SQL table. In this particular instance, I'm using MySQL for convenience.
These days, your choice of database is about as controversial as discussing politics and religion; it's the third rail of...data? You can choose the best database, and table structure to suit your particular needs. MySQL serves just fine as an example. In any case, SQL syntax is mostly standard across databases. and the Python database interfaces are also quite standard; they mainly consist of instantiating a connection and cursor, and executing SQL commands.
Ok, so now to the code. In prior examples, we needed to know the number of frames to plot, when we initiate the matplotlib animation. In my case, I've structured the data so a given row in the table corresponds to one data frame, and to one plotted frame in the Matplotlib animation. So, I have a couple of utility functions, one to get the number of frames we'll plot, and the other to query a particular table entry, given the desired row [frame] and column.
Note that I have a custom Python class called TableDef, where I've declared various parameters relevant to the table and database. This part depends on your particular application, how you structure your data, etc. I could have wrapped these table functions in a class if I wanted a more OO feel.
Now, when I go my Matplotlib animation, I can use these utility functions to get the total number of frames, and to query individual data frames.
Within the entries in the table, I've further structured my data as json for position and velocity. Again, this is really a design choice, and depending on your database, json encoding might be native.
Other than that, this is the same Matplotlib example as before, but using a database for the data.
These days, your choice of database is about as controversial as discussing politics and religion; it's the third rail of...data? You can choose the best database, and table structure to suit your particular needs. MySQL serves just fine as an example. In any case, SQL syntax is mostly standard across databases. and the Python database interfaces are also quite standard; they mainly consist of instantiating a connection and cursor, and executing SQL commands.
Ok, so now to the code. In prior examples, we needed to know the number of frames to plot, when we initiate the matplotlib animation. In my case, I've structured the data so a given row in the table corresponds to one data frame, and to one plotted frame in the Matplotlib animation. So, I have a couple of utility functions, one to get the number of frames we'll plot, and the other to query a particular table entry, given the desired row [frame] and column.
import sys
import os
import time
import MySQLdb as mdb
from TableDef import columns #Custom definition of the table structure
def count(table, serverIP, dbUser, password, database):
try:
con = mdb.connect(serverIP, dbUser, password,database)
with con:
cur = con.cursor()
com="SET sql_mode=\'ANSI_QUOTES\'"
cur.execute(com)
row_exists=0
com="SELECT COUNT(*) FROM \"%s\""%table
cur.execute(com)
rows=cur.fetchall()
count=rows[0][0]
except mdb.Error, e:
print "Error %d: %s" % (e.args[0],e.args[1])
sys.exit(1)
finally:
if con:
con.close()
return count
def getdataframe(table,column,frame,serverIP, user, password, database):
if column not in columns:
return
if frame>count(table):
return
try:
con = mdb.connect(serverIP, dbUser, password,database);
with con:
cur = con.cursor()
com="SET sql_mode=\'ANSI_QUOTES\'"
cur.execute(com)
row_exists=0
com="SELECT %s "%column
com=com+"FROM \"%s\" "%table
com=com+"WHERE Frame=%s"%str(frame)
cur.execute(com)
rows=cur.fetchall()
data=rows[0][0]
except mdb.Error, e:
print "Error %d: %s" % (e.args[0],e.args[1])
sys.exit(1)
finally:
if con:
con.close()
return data
Note that I have a custom Python class called TableDef, where I've declared various parameters relevant to the table and database. This part depends on your particular application, how you structure your data, etc. I could have wrapped these table functions in a class if I wanted a more OO feel.
Now, when I go my Matplotlib animation, I can use these utility functions to get the total number of frames, and to query individual data frames.
import sys
import os
import json
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from file_utils import * #table from folder is defined here
import table_read as Table
from TableDef import columns #Custom definition of the table structure
from TableDef import DatabaseCredentials as DB #Custom definition of the table structure
def init():
plotline.set_data([], [])
dplotline.set_data([],[])
return plotline,dplotline,
def animate(i):
global buffersize
if (i%100==0):
print i
line= Table.getdataframe(table,columns[1],i,DB.serverIP, DB.user, DB.password, DB.name)
try:
jsondata=json.loads(line)
except ValueError:
print "Invalid JSON"
print line
f.close()
os.system("pause")
pos=jsondata["Position"]
vel=jsondata["Velocity"]
pos_buf.append(pos)
vel_buf.append(vel)
while len(pos_buf)>buffersize:
pos_buf.pop(0)
while len(vel_buf)>buffersize:
vel_buf.pop(0)
x=np.linspace(-len(pos_buf)/framerate,0,len(pos_buf))
dx=np.linspace(-len(vel_buf)/framerate,0,len(vel_buf))
plotline.set_data(x,headpose_buf)
dplotline.set_data(dx,headvel_buf)
return plotline, dplotline,
table=""
framerate=20.0
buffersize=5*20
vel_buf=[]
pos_buf=[]
fig, ax = plt.subplots()
ax.set_xlim(-5.0,2.0)
ax.set_ylim(-180,180)
my_dpi=96
fig.set_size_inches(640/my_dpi, 360/my_dpi)
plotline, =ax.plot([],[],lw=2,color='b')
dplotline, =ax.plot([],[], lw=2, color='r')
if __name__=='__main__':
if len(sys.argv)<2:
print "Usage:\nplot_pos_vel.py folder"
exit()
folder= sys.argv[1]
outfile=folder+"/pos_vel.avi"
if (os.path.isfile(outfile)):
exit()
table=table_from_folder(folder) #Custom function that maps folders to their table representation, user defined
line_count=Table.count(table)
Writer = animation.writers['ffmpeg']
writer = Writer(fps=25, metadata=dict(artist='Sayanan'), extra_args=['-vcodec','libx264'])
anim = animation.FuncAnimation(fig, animate,init_func=init, frames=line_count-1,blit=True, interval=1, repeat=False)
anim.save(outfile,writer=writer)
#plt.show()
Within the entries in the table, I've further structured my data as json for position and velocity. Again, this is really a design choice, and depending on your database, json encoding might be native.
Other than that, this is the same Matplotlib example as before, but using a database for the data.
Saturday, December 27, 2014
Branching out in Martial Arts
I've been training in Shotokan karate on-and-off for the past 15 years or so. The original Okinawan curriculum, dating back to Itosu in the late 19th century, consisted exclusively of katas. Hence, the katase must be taken as the core representation of the art form and combat system.
The modern curriculum tends to focus on sharp strikes, punches, and kicks. This is great, but the modern curriculum tends to under-emphasize the joint-locks, grappling, throws, and takedowns found throughout the katas. In fact, the kihon [basics] and kumite [sparring] curriculum tends to look quite different from the katas. One interpretation of this discrepancy, is that the katas are antiquated, mysterious, or impractical. My preferred interpretation is that we're missing something in our basics and sparring training. The grappling/throwing aspect is often under-emphasized.
I'm not the originator of this interpretation. Established practitioners and writers like Bruce Clayton, Iain Abernathy, and Kousaku Yokota have been popularizing this under-emphasized portion of the curriculum for many years. The under-emphasis comes from an over-emphasis in training in a way that complies with tournament rules. Throwing an opponent will not gain points under standard karate tournament rules, but punching or kicking him will.
Anyway, from time-to-time, I've started leading classes at my local shotokan dojo. I make sure to emphasize the grappling and throwing, and make sure the students practice these in pairs, as well as practicing how to fall correctly. I'm also taking a beginner's course in Judo, to get a more systematic treatment of throwing.
At the end of the day, there is nothing mystical about martial arts technique. It's just physics and anatomy. This is why there is great commonality in punching technique across art-forms, from boxing to karate to muay thai.
The modern curriculum tends to focus on sharp strikes, punches, and kicks. This is great, but the modern curriculum tends to under-emphasize the joint-locks, grappling, throws, and takedowns found throughout the katas. In fact, the kihon [basics] and kumite [sparring] curriculum tends to look quite different from the katas. One interpretation of this discrepancy, is that the katas are antiquated, mysterious, or impractical. My preferred interpretation is that we're missing something in our basics and sparring training. The grappling/throwing aspect is often under-emphasized.
I'm not the originator of this interpretation. Established practitioners and writers like Bruce Clayton, Iain Abernathy, and Kousaku Yokota have been popularizing this under-emphasized portion of the curriculum for many years. The under-emphasis comes from an over-emphasis in training in a way that complies with tournament rules. Throwing an opponent will not gain points under standard karate tournament rules, but punching or kicking him will.
Anyway, from time-to-time, I've started leading classes at my local shotokan dojo. I make sure to emphasize the grappling and throwing, and make sure the students practice these in pairs, as well as practicing how to fall correctly. I'm also taking a beginner's course in Judo, to get a more systematic treatment of throwing.
At the end of the day, there is nothing mystical about martial arts technique. It's just physics and anatomy. This is why there is great commonality in punching technique across art-forms, from boxing to karate to muay thai.
Friday, November 14, 2014
Pallindromes in C++
Sometimes you come across puzzles or interview questions along the lines of "Find the longest palindrome in a string." To be honest, I've never looked for palindromes in my work, but I think it's a good exercise nonetheless. For one thing, the "best practices" strategies can be applied to other problems. For another, the search for symmetric subsets in datasets of all types definitely does come up.
So, we might start this exercise by first writing a function to classify a string as a palindrome or not. A nice way to do this is to start with a pointer at either end of the string, and iterate them inwards, checking the lower and upper characters against each other at each stage. For example, if the first and last characters are equal, we iterate inwards. If not, we can return "false" immmediately. The following snippet shows this function.
So now, to find the longest palindrome in a string, we could just iterate over the string, looking at all substrings, and calling is_pallindrome() [notice my typo in the function name]. That sounds pretty straight-forward, but searching all substrings becomes O(N^2), kind of cumbersome.
A nice way to use the is_pallindrome() function is to think about strings that are centered at index i. So, we have a pointer that searches left from i, and a pointer that searches right from i, and we stop iterating when input[lower]!=input[upper]. This is very similar to the is_pallindrome() function.
There's another approach that's more efficient than the above. Sometimes a good way to think about such a problem is to find the desired subset that ends at index i. If I'm searching for a palindrome that ends at index i, I can store the longest palindrome that ended at index i-1. Then I simply have to look at the first index of that palindrome, and the character that precedes it in the input string.
The storage/caching can be implemented with a hashmap, so we store the longest palindrome that ends at each index, and then we can look it up at the next step. This means, each additional step only features two more comparisons. The following snippet implements this approach:
This is actually quite a bit more efficient in the searching, more like pseudo-linear instead of quadratic search. [Again notice my mispelling of palindrome.] The only thing is, we are storing a bunch of strings in this hashmap. We don't actually need to do this.
Instead, we can just store the indices or pointers that begin the palindromes that ends at each index in the input string. The following function does this.
So now we have, via dynamic programming, a pseudo-linear search, in that we basically have a linear search over the input string, but we have to store O(N) indices.
Did I really need the hashmap? Not necessarily; I could have used a regular old array, and had some code to check_bounds. On the other hand, the hashmap approach is generalizable to other problems, where the key values might not be simple ordinal numbers.
So, we might start this exercise by first writing a function to classify a string as a palindrome or not. A nice way to do this is to start with a pointer at either end of the string, and iterate them inwards, checking the lower and upper characters against each other at each stage. For example, if the first and last characters are equal, we iterate inwards. If not, we can return "false" immmediately. The following snippet shows this function.
bool is_pallindrome(string input){
int lower=0;
int upper=input.length()-1;
while (upper>lower){
if (input[upper]!=input[lower]) //If we find differing characters, not a pallindrome
return false;
upper--;
lower++;
}
return true;
}
So now, to find the longest palindrome in a string, we could just iterate over the string, looking at all substrings, and calling is_pallindrome() [notice my typo in the function name]. That sounds pretty straight-forward, but searching all substrings becomes O(N^2), kind of cumbersome.
A nice way to use the is_pallindrome() function is to think about strings that are centered at index i. So, we have a pointer that searches left from i, and a pointer that searches right from i, and we stop iterating when input[lower]!=input[upper]. This is very similar to the is_pallindrome() function.
There's another approach that's more efficient than the above. Sometimes a good way to think about such a problem is to find the desired subset that ends at index i. If I'm searching for a palindrome that ends at index i, I can store the longest palindrome that ended at index i-1. Then I simply have to look at the first index of that palindrome, and the character that precedes it in the input string.
The storage/caching can be implemented with a hashmap, so we store the longest palindrome that ends at each index, and then we can look it up at the next step. This means, each additional step only features two more comparisons. The following snippet implements this approach:
string longest_pallindrome_map(string input){
map<int, string> pal; //hashmap from index to longest pallindrome that *ends* at index
int max_length=0; int max_index=0;
for (int i=0; i<input.size(); i++){
int prev_index=i-pal[i-1].length()-1;
if (pal[i-1][0]==input[i]) //Looking for consecutive duplicate letters
pal[i]=pal[i-1]+input[i];
else if (input[i]==input[prev_index])
pal[i]=input[prev_index]+pal[i-1]+input[i]; //Looking at the character prior to previous pallindrome
else
pal[i]=string(input.begin()+i, input.begin()+i+1); //Longest pallindrome is current character
if (pal[i].length()>max_length){
max_index=i;
max_length=pal[i].length();
}
}
return pal[max_index];
}
This is actually quite a bit more efficient in the searching, more like pseudo-linear instead of quadratic search. [Again notice my mispelling of palindrome.] The only thing is, we are storing a bunch of strings in this hashmap. We don't actually need to do this.
Instead, we can just store the indices or pointers that begin the palindromes that ends at each index in the input string. The following function does this.
string longest_pallindrome(string input){
map<int, int> pal; //hashmap from current index, to the begin-index of the longest pallindrome that *ends* here
int max_length=0;
int max_ind=0;
for (int i=0; i<input.size(); i++){
int prev_index=pal[i-1];
if (input[i]==input[prev_index]){ //Looking for consecutive duplicate characters
pal[i]=prev_index;
}
else if (input[i]==input[prev_index-1]) //Looking at the index prior to the last pallindrome
pal[i]=prev_index-1;
else
pal[i]=i; //Longest pallindrome is simply the current character
if ((i-pal[i])>max_length){ //Finding the longest one along the way
max_ind=i;
max_length=i-pal[i];
}
}
return string(input.begin()+pal[max_ind], input.begin()+max_ind+1);
}
So now we have, via dynamic programming, a pseudo-linear search, in that we basically have a linear search over the input string, but we have to store O(N) indices.
Did I really need the hashmap? Not necessarily; I could have used a regular old array, and had some code to check_bounds. On the other hand, the hashmap approach is generalizable to other problems, where the key values might not be simple ordinal numbers.
Subscribe to:
Posts (Atom)