Monday, January 26, 2015

Find the first substring in C++

Years ago, I interviewed for a summer internship, and I got a coding question along the lines of: "Implement a function that finds a substring within a string. If that substring is found, return the index of the found substring."

I fumbled through the question. Eventually I got a working solution, but it was a clumsy process getting there. Why was such a simple question so difficult to implement on the spot?

In hindsight, I think this is a good interview question for a few reasons. The problem is deceivingly simple; just about anybody is capable of doing this correctly, but it's difficult to do it well and efficiently, especially on the spot. Because the problem is so simple, the focus of evaluation lies in your style and thought process. Do you clutter your code with unnecessary variables? Is your code verbose or terse? How many functions do you compose, and why? Is your style sloppy or bug-prone? I didn't get that particular internship, because my code was definitely sloppy and bug-prone.

I think there's one more aspect of this question that causes applicants to freeze: strings. In C++ and Java, people tend to use the std::string/String classes, which have lots of utility functions built in. In day-to-day programming, you would rarely iterate over a string manually, and you might also avoid using pointers of type char*.

In this particular example, I think it's actually easier to use pointers than integer indices; this way you have fewer local variables to keep track of. To some extent, having fewer variables means fewer opportunities to introduce a bug.

In my version, I like splitting it up into two functions. The primary function simply iterates over the outer string, and searches each substring for the sought string. Heres' the code:

bool matched(char* sought, char* sent){
    //utility function
    while (*sought!='\0'){
        if (*sent=='\0') //sought is longer than sent
            return false;
        if (*sent!=*sought) //The characters don't match
            return false;
        sent++; sought++; //Check next character
    }
    return true;
}

int substring(char* sought, char* sentence){
    //primary function
    char* iter=sentence; //iterate over string
    while (*iter!='\0'){        
        if (matched(sought, iter)){
                return iter-sentence;
        }        
        iter++;
    }
    return -1;
}

As you can see, the pointers can reduce clutter in the code, because they already are indices in an array of characters.

No comments:

Post a Comment