Previously, I wrote about why `map` and `reduce` are important for data scientists to understand, in a post on map and reduce in data science. This post is aimed at bringing data scientists further down the functional rabbit hole with two concepts: lambda functions and curried functions. Lambda functions and curried functions are commonplace in functional programming because they are flexible, expressive tools. As a data scientist, having them in your toolkit can make your code crisp and concise.
I wrote about functional programming in my last post, but I'll reiterate a few of the key points here before we begin. At its core, functional programming is a paradigm based around writing functions that take data in and return other data out, just like in math. And because data scientists tend to be mathy, this ends up being a good fit for them.
Because functional programming is a paradigm built around functions (and around data), it also makes sense that we have a lot of flexibility with respect to how we use functions. Lambda functions and curried functions are two ways we can be flexible with functions.
Lambda functions, or anonymous functions, are typically small, ephemeral functions that are used in conjunction with higher-order functions (functions that take functions as arguments, e.g., `map` and `reduce`). A simple example of this could be a `reduce` implementation of sum.
```clojure
; Reduce sum with lambda - Clojure
(def xs [1 2 3 4 5])
(reduce #(+ %1 %2) xs)
```
```python
# Reduce sum with lambda - Python3
from functools import reduce  # reduce is not a builtin in Python 3

xs = [1,2,3,4,5]
reduce(lambda x,y:x+y, xs)
```
Here, the lambda functions are Clojure's `#(+ %1 %2)` and Python's `lambda x,y:x+y`. Note that in the Python case, the word `lambda` is used as a keyword to declare a lambda function. In Clojure, where lambda functions are more common, all one has to do is prepend a `#` to a standard function call.
Note the two different ways of specifying the parameters of the function. Python asks us to declare names for the variables, e.g., `x` and `y`. Clojure allows us to refer to them by their position, e.g., `%1` for the first argument and `%2` for the second argument.
In Clojure, and in most functional languages, a primary reason to use lambdas is convenience and brevity. If we want a helper function (a small function that "helps" a bigger function), it often makes sense to use a lambda instead of naming the tiny function. In Python---and in Clojure when interoperating with Java/Scala---one of the most helpful uses of a lambda is when we want to use a method over a list of items.
For example, imagine we're interested in splitting a number of strings on their whitespace. This could be part of the tokenization process for NLP work. That might look something like this.
```python
# Example tokenize lambda w/ method - Python
my_sents = ["Here is a sentence for tokenizing.",
            "I too am a sentence you'd like to tokenize.",
            "Me as well, also a sentence."]

map(lambda x:x.split(), my_sents)
```
Note that in Clojure we do not use a method; we use the `split` function from `clojure.string` instead. This `split` is also more powerful, because it splits on a regular expression.
```clojure
; Example tokenize lambda - Clojure
(require '[clojure.string :as str])

(let [my-sents ["Here is a sentence for tokenizing."
                "I too am a sentence you'd like to tokenize."
                "Me as well, also a sentence."]]
  (map #(str/split %1 #"\s+") my-sents))
```
Also note that in Python, lambdas, while a fast and convenient way of doing this, are not the preferred approach. Python has a built-in function in its `operator` module for this: `methodcaller`. As one might imagine, it returns a function that calls the named method on the object passed to it.
```python
from operator import methodcaller

map(methodcaller("split"), my_sents)
```
The functions from the operator module enjoy better performance in Python and they avoid some of the drawbacks of lambda in Python.
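To make the comparison concrete, here is a small sketch of a few `operator` functions standing in for one-line lambdas. The `pairs` records and the comma-separated strings are made-up data for illustration only.

```python
from functools import reduce
from operator import add, itemgetter, methodcaller

xs = [1, 2, 3, 4, 5]

# The reduce-based sum from earlier, written with a lambda and with operator.add
total_lambda = reduce(lambda x, y: x + y, xs)
total_operator = reduce(add, xs)

# itemgetter(1) plays the role of `lambda pair: pair[1]`
pairs = [("b", 2), ("a", 3), ("c", 1)]  # hypothetical (name, count) records
by_count = sorted(pairs, key=itemgetter(1))

# methodcaller can also forward arguments to the method it calls
fields = list(map(methodcaller("split", ","), ["a,b", "c,d"]))
```

Both sums come out the same; the `operator` versions simply give the small function a standard name instead of spelling it out inline.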
Google's Python style guide cautions against lambda functions for complex functions, but allows them for "one liners". In general, I agree with their approach. Lambdas are intended to be short, throwaway functions. If you find yourself using the same lambdas again and again, you're probably better off with a dedicated function.
Additionally, there are some issues with using lambdas in parallel Python. Python's built-in `pickle` module is unable to serialize lambdas, so if you're doing parallel work, be careful with them. This is especially important to remember because it is natural to use lambdas alongside the higher-order functions `map` and `reduce`, and these higher-order functions are common in parallel programming.
NB: In Clojure, this is not an issue. Lambda functions compile to JVM bytecode like everything else. In Python, it's common to use the dill library to improve upon Python's standard serialization.
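The pickling problem is easy to see for yourself. The sketch below assumes Python 3.5 or later (where `methodcaller` objects can be pickled); `can_pickle` is a helper name invented for this example.

```python
import pickle
from operator import methodcaller

def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

# A lambda has no importable name, so pickle has nothing to refer to
lambda_ok = can_pickle(lambda s: s.split())

# The operator-module equivalent knows how to serialize itself
caller_ok = can_pickle(methodcaller("split"))

# A round trip gives back a working function
restored = pickle.loads(pickle.dumps(methodcaller("split")))
tokens = restored("a short sentence")
```

Here `lambda_ok` should come back `False` and `caller_ok` `True`, which is one practical reason to reach for the `operator` module when the function has to cross a process boundary.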
Lambdas are an excellent tool for use in data science. They pair naturally with higher-order functions, including `map` and `reduce`, which I argue should play a vital role in any data scientist's toolkit. They should be used for simple, isolated manipulations.
A curried function is a function returned by a currying function, which itself is a function that returns a function. An example could be a generic power function, which returns a function that raises its input to that power.
NB: Clojure doesn't use currying, instead favoring partial application and multiple arities. For that reason, the remaining examples will be in Python only.
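```python
def powerMaker(x):
    # Currying function: returns a function that raises its input to the power x
    def toPower(y):
        return y**x
    return toPower

squarer = powerMaker(2)
cuber = powerMaker(3)

squarer(3)  # 9
cuber(3)    # 27
```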
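For comparison, the partial-application style mentioned in the note above is also available in Python through `functools.partial`. This is a sketch of the same idea, not a replacement for the example above; `power`, `square`, and `cube` are names made up for illustration.

```python
from functools import partial

def power(base, exponent):
    """Ordinary two-argument function."""
    return base ** exponent

# Fix the exponent up front and get back a one-argument function
square = partial(power, exponent=2)
cube = partial(power, exponent=3)

square(3)  # 9
cube(3)    # 27
```

The difference is mostly one of style: with currying, the function-building step is written into the function itself, while `partial` applies it from the outside to any existing function.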
This concept is important for data science for two reasons: it allows us to better think of functions as data, and it gives us another way of thinking about classification and regression algorithm development. The first point is more about programming style and philosophy, so I'll address it in a moment. Let's start with the second point, about how we conceive of classification and regression algorithms.
When we think of "data science algorithms" a whole list of options comes to mind: decision trees, KNN, Naive Bayes, neural networks, Support Vector Machines, linear modeling of various sorts, etc. These algorithms can all be (and in my view should all be) considered currying functions.
We demonstrated above how a currying function builds and returns another function. At its core, this is what each of the algorithms above does. Each algorithm is a training function that produces a classification or regression function. And the two halves, the training and the classification/regression, are different. In some cases, this quickly becomes evident with just a bit of thought.
Let's consider two cases to demonstrate this: Naive Bayes and decision trees.
For Naive Bayes, the training function finds, for each class, the probability density function of each variable. So if we have three target classes and 10 variables, we end up with 30 density functions: 10 for each class. Finding these density functions is the training step.
Once we know these density functions, we can use them in the classification/regression step. For Naive Bayes, we evaluate the density functions on each new observed data point and take the product of the resulting values. Continuing from above, for every new data point to classify, we would take its 10 variables and put each through the density functions associated with each of the three classes. The class that results in the greatest product is our predicted class.
Here, our training and evaluation are separate functions. The training function takes input data, creates density functions and packages them up into an evaluation function. The evaluation function, prepared with the density functions from the training step, simply evaluates the density functions on new data and takes their product.
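The train-then-evaluate split described above can be sketched as a currying function. This is a minimal illustration rather than a production classifier: it assumes each variable is normally distributed within each class (the choice of density is my assumption), it ignores class priors and simply takes the product described above, and the names `gaussian_pdf`, `train_naive_bayes`, and `classify` are invented for the example.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

def gaussian_pdf(mu, sigma):
    """Return a normal density function (assumed form for every variable)."""
    def pdf(x):
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
    return pdf

def train_naive_bayes(rows, labels):
    """Training step: fit one density per (class, variable), return a classifier."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    densities = {}
    for label, class_rows in by_class.items():
        columns = zip(*class_rows)  # one tuple of observed values per variable
        densities[label] = [
            # `or 1e-9` is an arbitrary floor to avoid dividing by a zero spread
            gaussian_pdf(mean(col), pstdev(col) or 1e-9)
            for col in columns
        ]

    def classify(row):
        """Evaluation step: product of densities per class, largest product wins."""
        scores = {
            label: math.prod(pdf(x) for pdf, x in zip(pdfs, row))
            for label, pdfs in densities.items()
        }
        return max(scores, key=scores.get)

    return classify

# Toy data: two variables, two classes (made up for illustration)
rows = [(1.0, 2.1), (1.2, 1.9), (0.8, 2.0), (5.0, 7.9), (5.2, 8.1), (4.9, 8.0)]
labels = ["a", "a", "a", "b", "b", "b"]

classifier = train_naive_bayes(rows, labels)
near_a = classifier((1.1, 2.0))
near_b = classifier((5.1, 8.0))
```

Notice that `classifier` carries the fitted densities with it but knows nothing about how they were estimated. A real implementation would typically sum log-densities rather than multiply, since products of many small numbers underflow quickly.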
Decision trees are another case where the training step is radically different than the evaluation step. The training step is an iterative process of considering potential splits based on optimizing some splitting metric: information gain (usually), Gini, etc. The evaluation step is the application of a series of checks (implemented using pattern matching or if-else chains).
It is evident that these two are vastly different. So different, in fact, that our original splitting metric does not even need to be considered in our evaluation function.
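A one-split tree (a "stump") is enough to show the same shape in code. This sketch assumes numeric features and uses Gini impurity as the splitting metric; `gini`, `majority`, and `train_stump` are names made up for this example, and a real tree would recurse on each side of the split.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(rows, labels):
    """Training step: search every (feature, threshold) split for the lowest Gini."""
    best = None  # (weighted_gini, feature_index, threshold, left_label, right_label)
    n = len(rows)
    for j in range(len(rows[0])):
        for threshold in sorted({row[j] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[j] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[j] > threshold]
            if not left or not right:
                continue  # not a real split
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, threshold, majority(left), majority(right))

    if best is None:
        # No usable split: always predict the overall majority class
        only_label = majority(labels)
        def predict_constant(row):
            return only_label
        return predict_constant

    _, j, threshold, left_label, right_label = best

    def predict(row):
        """Evaluation step: a single if/else check, no Gini in sight."""
        return left_label if row[j] <= threshold else right_label

    return predict

# Toy data: one feature, two classes (made up for illustration)
rows = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,)]
labels = ["low", "low", "low", "high", "high", "high"]

stump = train_stump(rows, labels)
small = stump((2.5,))
large = stump((10.5,))
```

The returned `predict` closes over only the feature index, the threshold, and the two labels. The splitting metric lives entirely in the training function, which is exactly the separation described above.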
Anonymous (lambda) functions and currying are powerful, convenient tools for the data scientist. They fit quite naturally within a functional programming framework and, as such, are well suited for the data-transformation work data scientists spend their time on. How and where to use these functions in your code depends a lot on what problems you are solving and, importantly, who you are solving them with. These strategies are not common in all circles. Make sure the people you are coding with will understand them before you start working them into your code.
The Python documentation discusses lambda functions as a control flow tool. Additionally, it is considered bad style to "bind" a lambda function to a variable (see PEP 8). This is easier to understand if you think about lambdas as "anonymous functions", which should not have names (and therefore should not be callable by name).
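A quick sketch shows one practical consequence of that style rule. The names `double_lambda` and `double` are made up for illustration.

```python
# Discouraged by PEP 8: binding a lambda to a name
double_lambda = lambda x: 2 * x

# Preferred: a def statement
def double(x):
    return 2 * x

lambda_name = double_lambda.__name__  # '<lambda>'
def_name = double.__name__            # 'double'
```

Both functions behave identically, but only the `def` version knows its own name, which is what shows up in tracebacks and debugging output.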
The Haskell Wiki has a good explanation of the usefulness of anonymous functions that is applicable in any language. That most languages (incl. other data-science languages like R) allow anonymous functions is a good indicator of their utility.

June 12, 2018