map(), filter(), and reduce() functions

The map(), filter(), and reduce() functions are three fundamental higher-order functions found in almost every programming language in use today. Python has these functions, as does Haskell (where reduce() is called a fold, for example foldl or foldr), and so does Julia. We'll go through all three functions in one detailed example.
There's a new startup called Date-A-Mine that plans to rival all the other dating apps and websites out there.
Suppose we are a fresh data-science hire at Date-A-Mine. We have been assigned to find out how many people, across the many different dating apps and websites, are old and married (or, at least, claim to be on their profiles).
We are given relevant data in the form of the following mutable type:
mutable struct DateAppSite
    name::AbstractString    # The name of the dating app/site
    target::AbstractString  # Does the company target young, middle-aged, or elderly users
    total::Int              # The total number of users
    single::Int             # The number of users whose profiles state 'Single'
end

# Sample dating app data
print(DateAppSite("HotSpot", "young", 376, 89))
Take a look at the given data. We see the following:
dataset = [
    DateAppSite("HotSpot", "young", 376, 89),
    DateAppSite("Gro-ovy", "old", 187, 23),
    DateAppSite("MidLife Dive", "middle", 567, 147),
    DateAppSite("Olden Ring", "old", 972, 46),
    DateAppSite("Second@Life", "old", 529, 342),
    DateAppSite("Datteit", "young", 1273, 219),
]

print(dataset)
We can finish our assigned task in one fell swoop with the filter(), map(), and reduce() functions, in that order. Let's make the function to do so.

filter() function

First, we must narrow the dataset to only those apps and sites that target the elderly. To be more specific, in the dataset, we have to filter out those DateAppSite structs whose target fields don't have the string value of "old".
# == compares string contents, which is what we want here
old_users = filter(x -> x.target == "old", dataset)

print("old_users: ", old_users)
print("\n\ndataset: ", dataset)
We create a variable, old_users. It will hold a subset of the original dataset. We use the filter() function, which takes two arguments. In this case, the second argument, dataset, is an array of DateAppSite structs, and the first argument is the condition to be tested against each element of that array. We tell the function to go through all elements, denoted by x in each iteration, and keep only those whose target field has the value of "old". The original dataset is left unchanged.

We can filter on any condition we want. The first argument can also be a named function that returns a Boolean. Let's look at the code below:
function is_even(x)
    return x % 2 == 0
end

total_below_six_hun = filter(x -> x.total < 600 && x.total > 200, dataset)
single_count_even = filter(y -> is_even(y.single), dataset)

print("dataset: ", dataset)
print("\n\nsingle_count_even: ", single_count_even)
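
When a condition grows beyond a short lambda, Julia's do-block syntax is one way to keep it readable. The following is a minimal sketch, not part of the original example: it uses named tuples as a stand-in for the DateAppSite structs so that it runs on its own, and the names sites and large_old_sites are hypothetical.

```julia
# Stand-in records (named tuples) mirroring the DateAppSite fields; an assumption
sites = [
    (name = "HotSpot", target = "young", total = 376, single = 89),
    (name = "Gro-ovy", target = "old", total = 187, single = 23),
    (name = "Olden Ring", target = "old", total = 972, single = 46),
]

# The do block becomes the first (function) argument of filter()
large_old_sites = filter(sites) do site
    site.target == "old" && site.total > 500
end

println([site.name for site in large_old_sites])
```

The do block is only a different way of writing the anonymous function, so the result is the same as passing site -> site.target == "old" && site.total > 500 directly.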

map() function

Next, we must compute the number of elderly and married users on each platform. Looking at the DateAppSite struct, we can see that this can be easily done by subtracting the value of single from the value of total. These types of operations on arrays can be easily done with the map() function, as seen below.
old_and_married = map(x -> x.total - x.single, old_users)

print("old_and_married: ", old_and_married)
We create a variable, old_and_married. It will hold an array of values that correspond to the number of elderly and married users on each platform. We use the map() function, which takes two arguments. In this case, the second argument, old_users, is an array of DateAppSite structs, and the first argument is the function to be applied to each element of that array. We tell map() to go through each element, denoted by x in each iteration, subtract the value of single from the value of total, and store the result in the corresponding position of a new array.

It should be noted that map() accepts any function that returns a value from its inputs, not only simple arithmetic. Higher-order functions can, of course, be nested as well. Let's look at the following example:
print(map(z -> uppercasefirst(z), map(y -> lowercase(y.name), dataset)))
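
Julia's map() can also walk several collections in parallel, passing one element from each to the function. As a small sketch (the arrays totals and singles are hypothetical and simply hold the total and single counts of the three "old" sites from the dataset):

```julia
# Hypothetical parallel arrays: total and single counts for the three "old" sites
totals  = [187, 972, 529]
singles = [23, 46, 342]

# The function receives one element from each array per step
married = map((t, s) -> t - s, totals, singles)

println(married)  # [164, 926, 187]
```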

reduce() function

Lastly, we need to reduce the list of values returned by map() into a single value. That's the job of the reduce() function. Consider the following:
sum_old_and_married = reduce(+, old_and_married)

print("sum_old_and_married: ", sum_old_and_married)
We create a variable, sum_old_and_married. It will hold the total number of elderly and married users across all platforms. We use the reduce() function, which takes two arguments. In this case, the second argument, old_and_married, is an array of integers, and the first argument is the operator used to combine the elements of that array. We tell the function to go through all values in old_and_married and sum them up. We can also supply an initial value with the init keyword argument by adding ; init = some_value before the closing parenthesis, like this: reduce(+, old_and_married; init = 0). For reduce(), init should be a neutral element of the operator (0 for +), and it is the value returned when the collection is empty.

It should be noted that Julia does not guarantee the order in which reduce() groups its operations. If we need strict left-to-right evaluation, we can use foldl(), and to apply the operation from the right, we can use foldr(). This matters for non-associative operators like -, which should be used with great consideration. The following code snippets confirm that the two directions give wildly different outputs:
# foldl() guarantees left-to-right grouping
# Will give (((2 - 4) - 6) - (-10)) = 2
print(foldl(-, [2, 4, 6, -10]), "\n")

# Will give (2 - (4 - (6 - (-10)))) = 14
print(foldr(-, [2, 4, 6, -10]))
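
To make the init keyword concrete, here is a minimal sketch. It hard-codes the three per-site values computed earlier, and the variable names are hypothetical. It assumes the behavior described in the Julia documentation: with reduce(), init is meant to be a neutral element, while foldl() applies init as a real starting value.

```julia
old_and_married = [164, 926, 187]

# init = 0 is the neutral element of +, so the sum is unchanged
total = reduce(+, old_and_married; init = 0)

# With init, an empty collection has a well-defined result
empty_total = reduce(+, Int[]; init = 0)

# For a non-neutral starting value, foldl() starts from init and works left to right
total_from_100 = foldl(+, old_and_married; init = 100)

println(total)           # 1277
println(empty_total)     # 0
println(total_from_100)  # 1377
```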
Let's look at the code below:
function find_total_married_and_old(data)
    old_users = filter(x -> x.target == "old", data)
    # Married users = total users minus single users
    old_and_married = map(x -> x.total - x.single, old_users)
    sum_old_and_married = reduce(+, old_and_married)
    return sum_old_and_married
end

print(find_total_married_and_old(dataset), "\n")

# All in one line
single_liner = reduce(+, map(y -> y.total - y.single, filter(x -> x.target == "old", dataset)))
print(single_liner, "\n")

# Using mapreduce()
shorter_liner = mapreduce(x -> x.total - x.single, +, filter(x -> x.target == "old", dataset))
print(shorter_liner)
Our final function, find_total_married_and_old(), is a culmination of all that we've learned so far (the exact same lines, actually); we just return the final integer value. The same task can also be performed in a single, less reader-friendly line, as seen in the single_liner assignment. Lastly, mapreduce() can be used to perform the task of a separate map() and reduce() in one call.
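
As a cross-check, the whole pipeline can be sketched in a self-contained form. This is an illustration rather than the article's own code: named tuples stand in for the DateAppSite structs, and the helper names is_old and married are hypothetical. It also shows a generator passed to sum() as one possible alternative to mapreduce().

```julia
# Stand-in records (named tuples) mirroring the DateAppSite fields; an assumption
sites = [
    (name = "HotSpot", target = "young", total = 376, single = 89),
    (name = "Gro-ovy", target = "old", total = 187, single = 23),
    (name = "MidLife Dive", target = "middle", total = 567, single = 147),
    (name = "Olden Ring", target = "old", total = 972, single = 46),
    (name = "Second@Life", target = "old", total = 529, single = 342),
    (name = "Datteit", target = "young", total = 1273, single = 219),
]

# Hypothetical helpers: the filter condition and the per-site computation
is_old(site) = site.target == "old"
married(site) = site.total - site.single

# filter, then map and reduce in a single call
via_mapreduce = mapreduce(married, +, filter(is_old, sites))

# The same computation written as a filtered generator passed to sum()
via_generator = sum(married(site) for site in sites if is_old(site))

println(via_mapreduce)  # 1277
println(via_generator)  # 1277
```

Both forms give 164 + 926 + 187 = 1277 for the sample data. Which one reads better is largely a matter of taste.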