How to use the map(), filter(), and reduce() functions in Julia
The map(), filter(), and reduce() functions
The map(), filter(), and reduce() functions are three fundamental higher-order functions available in most modern programming languages. Python has them, as does Haskell (where reduce() corresponds to the fold family, such as foldl and foldr), and so does Julia. We'll go through all three functions in one detailed example.
The big picture
There's a new startup called Date-A-Mine that plans to rival all the other dating apps and websites out there.
Suppose we are a fresh data-science hire at Date-A-Mine. We have been assigned to find out how many people there are, across the many different dating apps and websites, who are old and married (or, at least, claim so on their profiles).
We are given relevant data in the form of the following mutable struct:
mutable struct DateAppSite
    name::AbstractString    # The name of the dating app/site
    target::AbstractString  # Whether the company targets young, middle-aged, or elderly users
    total::Int              # The total number of users
    single::Int             # The number of users whose profiles state 'Single'
end

# Sample dating app data
print(DateAppSite("HotSpot", "young", 376, 89))
Take a look at the given data. We see the following:
dataset = [
    DateAppSite("HotSpot", "young", 376, 89),
    DateAppSite("Gro-ovy", "old", 187, 23),
    DateAppSite("MidLife Dive", "middle", 567, 147),
    DateAppSite("Olden Ring", "old", 972, 46),
    DateAppSite("Second@Life", "old", 529, 342),
    DateAppSite("Datteit", "young", 1273, 219)
]
print(dataset)
We can finish our assigned task in one fell swoop with the filter(), map(), and reduce() functions, in that order. Let's make the function to do so.
The filter() function
First, we must narrow the dataset to only those apps and sites that target the elderly. To be more specific, in the dataset, we have to filter out those DateAppSite structs whose target fields don't have the string value of "old".
# Keep only the platforms that target elderly users (== compares string values)
old_users = filter(x -> x.target == "old", dataset)
print("old_users: ", old_users)
print("\n\ndataset: ", dataset)
Code explanation
- We create a new variable called old_users. It will hold a subset of the original dataset.
- We use the filter() function, which takes two arguments. In this case, the second argument, dataset, is an array of DateAppSite structs, and the first argument is the condition (a function that returns true or false) to be tested against each element of that array. We tell the function to go through all elements, denoted by x in each iteration, and check whether their target field has the value "old".
- We get back a new array with the desired values. Looking at the output, we can confirm that dataset is unchanged, so filter() does not work in-place.
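To see the non-mutating behavior in isolation, here is a minimal sketch on a plain integer array. The ages values are made up for illustration and are not part of the dating-app data. Julia also provides filter!(), the in-place counterpart, whose trailing ! follows the language convention for functions that modify their argument.

```julia
# Hypothetical sample data, not part of the dating-app dataset
ages = [23, 67, 45, 71, 19]

# filter() returns a new array and leaves `ages` untouched
seniors = filter(a -> a >= 65, ages)
println(seniors)    # [67, 71]
println(ages)       # [23, 67, 45, 71, 19]

# filter!() removes the non-matching elements from its argument in place
ages_copy = copy(ages)
filter!(a -> a >= 65, ages_copy)
println(ages_copy)  # [67, 71]
```

Working on a copy, as above, keeps the original array available for later steps.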
We can filter on any condition we want, and we can also pass in named boolean-returning functions. Let's look at the code below:
function is_even(x)
    return x % 2 == 0
end

# Platforms with more than 200 but fewer than 600 users
total_below_six_hun = filter(x -> x.total < 600 && x.total > 200, dataset)
# Platforms whose count of single users is even
single_count_even = filter(y -> is_even(y.single), dataset)
print("dataset: ", dataset)
print("\n\nsingle_count_even: ", single_count_even)
The map() function
Next, we must compute the number of elderly and married users on each platform. Looking at the DateAppSite struct, we can see that this is done by subtracting the value of single from the value of total. Element-wise operations like this on arrays are easily done with the map() function, as seen below.
old_and_married = map(x -> x.total - x.single, old_users)
print("old_and_married: ", old_and_married)
Code explanation
- We create a new variable called old_and_married. It will hold an array of values that correspond to the number of elderly and married users on each platform.
- We use the map() function, which takes two arguments. In this case, the second argument, old_users, is an array of DateAppSite structs, and the first argument is the function to be applied to each element of that array. We tell the function to go through each element, denoted by x in each iteration, subtract the value of single from the value of total, and store the result in its respective position in a new array.
- We get back a new array with the desired values.
It should be noted that map() accepts any function that returns a value from its input, whether anonymous or named. Higher-order functions can, of course, be nested as well. Let's look at the following example:
print(map(z -> uppercasefirst(z), map(y -> lowercase(y.name), dataset)))
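As a further sketch, map() can also walk several arrays in parallel and can be written with Julia's do-block syntax when the function body is longer. The arrays totals and singles and the "large"/"small" labels below are assumptions made for illustration; the numbers are copied from the three "old" platforms in the dataset.

```julia
# Illustrative arrays (values copied from the three "old" platforms)
totals  = [187, 972, 529]
singles = [23, 46, 342]

# map() over two arrays at once: the function receives one element from each
married = map(-, totals, singles)
println(married)  # [164, 926, 187]

# do-block syntax: the block becomes the first argument of map()
labels = map(married) do m
    m > 500 ? "large" : "small"
end
println(labels)   # ["small", "large", "small"]
```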
The reduce() function
Lastly, we need to reduce the list of values obtained from the function map() into a single value. That's the job of the reduce() function. Consider the following:
sum_old_and_married = reduce(+, old_and_married)
print("sum_old_and_married: ", sum_old_and_married)
Code explanation
- We create a new variable called sum_old_and_married. It will hold the total number of elderly and married users across all platforms.
- We use the reduce() function, which takes two arguments. In this case, the second argument, old_and_married, is an array of integers, and the first argument is the operator used to combine the elements of that array. We tell the function to go through all values in old_and_married and sum them up. We could also supply an initial value with the init keyword argument by adding ; init = some_value before the closing parenthesis, like this: reduce(+, old_and_married; init = 0). For reduce(), init should be a neutral element of the operator (0 for +), and it is the value returned for an empty collection.
- We get back the single, desired value.
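The init keyword can be sketched as follows. The counts array is an illustrative stand-in for old_and_married. Note the assumption baked into this sketch: since reduce() expects a neutral init, a genuine starting offset is written with foldl(), which also accepts init and applies it first.

```julia
# Illustrative stand-in for old_and_married
counts = [164, 926, 187]

# A neutral init (0 for +) does not change the result
println(reduce(+, counts; init = 0))    # 1277

# With init, an empty collection simply yields the init value
println(reduce(+, Int[]; init = 0))     # 0

# For a real starting offset, foldl() starts from init and works left to right
println(foldl(+, counts; init = 100))   # 1377
```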
It should be noted that Julia does not guarantee the order in which reduce() combines elements; its associativity is implementation-dependent. If we need strict left-to-right evaluation, we can use foldl(), and if we want to apply the operation from the right, we can use foldr(). For this reason, non-associative operators like - should be used with great care. We can confirm from the following code snippets that the two directions give different results:
# Will give (((2 - 4) - 6) - (-10)) = 2
print(foldl(-, [2, 4, 6, -10]), "\n")
# Will give (2 - (4 - (6 - (-10)))) = 14
print(foldr(-, [2, 4, 6, -10]))
The final function
Let's look at the code below:
function find_total_married_and_old(data)
    # Keep only the platforms that target elderly users
    old_users = filter(x -> x.target == "old", data)
    # Married users = total users minus single users (the struct has no married field)
    old_and_married = map(x -> x.total - x.single, old_users)
    sum_old_and_married = reduce(+, old_and_married)
    return sum_old_and_married
end

print(find_total_married_and_old(dataset), "\n")

# All in one line
single_liner = reduce(+, map(y -> y.total - y.single, filter(x -> x.target == "old", dataset)))
print(single_liner, "\n")

# Using mapreduce()
shorter_liner = mapreduce(x -> x.total - x.single, +, filter(x -> x.target == "old", dataset))
print(shorter_liner)
Our final function, find_total_married_and_old(), is a culmination of all that we've learned so far (the exact same lines, actually); we just return the final integer value. The same task can also be performed in a single, less readable line, as seen under the "All in one line" comment. Lastly, mapreduce() combines the work of a separate map() and reduce() into one call.
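As an alternative sketch that is not part of the original walkthrough, the same total can be computed with a generator expression passed to sum(), which filters and maps in one pass without building intermediate arrays. The struct and dataset are repeated so the snippet runs on its own; the variable name via_generator is made up for illustration.

```julia
mutable struct DateAppSite
    name::AbstractString
    target::AbstractString
    total::Int
    single::Int
end

dataset = [
    DateAppSite("HotSpot", "young", 376, 89),
    DateAppSite("Gro-ovy", "old", 187, 23),
    DateAppSite("MidLife Dive", "middle", 567, 147),
    DateAppSite("Olden Ring", "old", 972, 46),
    DateAppSite("Second@Life", "old", 529, 342),
    DateAppSite("Datteit", "young", 1273, 219)
]

# The `if` clause plays the role of filter(), the expression before `for`
# plays the role of map(), and sum() plays the role of reduce(+, ...)
via_generator = sum(x.total - x.single for x in dataset if x.target == "old")
println(via_generator)  # 1277
```

Whether this reads better than the explicit filter/map/reduce chain is a matter of taste; the chain makes each step visible, while the generator is more compact.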