How to use the map(), filter(), and reduce() functions in Julia

The map(), filter(), and reduce() functions

The map(), filter(), and reduce() functions are three fundamental higher-order functions found in many of today's programming languages. Python has these functions, as does Haskell (where the reduction is done by foldl and foldr), and so does Julia. We'll go through all three functions in one detailed example.

The big picture

There's a new startup called Date-A-Mine that plans to rival all the other dating apps and websites out there.

Suppose we are a fresh data-science hire at Date-A-Mine. We have been assigned to find out how many people there are, across the many different dating apps and websites, who are old and married (or, at least, claim so on their profiles).

We are given relevant data in the form of the following mutable type:

mutable struct DateAppSite
    name::AbstractString    # The name of the dating app/site
    target::AbstractString  # Whether the company targets "young", "middle", or "old" users
    total::Int              # The total number of users
    single::Int             # The number of users whose profiles state 'Single'
end
# Sample dating app data
print(DateAppSite("HotSpot", "young", 376, 89))

Take a look at the given data. We see the following:

dataset = [ DateAppSite("HotSpot", "young", 376, 89),
            DateAppSite("Gro-ovy", "old", 187, 23),
            DateAppSite("MidLife Dive", "middle", 567, 147),
            DateAppSite("Olden Ring", "old", 972, 46),
            DateAppSite("Second@Life", "old", 529, 342),
            DateAppSite("Datteit", "young", 1273, 219) ]
print(dataset)

We can finish our assigned task in one fell swoop with the filter(), map(), and reduce() functions, in that order. Let's make the function to do so.

The filter() function

First, we must narrow the dataset to only those apps and sites that target the elderly. To be more specific, in the dataset, we have to filter out those DateAppSite structs whose target fields don't have the string value of "old".

old_users = filter(x -> x.target == "old", dataset)
print("old_users: ", old_users)
print("\n\ndataset: ", dataset)

Code explanation

  1. We create a new variable called old_users. It will hold a subset of the original dataset.
  2. We use the filter() function, which takes two arguments. In this case, the second argument, dataset, is an array of DateAppSite structs, and the first argument is the predicate (a function returning true or false) that is applied to each element of that array. We tell the function to go through all elements, denoted by x in each iteration, and check whether their target field has the value of "old".
  3. We get back a new array with the desired values. We look at the output and confirm that dataset is unchanged, so the function does not work in-place.
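The difference between returning a new array and working in-place can be sketched with a small, hypothetical array of ages (not part of the Date-A-Mine data). Julia also provides filter!(), which, by the usual ! naming convention, modifies its argument:

```julia
# Hypothetical sample data, not taken from the dataset above
ages = [17, 42, 65, 23, 70]

# filter() returns a new array; `ages` keeps all five elements
seniors = filter(a -> a >= 65, ages)
println("seniors: ", seniors)
println("ages:    ", ages)

# filter!() removes the non-matching elements from `ages` itself
filter!(a -> a >= 65, ages)
println("ages after filter!: ", ages)
```

After the last call, ages holds only the two matching values, which is why filter() is usually the safer choice when the original data is still needed.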

We can filter out any type of data we want. We can also pass in boolean-returning functions. Let's look at the code below:

function is_even(x)
    return x % 2 == 0
end
total_below_six_hun = filter(x -> x.total < 600 && x.total > 200, dataset)
single_count_even = filter(y -> is_even(y.single), dataset)
print("dataset: ", dataset)
print("\n\nsingle_count_even: ", single_count_even)

The map() function

Next, we must compute the number of elderly and married users on each platform. Looking at the DateAppSite struct, we can see that this is done by subtracting the value of single from the value of total. Element-wise operations like this on arrays are a natural fit for the map() function, as seen below.

old_and_married = map(x -> x.total - x.single, old_users)
print("old_and_married: ", old_and_married)

Code explanation

  1. We create a new variable called old_and_married. It will hold an array of values that correspond to the number of elderly and married users on each platform.
  2. We use the map() function, which takes two arguments. In this case, the second argument, old_users, is an array of DateAppSite structs, and the first argument is the function to be applied to each element of that array. We tell map() to go through each element, denoted by x in each iteration, subtract the value of single from the value of total, and store the result in the corresponding position of a new array.
  3. We return a new array with the desired values.
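As a minimal sketch of two other ways to call map(), the block below uses hypothetical arrays totals and singles (the same numbers as the "old" platforms, but stored as plain integers). It shows map() over two collections at once and the do-block form for multi-line functions:

```julia
totals  = [187, 972, 529]   # hypothetical total user counts
singles = [23, 46, 342]     # hypothetical 'Single' counts, in the same order

# With two collections, map() calls the function on paired elements
married = map((t, s) -> t - s, totals, singles)
println(married)

# The do-block form passes a multi-line anonymous function as the first argument
labels = map(totals, singles) do t, s
    m = t - s
    m > 500 ? "large" : "small"
end
println(labels)
```

The threshold of 500 and the "large"/"small" labels are made up for illustration only.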

It should be noted that map() accepts any function that returns a value for each input, including named and built-in functions. Higher-order functions can, of course, be nested as well. Let's look at the following example:

print(map(z -> uppercasefirst(z), map(y -> lowercase(y.name), dataset)))

The reduce() function

Lastly, we need to reduce the list of values obtained from the function map() into a single value. That's the job of the reduce() function. Consider the following:

sum_old_and_married = reduce(+, old_and_married)
print("sum_old_and_married: ", sum_old_and_married)

Code explanation

  1. We create a new variable called sum_old_and_married. It will hold the total number of elderly and married users across all platforms.
  2. We use the reduce() function, which takes two arguments. In this case, the second argument, old_and_married, is an array of integers, and the first argument is the binary operator used to combine the elements of that array. We tell the function to go through all values in old_and_married and sum them up. We can also supply an initial value through the init keyword argument, placed after a semicolon at the end of the argument list: reduce(+, old_and_married; init = 0). This is the value returned for an empty collection, so it should be the neutral element of the operator (0 for +).
  3. We get back the single, desired value.
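The role of init can be sketched with a hypothetical counts array. For reduce(), Julia's documentation asks that init be a neutral element of the operator, so a non-neutral starting value (100 here, chosen only for illustration) is passed to foldl() instead, which always uses init as the first operand:

```julia
counts = [164, 926, 187]   # hypothetical per-platform counts

# Plain reduction: 164 + 926 + 187
total = reduce(+, counts)

# init = 0 is the neutral element of +, and is what an empty collection returns
empty_total = reduce(+, Int[]; init = 0)

# foldl() starts from init and works left to right: ((100 + 164) + 926) + 187
shifted_total = foldl(+, counts; init = 100)

# Any two-argument function works as the operator, e.g. max
largest = reduce(max, counts)

println(total, " ", empty_total, " ", shifted_total, " ", largest)
```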

It should be noted that Julia does not guarantee the order in which reduce() combines elements; its associativity is implementation-dependent. For a guaranteed left-to-right reduction we can use foldl(), and for right-to-left, foldr(). This matters for non-associative operators like -, where the two directions give wildly different results, as the following code snippets confirm:

# Will give (((2 - 4) - 6) - (-10)) = 2
print(foldl(-, [2, 4, 6, -10]), "\n")
# Will give (2 - (4 - (6 - (-10)))) = 14
print(foldr(-, [2, 4, 6, -10]))

The final function

Let's look at the code below:

function find_total_married_and_old(data)
    # Keep only the platforms that target elderly users
    old_users = filter(x -> x.target == "old", data)
    # Married users = total users minus users marked 'Single'
    old_and_married = map(x -> x.total - x.single, old_users)
    sum_old_and_married = reduce(+, old_and_married)
    return sum_old_and_married
end
print(find_total_married_and_old(dataset), "\n")
# All in one line
single_liner = reduce(+, map(y -> y.total - y.single, filter(x -> x.target == "old", dataset)))
print(single_liner, "\n")
# Using mapreduce()
shorter_liner = mapreduce(x -> x.total - x.single, +, filter(x -> x.target == "old", dataset))
print(shorter_liner)

Our final function, find_total_married_and_old(), combines everything covered so far (the exact same lines, actually) and returns the final integer value. The same task can also be written as a single, less reader-friendly line, as seen in single_liner. Lastly, mapreduce() performs the map() and reduce() steps in one call, as seen in shorter_liner.
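As an optional variation that is not part of the walk-through above, the same total can be computed without building the intermediate arrays. The sketch below repeats the struct definition so that it runs on its own, uses a shortened copy of the dataset under the assumed name sites, and relies on a generator expression and on Iterators.filter from Julia's standard library:

```julia
# Same struct definition as in the article
mutable struct DateAppSite
    name::AbstractString
    target::AbstractString
    total::Int
    single::Int
end

# A shortened copy of the article's dataset
sites = [ DateAppSite("HotSpot", "young", 376, 89),
          DateAppSite("Gro-ovy", "old", 187, 23),
          DateAppSite("Olden Ring", "old", 972, 46),
          DateAppSite("Second@Life", "old", 529, 342) ]

# Generator with a filter clause: elements are produced one at a time
lazy_total = sum(s.total - s.single for s in sites if s.target == "old")

# mapreduce() over a lazy filter instead of a filtered array
lazy_mapreduce = mapreduce(s -> s.total - s.single, +,
                           Iterators.filter(s -> s.target == "old", sites))

println(lazy_total)
println(lazy_mapreduce)
```

Both expressions give the same value as the three-step version; whether the lazy form is worth the reduced readability depends on how large the data is.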

Copyright ©2024 Educative, Inc. All rights reserved