What is Map-reduce?

Map-reduce is a programming model for processing large data sets in parallel so that results come back faster. To understand map-reduce, go through this article, which has a nice explanation for beginners.

MongoDB supports map-reduce for operating on huge data sets and getting the desired results faster. Map-reduce has two main functions. The first is a **map** function, which emits each piece of data under a **key** so that the data can be grouped by that key (go through the article mentioned above to understand what a **key** is). The second is a **reduce** function, which performs an operation on the grouped data. When the data is spread across shards, each shard can map and reduce its own data independently; the partial results are then combined and reduced again to give a single result per key. Because map and reduce run independently and in parallel, you should be very careful to write your reduce function so that it still gives the correct result when it operates on only part of the data.
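To make the split-and-combine idea concrete, here is a minimal plain-JavaScript sketch (runnable in Node.js, no MongoDB involved). It splits the city temperature documents used later in this article across two hypothetical "shards", maps and reduces each shard independently, then merges and reduces the partial results again. The names `mapReduceShard`, `mergeShards`, `pushValue` and `reduceMax` are illustrative only; this is not how MongoDB is implemented internally.

```javascript
// Reduce step: keep the largest temperature seen for a key.
function reduceMax(key, values) {
  return Math.max(...values);
}

// Append a value to the array stored under `key` in a Map.
function pushValue(groups, key, value) {
  if (!groups.has(key)) {
    groups.set(key, []);
  }
  groups.get(key).push(value);
}

// Map, group and reduce the documents held by one shard.
function mapReduceShard(docs) {
  const groups = new Map();
  for (const doc of docs) {
    pushValue(groups, doc.city, doc.temperature); // "map": key = city, value = temperature
  }
  const partial = new Map();
  for (const [key, values] of groups) {
    partial.set(key, reduceMax(key, values));
  }
  return partial;
}

// Combine the partial results of every shard and reduce them once more.
function mergeShards(partials) {
  const groups = new Map();
  for (const partial of partials) {
    for (const [key, value] of partial) {
      pushValue(groups, key, value);
    }
  }
  const merged = {};
  for (const [key, values] of groups) {
    merged[key] = reduceMax(key, values);
  }
  return merged;
}

// The article's data, split arbitrarily across two hypothetical shards.
const shardA = [
  { city: "Toronto", temperature: 20 },
  { city: "Whitby", temperature: 25 },
  { city: "New York", temperature: 22 },
  { city: "Rome", temperature: 32 },
];
const shardB = [
  { city: "Toronto", temperature: 4 },
  { city: "Rome", temperature: 33 },
  { city: "New York", temperature: 18 },
  { city: "New York", temperature: 14 },
];

// The two shard calls are independent, so they could run on different machines.
const result = mergeShards([mapReduceShard(shardA), mapReduceShard(shardB)]);
console.log(result); // Toronto: 20, Whitby: 25, New York: 22, Rome: 33
```

Because the maximum of the per-shard maxima equals the maximum over all values, the merged result matches what a single reduce over all eight documents would give.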

Let's look at an example and solve a problem using map-reduce. For simplicity, let's take the data from the **article** mentioned above.

Here is the problem statement: there is a list of cities with temperatures, and the goal is to find the maximum temperature for each city. This can easily be done with the **MongoDB aggregation framework**, but let's solve it using **map-reduce** now and look at the advantages later.

Let's insert some data.

```javascript
db.cities.insert({city: 'Toronto', temperature: 20})
db.cities.insert({city: 'Whitby', temperature: 25})
db.cities.insert({city: 'New York', temperature: 22})
db.cities.insert({city: 'Rome', temperature: 32})
db.cities.insert({city: 'Toronto', temperature: 4})
db.cities.insert({city: 'Rome', temperature: 33})
db.cities.insert({city: 'New York', temperature: 18})
db.cities.insert({city: 'New York', temperature: 14})
```

Now that the data is inserted, we can run map-reduce on it. A map-reduce query looks like this:

```javascript
db.collectionName.mapReduce(mappingFunction, reduceFunction,
  {out: 'outputCollectionName'});
```

As mentioned before, we need two functions, i.e. a **mapper** function and a **reduce** function. MongoDB can interpret JavaScript, and you must write these functions in JavaScript.

**Now let's look at the mapping function.**

**Mapper Function**

```javascript
var mapFunction = function () {
  emit(this.city, this.temperature); // emits the city as the key and the temperature as the value
};
```

The above function runs once for every document in the collection on which you run map-reduce; in our case, that is the **cities** collection. For every document it **emits** the **city** as the **key** and the **temperature** as the **value**. In other words, the mapping function emits a **key**/**value** pair for each document. In our case the pairs are:

*“New York” => 22*

*“New York” => 18*

*“New York” => 14*

*“Toronto” => 20*

*“Toronto” => 4*

*“Rome” => 32*

*“Rome” => 33*

*“Whitby” => 25*
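To see what "emits" means in practice, here is a small Node.js sketch that calls the mapper above once per document with `this` bound to that document. The `emit` defined here is a stand-in that only records the pairs; in MongoDB, `emit` is supplied by the server while map-reduce runs. In this sketch the pairs come out in document order; grouping them by city is a separate step.

```javascript
// Stand-in for MongoDB's emit(): record each (key, value) pair.
const emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

// The mapper from the article; `this` is the current document.
const mapFunction = function () {
  emit(this.city, this.temperature);
};

// The same documents inserted into the cities collection above.
const cities = [
  { city: "Toronto", temperature: 20 },
  { city: "Whitby", temperature: 25 },
  { city: "New York", temperature: 22 },
  { city: "Rome", temperature: 32 },
  { city: "Toronto", temperature: 4 },
  { city: "Rome", temperature: 33 },
  { city: "New York", temperature: 18 },
  { city: "New York", temperature: 14 },
];

// Run the mapper once per document, binding `this` to that document.
for (const doc of cities) {
  mapFunction.call(doc);
}

// Print every emitted pair, e.g. "Toronto" => 20
for (const [key, value] of emitted) {
  console.log(`"${key}" => ${value}`);
}
```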

The data emitted by the mapper function is grouped by key and passed to the reduce function, which operates on the values. The grouped data looks like this:

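*“New York” => [22, 18, 14]*

*“Toronto” => [20, 4]*

*“Rome” => [32, 33]*

*“Whitby” => [25]*

Whitby has only one value. MongoDB does not call the reduce function for a key that has only a single value, so that value goes straight to the output.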
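The grouping step can be sketched in a few lines of plain JavaScript (Node.js). The helper name `groupByKey` is hypothetical; MongoDB does this grouping internally. The input pairs are the ones the mapper emits for our eight documents, in document order.

```javascript
// Emitted (key, value) pairs, in the order the mapper produced them.
const emitted = [
  ["Toronto", 20], ["Whitby", 25], ["New York", 22], ["Rome", 32],
  ["Toronto", 4], ["Rome", 33], ["New York", 18], ["New York", 14],
];

// Group values by key: key -> array of values.
function groupByKey(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(value);
  }
  return groups;
}

const grouped = groupByKey(emitted);
for (const [key, values] of grouped) {
  console.log(`${key} => [${values.join(", ")}]`);
}
// Toronto => [20, 4]
// Whitby => [25]
// New York => [22, 18, 14]
// Rome => [32, 33]
```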

**Reduce Function:**

```javascript
var reduceFunction = function (key, values) {
  return Math.max.apply(Math, values); // JavaScript idiom to find the maximum value in an array
};
```

So the reduce function takes two parameters, the **key** and the **grouped values** produced by the mapping function, performs an operation on them, and returns a single value.

In our case, the **reduce function has to find the maximum temperature** for each city, so it runs on the grouped data.

Let's see how it works for our example.

Our reduce function takes **“New York”** and **[22, 18, 14]** as parameters. It returns the maximum value in the array, which in this case is **22**.

Similarly, for *“Toronto” => [20, 4]*, the maximum value is *20*, and that is what the reduce function returns.
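Applying the reduce function to each group is just a loop over the keys. This Node.js sketch uses the article's `Math.max.apply(Math, values)` unchanged; in modern JavaScript, `Math.max(...values)` is an equivalent spelling. The `grouped` object is typed in by hand here rather than produced by MongoDB.

```javascript
// The reduce function from the article.
const reduceFunction = function (key, values) {
  return Math.max.apply(Math, values); // same result as Math.max(...values)
};

// Output of the grouping step: key -> array of emitted temperatures.
const grouped = {
  "New York": [22, 18, 14],
  Toronto: [20, 4],
  Rome: [32, 33],
  Whitby: [25], // a single value reduces to itself
};

// Call reduce once per key and collect the results.
const reduced = {};
for (const [key, values] of Object.entries(grouped)) {
  reduced[key] = reduceFunction(key, values);
}

console.log(reduced); // New York: 22, Toronto: 20, Rome: 33, Whitby: 25
```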

P.S.: The explanation above shows how the mapper and reduce functions work, but internally they are not called just once per key. MongoDB may call the reduce function several times for the same key, and the output of one reduce call can be passed back into reduce together with other values for that key. For example, the mapper may at first have emitted only two values for **New York**, i.e. **[18, 14]**, and the **reduce** function reduces them to the maximum, **18**. Later, when another document with the same key is mapped (**22**), the earlier result and the new value are grouped together as **[18, 22]** and passed to the reduce function again, giving **22**, which is the final value. **So even though the operation was broken into pieces, you still get the same result.** This is what allows the data to be split and processed independently in many threads or on many machines for better performance. It is also why MongoDB requires the reduce function to be associative, commutative and idempotent, and to return a value of the same type as the values emitted by the mapper.
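The point above can be checked with a short Node.js sketch: reduce New York's values all at once, then reduce them in two steps the way the example describes (first [18, 14], then that partial result together with 22), and compare. The averaging reducer is a hypothetical counter-example, not part of this article's problem; it gives a different answer when its own output is re-reduced, which is exactly the kind of reduce function to avoid. `reduceInTwoSteps` is likewise an illustrative helper.

```javascript
// The article's reduce function: maximum of the values.
const reduceMax = function (key, values) {
  return Math.max.apply(Math, values);
};

// Hypothetical counter-example: a plain average is NOT safe to re-reduce.
const reduceAverage = function (key, values) {
  const sum = values.reduce((total, v) => total + v, 0);
  return sum / values.length;
};

// Reduce the first `splitAt` values, then reduce that partial result
// together with the remaining values, as in the New York example above.
function reduceInTwoSteps(reduce, key, values, splitAt) {
  const partial = reduce(key, values.slice(0, splitAt));
  return reduce(key, [partial, ...values.slice(splitAt)]);
}

const newYork = [18, 14, 22];

const maxOnce = reduceMax("New York", newYork); // 22
const maxTwoSteps = reduceInTwoSteps(reduceMax, "New York", newYork, 2); // 22

const avgOnce = reduceAverage("New York", newYork); // 18
const avgTwoSteps = reduceInTwoSteps(reduceAverage, "New York", newYork, 2); // 19

console.log(maxOnce === maxTwoSteps); // true
console.log(avgOnce === avgTwoSteps); // false
```

The usual way around this for averages is to reduce sums and counts, which can be combined safely, and divide only at the end (for example in a `finalize` function).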

Now that you have the mapper and reduce functions, let's run the map-reduce command and check the results. Here the map-reduce command writes its result to a collection rather than printing it to the console, so you need to specify the output collection where it should store the results. In my example, I am writing them to a **collection** called **maxTemp**.

So our final query looks like this.

```javascript
db.cities.mapReduce(
  function () { emit(this.city, this.temperature); },
  function (key, values) { return Math.max.apply(Math, values); },
  {out: 'maxTemp'});
```
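If you want to check what this command computes without a running MongoDB server, the Node.js sketch below mimics it in memory. `simulateMapReduce` is a hypothetical helper, not part of MongoDB or its drivers: it provides a temporary global `emit`, calls the map function with `this` bound to each document, skips reduce for keys with a single value (as MongoDB does), and returns `{ _id, value }` documents shaped like those stored in the output collection, sorted by `_id` for readability.

```javascript
// Hypothetical in-memory stand-in for db.cities.mapReduce(map, reduce, {out: ...}).
function simulateMapReduce(docs, map, reduce) {
  const groups = new Map();

  // Temporary global emit() so the map function can call it unqualified,
  // as it does inside MongoDB.
  globalThis.emit = function (key, value) {
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(value);
  };

  try {
    for (const doc of docs) {
      map.call(doc); // `this` is the current document
    }
  } finally {
    delete globalThis.emit;
  }

  const output = [];
  for (const [key, values] of groups) {
    // Keys with a single value are not passed to reduce.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    output.push({ _id: key, value: value });
  }
  return output.sort((a, b) => a._id.localeCompare(b._id));
}

const cities = [
  { city: "Toronto", temperature: 20 },
  { city: "Whitby", temperature: 25 },
  { city: "New York", temperature: 22 },
  { city: "Rome", temperature: 32 },
  { city: "Toronto", temperature: 4 },
  { city: "Rome", temperature: 33 },
  { city: "New York", temperature: 18 },
  { city: "New York", temperature: 14 },
];

// Same map and reduce functions as in the query above.
const maxTemp = simulateMapReduce(
  cities,
  function () { emit(this.city, this.temperature); },
  function (key, values) { return Math.max.apply(Math, values); }
);

console.log(maxTemp);
```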

Running this command should give us the results, which we can see in the **maxTemp** collection.

Let's take a look at the result now.

```
> db.maxTemp.find()
{ "_id" : "New York", "value" : 22 }
{ "_id" : "Rome", "value" : 33 }
{ "_id" : "Toronto", "value" : 20 }
{ "_id" : "Whitby", "value" : 25 }
```

Finally, we have the **maximum temperature** calculated for **every city**.

**When to use map-reduce?**

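Map-reduce can be used when your aggregation query is slow and takes a long time to execute because of the huge amount of data in the DB, since map-reduce can run in parallel and process the data at a much higher rate. Note, however, that map-reduce has been deprecated since MongoDB 5.0, and the MongoDB documentation recommends the aggregation pipeline, which it describes as providing better performance and usability than map-reduce.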

If the data set is small, it's better to stick to aggregation queries: on small data sets map-reduce takes longer to execute than an aggregation query, and it takes more effort to write.

Your map and reduce functions should be written so that they can run in parallel and still give the correct result.

You can check the MongoDB docs for more options to use in your map-reduce query. Here is the **link** for the same.