MongoDB using MapReduce to filter fields by text content

Let's say that we want to filter a list of documents and extract only those in which the text of one of the fields matches a given constraint, for example a word that appears more than once in the text.

For this example I will do it using MongoDB's MapReduce.

Simple JSON Model
{
   "_id" : "507f191e810c19729de860ea",
   "txtField" : "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus mauris arcu, lacinia a pharetra id, rhoncus in lorem. Morbi tempor consequat ante, vel porta tellus lacinia id. Phasellus vel nisi vitae velit pulvinar tincidunt non id massa. "
}

Below are the Map/Reduce functions that I will apply to filter the results:

1.- Map Function


function Map() {
   // match() with the g flag returns an array with every match of the pattern
   // in txtField, or null when the pattern is not found
   var count = this.txtField.match(/lorem/gi);

   // emit the matches when the pattern has been matched at least 2 times
   if (count != null && count.length > 1) {
      emit(this._id, count);
   }
}
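To see what the condition in the map function is testing, here is a minimal sketch that runs the same regular expression in plain JavaScript, outside MongoDB. The helper name `countMatches` is hypothetical and only used for illustration, and the sample text is a shortened version of the one in the model above.

```javascript
// Hypothetical helper (not part of MongoDB): returns how many times the
// pattern matches, treating "no match" (null) as zero.
function countMatches(text, pattern) {
  var matches = text.match(pattern);
  return matches === null ? 0 : matches.length;
}

var sample = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " +
  "Phasellus mauris arcu, lacinia a pharetra id, rhoncus in lorem.";

// The g flag collects every match, the i flag ignores case,
// so both "Lorem" and "lorem" are found.
console.log(sample.match(/lorem/gi));
console.log(countMatches(sample, /lorem/gi));          // 2
console.log(countMatches("no match here", /lorem/gi)); // 0
```

Note that without the `g` flag, `match()` returns only the first match (plus capture groups), so the length check would no longer count occurrences.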

2.- Reduce Function


function Reduce(key, values) {
   // each _id is emitted at most once, so there is a single value per key
   return values[0];
}

3.- Full query


db.runCommand({
   mapreduce : "MY_DOCS",
   map : function Map() {
      var count = this.txtField.match(/lorem/gi);

      if (count != null && count.length > 1) {
         emit(this._id, count);
      }
   },
   reduce : function Reduce(key, values) {
      return values[0];
   },
   query : { "txtField" : { "$exists" : true } },
   out : { inline : 1 }
});
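If you want to check the logic without a running server, the query can be roughly simulated in plain JavaScript. This is only a sketch under some assumptions: `simulateMapReduce` is a hypothetical stand-in for what the server does (filter, map, group by key, reduce), `emit` is passed in as a parameter because it is not a global outside MongoDB, and the three test documents are made up for the example.

```javascript
// Same logic as the map function in the post, but the document and emit
// are passed as arguments instead of using `this` and a global emit().
function map(doc, emit) {
  var count = doc.txtField.match(/lorem/gi);
  if (count != null && count.length > 1) {
    emit(doc._id, count);
  }
}

function reduce(key, values) {
  return values[0];
}

// Hypothetical stand-in for the server side of the command.
function simulateMapReduce(docs) {
  var groups = {};
  docs
    // rough equivalent of query: { "txtField": { "$exists": true } }
    .filter(function (doc) { return "txtField" in doc; })
    .forEach(function (doc) {
      map(doc, function (key, value) {
        (groups[key] = groups[key] || []).push(value);
      });
    });
  return Object.keys(groups).map(function (key) {
    var values = groups[key];
    // reduce is only needed when a key has more than one emitted value
    var value = values.length > 1 ? reduce(key, values) : values[0];
    return { _id: key, value: value };
  });
}

// Made-up documents: "a" repeats the word, "b" has it once, "c" lacks the field.
var docs = [
  { _id: "a", txtField: "Lorem ipsum, rhoncus in lorem." },
  { _id: "b", txtField: "Lorem ipsum only once." },
  { _id: "c", otherField: 42 }
];

var results = simulateMapReduce(docs);
console.log(JSON.stringify(results));
```

Only document "a" should come back, with the two matched strings as its value. Because every key here is a unique `_id`, the reduce step never has more than one value to combine, which is why returning `values[0]` is enough.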

You can find additional info on MapReduce operations in this nice article.
