MongoDB: using MapReduce to filter documents by text content

Let's say that we want to filter a list of documents and keep only those in which the text of one of the fields matches a given constraint, for example a word that appears more than once in the text.

For this example I will use MongoDB's MapReduce, run against documents like the following:


   {
      "_id" : "507f191e810c19729de860ea",
      "txtField" : "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus mauris arcu, lacinia a pharetra id, rhoncus in lorem. Morbi tempor consequat ante, vel porta tellus lacinia id. Phasellus vel nisi vitae velit pulvinar tincidunt non id massa. "
   }


Below are the Map and Reduce functions that I will apply to filter the results:

1.- Map Function


   
   function Map() {

      // collect every match of the regexp in the text of the field txtField
      // (match() with the "g" flag returns an array of matches, or null when there are none)
      var count = this.txtField.match(/lorem/gi);

      // emit the result when the pattern has been matched at least 2 times
      if (count != null && count.length > 1) {
         emit(this._id, count);
      }

   }
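
To make the matching step easier to follow, here is a minimal sketch in plain JavaScript that can be run outside MongoDB. The helper name countMatches and the sample string are assumptions made up for illustration; they are not part of the original functions.

```javascript
// Hypothetical helper (not in the original post): counts how many times
// a pattern matches, mirroring the check done inside Map().
function countMatches(text, pattern) {
  // With the "g" flag, match() returns an array of all matches,
  // or null when nothing matches.
  var matches = text.match(pattern);
  return matches === null ? 0 : matches.length;
}

// Made-up sample text, shortened from the example document.
var sample = "Lorem ipsum dolor sit amet, rhoncus in lorem.";

console.log(countMatches(sample, /lorem/gi));           // 2
console.log(countMatches("dolor sit amet", /lorem/gi)); // 0
```

The "i" flag makes the match case-insensitive, which is why both "Lorem" and "lorem" are counted.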

2.- Reduce Function



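   function Reduce(key, values) {
      // every key is a unique _id, so there is only one value per key;
      // MongoDB does not call reduce for keys that have a single value
      return values[0];
   }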

3.- Full query


   db.runCommand({
      mapreduce : "MY_DOCS",
      map : function Map() {
         var count = this.txtField.match(/lorem/gi);

         if (count != null && count.length > 1) {
            emit(this._id, count);
         }
      },
      reduce : function Reduce(key, values) {
         return values[0];
      },
      query : { "txtField" : { "$exists" : true } },
      out : { inline : 1 }
   });
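
To get a feel for what the command returns, the following sketch imitates the query and map steps over an in-memory array in plain JavaScript. It is only an approximation under stated assumptions: the docs array, its contents, and the simulateFilter helper are all made up for illustration, and the real command wraps its results in a larger response document.

```javascript
// In-memory stand-in for the MY_DOCS collection (made-up sample data).
var docs = [
  { _id: "a", txtField: "Lorem ipsum dolor sit amet, rhoncus in lorem." },
  { _id: "b", txtField: "Lorem ipsum dolor sit amet." },
  { _id: "c", otherField: "no txtField here" }
];

// Hypothetical helper: imitates the query filter plus the map step.
function simulateFilter(collection) {
  var results = [];
  collection
    // rough equivalent of query: { "txtField": { "$exists": true } }
    .filter(function (doc) { return "txtField" in doc; })
    .forEach(function (doc) {
      var count = doc.txtField.match(/lorem/gi);
      // same condition as in Map(): at least 2 matches
      if (count !== null && count.length > 1) {
        // inline results are listed as { _id, value } pairs
        results.push({ _id: doc._id, value: count });
      }
    });
  return results;
}

console.log(JSON.stringify(simulateFilter(docs)));
// [{"_id":"a","value":["Lorem","lorem"]}]
```

Only document "a" is kept: "b" contains the word once, and "c" is excluded by the query because it has no txtField.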

You can find additional info on MapReduce operations in this nice article.
