OP map-reduce (v0.3)

From Native Big Data Documentation


As a Big Data oriented language, op has map and reduce functions designed to operate on large objects over a network.

Map

Map functions execute a function on every object inside a hash or array. The function can be executed on both local and proxy objects.

The syntax is:

obj.map( function, <options> )

where:

  • obj is either a hash or an array. The output will be of the same type.
  • function is a function of the form (v,k) => { ... } . For an array, the function must return a list with the values to add to the output array; for a hash, it must return the value to store in the output hash.
  • options is an optional parameter that says how to implement the map. The options object has the format: { "thr" : number_of_threads , "servers": number_of_servers || [ list of servers ] } .
    • "thr"  : number_of_threads . The number of threads. One by default. Multithreading is only allowed with split objects.
    • "servers" : number_of_servers . The number of servers among which the work will be distributed. One by default. Multiserver execution is only allowed with split objects that have replicas.
    • "servers" : [list of servers] . The servers among which the work will be distributed. Local server by default. Multiserver execution is only allowed with split objects that have replicas.
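The semantics described above can be modelled outside op. The following Python sketch is only an illustration, not the NBD implementation: op_map is a hypothetical helper, and it assumes that for arrays the lists returned by the function are concatenated in input order, while for hashes the returned value is stored under the same key. The options parameter is left out.

```python
def op_map(obj, fn):
    """Model of obj.map(fn): fn receives (value, key)."""
    if isinstance(obj, dict):
        # Hash input: one output value per key, same keys as the input.
        return {k: fn(v, k) for k, v in obj.items()}
    # Array input: fn returns a list; the lists are concatenated in order.
    out = []
    for k, v in enumerate(obj):
        out.extend(fn(v, k))
    return out


doubled = op_map([1, 2, 3, 4, 5, 6], lambda v, k: [v * 2])
both = op_map([1, 2, 3], lambda v, k: [v, v * 2])
hash_doubled = op_map({"v1": 1, "v2": 2, "v3": 3}, lambda v, k: 2 * v)
print(doubled)       # [2, 4, 6, 8, 10, 12]
print(both)          # [1, 2, 2, 4, 3, 6]
print(hash_doubled)  # {'v1': 2, 'v2': 4, 'v3': 6}
```

Returning a list from the array function is what lets one input produce zero, one or several outputs.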


Example 1: The following script will create a local array and return another with the values doubled:

var a = [1,2,3,4,5,6]; 
a.map( (v,k) => { [ v*2 ] } )

Example 2: The following script will create an array object identified by "urp:obj:/tt" and return another with the values doubled. The map will use 3 threads.

var a = new ( "urp:obj:/tt" , [1,2,3,4,5,6] ,2 ); 
a.map( (v,k) => { [ v*2 ] } , { "thr": 3 }  )

WARNING: The options object is ignored in NBD v 0.3 and will be available in version 0.4
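Since the options object is not yet honoured, the sketch below only illustrates the idea behind "thr" in Python: an array that is split into parts can be mapped part by part in a thread pool. The names split and threaded_map, and the way the parts are formed, are assumptions made for the illustration; they say nothing about how NBD splits objects.

```python
from concurrent.futures import ThreadPoolExecutor


def split(values, parts):
    """Cut a list into at most `parts` contiguous chunks."""
    size = max(1, -(-len(values) // parts))  # ceiling division, at least 1
    return [values[i:i + size] for i in range(0, len(values), size)]


def threaded_map(chunks, fn, thr=1):
    """Map every chunk in a pool of `thr` threads, keeping chunk order."""
    def map_chunk(chunk):
        out = []
        for v in chunk:
            out.extend(fn(v))  # fn returns a list, as in the array examples
        return out

    with ThreadPoolExecutor(max_workers=thr) as pool:
        mapped = list(pool.map(map_chunk, chunks))  # results stay in order
    return [v for chunk in mapped for v in chunk]


chunks = split([1, 2, 3, 4, 5, 6], 2)
result = threaded_map(chunks, lambda v: [v * 2], thr=3)
print(chunks)  # [[1, 2, 3], [4, 5, 6]]
print(result)  # [2, 4, 6, 8, 10, 12]
```

With two parts and three threads one thread stays idle, which is one way to see why the number of useful threads depends on how the object is split.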

Example 3: Hashes work in a similar way:

var a = { "v1":1 , "v2":2 , "v3":3 };
a.map( (v,k) => { 2*v } )


Example 4: A shorter way to do a map, storing the result in the input object, is:

var a = [1,2,3,4,5,6]; 
a [ (v) => { 2*v } ]

Example 5: The same shorthand with hashes:

var a = { "v1":1 , "v2":2 , "v3":3 };
a[(v,k) => { 2*v }]
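To make the difference with obj.map concrete, this Python sketch models the shorthand as an update of the input object itself. map_in_place is a hypothetical name, and the sketch assumes, as in Examples 4 and 5, that the function returns a single value per element.

```python
def map_in_place(obj, fn):
    """Model of obj[fn]: overwrite every element of obj with fn(value, key)."""
    keys = obj.keys() if isinstance(obj, dict) else range(len(obj))
    for k in list(keys):
        obj[k] = fn(obj[k], k)
    return obj


a = [1, 2, 3, 4, 5, 6]
map_in_place(a, lambda v, k: 2 * v)
h = {"v1": 1, "v2": 2, "v3": 3}
map_in_place(h, lambda v, k: 2 * v)
print(a)  # [2, 4, 6, 8, 10, 12]
print(h)  # {'v1': 2, 'v2': 4, 'v3': 6}
```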


For array inputs, the map function must return an array of values. This is useful when we want to create more than one output for every input.

Example 6: The following script will create a local array and return another with the input values and the input values doubled:

	var a = [1,2,3,4,5,6]; 
	a.map( (v,k) => { [ v, v*2 ] } )

Example 7: This also works with proxy objects:

	var a = i2o("urp:obj:/test_1"); 
	a.map( (v,k) => { [ v, v*2 ] } )

Reduce

A reduce transformation is provided to aggregate values with the same key and generate a hash of the reduced output. Reduce functions act like map but aggregate values by key. This is similar to the SQL "GROUP BY" clause.

The easiest way to see this is with an example:

var a = [ 
	{ "id": "a" , "v":      1 } ,  
	{ "id": "b" , "v":      2 } ,  
	{ "id": "c" , "v":      3 } ,  
	{ "id": "a" , "v":      4 } ,  
	{ "id": "c" , "v":      5 } ,  
	{ "id": "b" , "v":      6 } 
];
a.reduce( ( a,b ) => { a+b } , (v,k) => { [ [ 1, v.v, v.v*v.v ] , v.id ] } )

obj.reduce has the following parameters:

  • aggregator function: says how to aggregate two aggregations. This function must work with both element values and aggregated values.
  • getter function: receives the value and the key of each element and must return the value to be aggregated and the key under which to aggregate it.
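The two parameters can be modelled in Python as follows. This is a sketch of the grouping behaviour only: op_reduce is a hypothetical helper, the getter is assumed to return [ value , key ] as in the example above, and the aggregator is written as an explicit element-wise sum, which is an assumption about what a+b does on lists in op.

```python
def op_reduce(obj, aggregator, getter):
    """Model of obj.reduce(aggregator, getter) for an array input."""
    out = {}
    for k, v in enumerate(obj):
        value, key = getter(v, k)
        # The first value of a key is stored as is; later ones are merged.
        out[key] = aggregator(out[key], value) if key in out else value
    return out


def add(a, b):
    """Element-wise sum of two lists of equal length (assumed meaning of a+b)."""
    return [x + y for x, y in zip(a, b)]


rows = [
    {"id": "a", "v": 1}, {"id": "b", "v": 2}, {"id": "c", "v": 3},
    {"id": "a", "v": 4}, {"id": "c", "v": 5}, {"id": "b", "v": 6},
]
reduced = op_reduce(rows, add, lambda v, k: ([1, v["v"], v["v"] ** 2], v["id"]))
print(reduced)  # {'a': [2, 5, 17], 'b': [2, 8, 40], 'c': [2, 8, 34]}
```

Each output entry holds the count, the sum and the sum of squares of the values that share an id.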

In the following example we will use a native aggregator. Native aggregators are implemented in C and perform better.

var a = [ 
	{ "id": "a" , "v":      1 } ,  
	{ "id": "b" , "v":      2 } ,  
	{ "id": "c" , "v":      3 } ,  
	{ "id": "a" , "v":      4 } ,  
	{ "id": "c" , "v":      5 } ,  
	{ "id": "b" , "v":      6 } 
];
var ag = Stat.new();
a.reduce( ag , (v,k) => { [ [ 1, v ] , v.id ] } )

The native aggregator we are using is the Stat aggregator. This aggregator calculates the mean and variance of every category.
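The interface and internals of Stat are not described here, so the following Python sketch is not a model of Stat itself. It only shows how a mean and a variance per category can be derived from [ count , sum , sum of squares ] triples such as the ones built by the getter of the first reduce example. Using the population variance (dividing by n) is an arbitrary choice for the illustration.

```python
def mean_and_variance(triple):
    """Turn [n, sum, sum_of_squares] into (mean, population variance)."""
    n, total, total_sq = triple
    mean = total / n
    return mean, total_sq / n - mean * mean


# Triples for the sample data: ids a = {1, 4}, b = {2, 6}, c = {3, 5}.
triples = {"a": [2, 5, 17], "b": [2, 8, 40], "c": [2, 8, 34]}
stats = {key: mean_and_variance(t) for key, t in triples.items()}
print(stats)  # {'a': (2.5, 2.25), 'b': (4.0, 4.0), 'c': (4.0, 1.0)}
```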

Reduce functions can also be used with distributed objects:

new("urp:obj:/test3", 
[ 
	{ "id": "a" , "v":      1 } ,  
	{ "id": "b" , "v":      2 } ,  
	{ "id": "c" , "v":      3 } ,  
	{ "id": "a" , "v":      4 } ,  
	{ "id": "c" , "v":      5 } ,  
	{ "id": "b" , "v":      6 } 
] , 3, null, 1);
var t = i2o("urp:obj:/test3");
var ag = Stat.new();
t.reduce( ag , (v,k) => { [ [ 1, v ] , v.id ] } )
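Why the aggregator must accept already aggregated values becomes visible when the data is split. In this Python sketch, which is an illustration and not the NBD protocol, every part is reduced on its own and the partial hashes are then merged with the same aggregator. The helper names and the split into three contiguous parts are assumptions.

```python
def reduce_part(rows, aggregator, getter):
    """Reduce one part of the data into a hash of partial aggregations."""
    out = {}
    for k, v in enumerate(rows):
        value, key = getter(v, k)
        out[key] = aggregator(out[key], value) if key in out else value
    return out


def merge_parts(partials, aggregator):
    """Merge partial hashes; here the aggregator receives aggregations."""
    out = {}
    for partial in partials:
        for key, value in partial.items():
            out[key] = aggregator(out[key], value) if key in out else value
    return out


def add(a, b):
    """Element-wise sum of two [count, sum] lists."""
    return [x + y for x, y in zip(a, b)]


def getter(v, k):
    """Return the value to aggregate and its grouping key."""
    return [1, v["v"]], v["id"]


rows = [
    {"id": "a", "v": 1}, {"id": "b", "v": 2}, {"id": "c", "v": 3},
    {"id": "a", "v": 4}, {"id": "c", "v": 5}, {"id": "b", "v": 6},
]
parts = [rows[0:2], rows[2:4], rows[4:6]]
partials = [reduce_part(p, add, getter) for p in parts]
merged = merge_parts(partials, add)
print(merged)  # {'a': [2, 5], 'b': [2, 8], 'c': [2, 8]}
```

The merged result equals the result of reducing the unsplit list, which holds here because the aggregator treats single elements and partial aggregations alike.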


The following example shows how to implement a complex aggregator:

var a = [ 
	{ "id": "a" , "v":      1 } ,  
	{ "id": "b" , "v":      2 } ,  
	{ "id": "c" , "v":      3 } ,  
	{ "id": "a" , "v":      4 } ,  
	{ "id": "c" , "v":      5 } ,  
	{ "id": "b" , "v":      6 } 
];
a.reduce( ( a,b ) => { var n = a[0] + b[0]; [ n , ( a[0]*a[1] + b[0]*b[1] ) / n ]  } , (v,k) => { [ [ 1, v.v ] , v.id ] } )

In this example, note that the getter function returns [ number of elements aggregated , aggregated value ], and that the aggregator function can aggregate both single elements and aggregations.
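A Python sketch of such an aggregator, assuming the second element is meant to be the running mean of the aggregated values: the merged mean is the count-weighted average of the two inputs, so the whole numerator is divided by n. merge_mean is a hypothetical name.

```python
from functools import reduce


def merge_mean(a, b):
    """Merge two [count, mean] pairs into one."""
    n = a[0] + b[0]
    return [n, (a[0] * a[1] + b[0] * b[1]) / n]


# Single elements enter as [1, value]; aggregations are merged the same way.
left = reduce(merge_mean, [[1, 1], [1, 4]])
right = reduce(merge_mean, [[1, 7]])
total = merge_mean(left, right)
print(left)   # [2, 2.5]
print(total)  # [3, 4.0]
```

The final mean, 4.0, is the plain mean of 1, 4 and 7, so merging an aggregation with a single element gives the same answer as aggregating all elements at once.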