ShapeSecurity's JavaScript VM: Part 2

Intro

In the previous part, we talked about the VM internals, specifically the VM Machinery. In this second part, we are going to talk about the VM Data and how it works together with the VM Machinery.

VM Data

The VM Data consists of 11 data objects that hold the configuration for starting new threads, the mapping that decides which op runs next, the bytecode, the op functions, etc.

You can think of the VM Data parts as the things that hold mutable and immutable data, control-flow logic and the ops used by ShapeSecurity's VM. Once again, the names that I gave to these 11 variables do not always describe what they actually do; they were kept for legacy purposes.

1. XOR_MAP

Every single string that is constructed inside ShapeSecurity's VM is stored in this map. The strings are decrypted with a byte-wise XOR. Every time ShapeSecurity's VM needs to construct a string, it does so by xoring two strings from the array OBFUSCATED using the getXorValue() function.

The getXorValue() function was converted from this:

b1: {
		var K = G;
		var P = K + "," + E;
		var s = p[P];
		if (typeof s !== "undefined") {
				var y = s;
				break b1
		}
		var S = i[E];
		var w = qp(S);
		var X = qp(K);
		var c = w[0] + X[0] & 255;
		var J = "";
		for (var b = 1; b < w.length; ++b) {
				J += e(X[b] ^ w[b] ^ c)
		}
		var y = p[P] = J
}
var o = q.A.length;

into this:

  function getXorValue(xorMap, obfuscatedStrings, strA, strB) {
    const fullKeyName = strA + ',' + strB;
    let value = xorMap[fullKeyName];
    if (typeof value !== 'undefined') {
      return value;
    }
    const strBDecoded = base64Decoder(obfuscatedStrings[strB]);
    const strADecoded = base64Decoder(strA);
    const thirdXor = strBDecoded[0] + strADecoded[0] & 255;
    value = '';
    for (let i = 1; i < strBDecoded.length; ++i) {
      value += String.fromCharCode(strADecoded[i] ^ strBDecoded[i] ^ thirdXor);
    }
    xorMap[fullKeyName] = value;
    return value;
  }

Whenever getXorValue() is called inside ShapeSecurity's VM, XOR_MAP is passed as the first parameter, OBFUSCATED (an array containing only strings) as the second parameter, and strA and strB as the third and fourth parameters.

strA is the raw string of one of the items in OBFUSCATED, while strB is not a raw string but the index of another item inside OBFUSCATED.

The fullKeyName is the cache key: strA, a comma, and the strB index joined together. The decoded string is stored in XOR_MAP under this key to avoid wasting time xoring the same pair twice.

The xoring mechanism is relatively straightforward.

    1. Decode strA from base64 into an array of bytes named strADecoded.
    1. Look up OBFUSCATED[strB] and decode it from base64 into an array of bytes named strBDecoded.
    1. Create a third xor value, thirdXor, by adding the first byte of strADecoded to the first byte of strBDecoded and applying & 255 to the result.
    1. Create an empty string named value to store the xored text.
    1. Starting from index 1, iterate through all the bytes of strBDecoded.
    1. For each iteration, use the index to read one byte from strADecoded and one from strBDecoded, then xor them together with the thirdXor value.
    1. Convert the result from the previous step to a character and concatenate it onto value.
    1. At the end of the loop, store the final value in XOR_MAP under fullKeyName, which avoids double computation when the same strA and strB are xored again.
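
To see the steps above run end to end, here is a small round-trip sketch for Node.js. The getXorValue() body is the converted version shown earlier. Everything else is an assumption made for the demo: base64Decoder() is a stand-in built on Node's Buffer, and makePair() is a hypothetical encoder that does not exist in ShapeSecurity's VM. It only builds a strA / OBFUSCATED pair that decodes to a known string.

```javascript
// Stand-in for the VM's base64Decoder(): base64 string -> array of bytes (0-255).
function base64Decoder(str) {
  return Array.from(Buffer.from(str, 'base64'));
}

// Converted getXorValue() from the article.
function getXorValue(xorMap, obfuscatedStrings, strA, strB) {
  const fullKeyName = strA + ',' + strB;
  let value = xorMap[fullKeyName];
  if (typeof value !== 'undefined') {
    return value; // cache hit, nothing to xor
  }
  const strBDecoded = base64Decoder(obfuscatedStrings[strB]);
  const strADecoded = base64Decoder(strA);
  const thirdXor = strBDecoded[0] + strADecoded[0] & 255;
  value = '';
  for (let i = 1; i < strBDecoded.length; ++i) {
    value += String.fromCharCode(strADecoded[i] ^ strBDecoded[i] ^ thirdXor);
  }
  xorMap[fullKeyName] = value;
  return value;
}

// Hypothetical encoder (NOT part of the VM): produces two base64 strings
// whose xor, combined with thirdXor, yields `plain` (ASCII only).
function makePair(plain, seedA, seedB) {
  const a = [seedA];
  const b = [seedB];
  const thirdXor = (seedA + seedB) & 255;
  for (let i = 0; i < plain.length; i++) {
    const keyByte = (i * 31 + 7) & 255; // arbitrary mask byte
    a.push(keyByte);
    b.push(plain.charCodeAt(i) ^ keyByte ^ thirdXor);
  }
  return {
    strA: Buffer.from(a).toString('base64'),
    entry: Buffer.from(b).toString('base64'),
  };
}

const XOR_MAP = {};
const pair = makePair('userAgent', 17, 200);
const OBFUSCATED = [pair.entry];

// strA is the raw string, strB (0) is the index into OBFUSCATED.
const decoded = getXorValue(XOR_MAP, OBFUSCATED, pair.strA, 0);
console.log(decoded); // userAgent
```

After the first call, XOR_MAP holds the decoded string under the key strA + ',0', so a second call with the same arguments returns straight from the map.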

2. OPS_FUNCTIONS

OPS_FUNCTIONS was originally an array of op functions, but in the pretty version it was converted to an object with each key corresponding to the index in the original array. It would have been too difficult to visually find each op in an array of hundreds of ops while working with ShapeSecurity's VM. For this reason alone it was converted into an object instead of an array.

The functions inside OPS_FUNCTIONS were cleaned up to follow a simple convention: 1 statement (ReturnStatement, ExpressionStatement, IfStatement, etc.) represents 1 action. For the most part, the statements found inside each individual op already contained 1 action per statement, with the exception of a few that were extracted into helper functions:

throwIfTypeError()

//ORIGINAL
if (!(c in I)) {
	throw new qd(c + " is not defined.")
}

//CONVERTED
function throwIfTypeError(_$A) {
	if (!(_$A in window)) {
		throw new ReferenceError(_$A + " is not defined.");
	}
}

throwIfIsNotAnObject()

//ORIGINAL
if (w.A[w.A.length - 1] === null || w.A[w.A.length - 1] === void 0) {
		throw new qJ(w.A[w.A.length - 1] + " is not an object")
}

//CONVERTED
function throwIfIsNotAnObject(_$A) {
	if (_$A === null || _$A === void 0) {
		throw new TypeError(_$A + " is not an object");
	}
}

forInFunc()

//ORIGINAL
var q = [];
for (var s in w.A[w.A.length - 1]) {
	f(q, s)
}

//CONVERTED
function forInFunc(stackValue) {
	var arr = [];
	for (var i in stackValue) {
		arr.push(i);
	}
	return arr;
}

pushWasExceptionHandled()

//ORIGINAL
var q = w.M.N();
var s = {
		d: false,
		Q: w.f,
		r: w.r
};
w.x.q(s);
w.f = q.W;
w.r = q.r

//CONVERTED
function pushWasExceptionHandled(_vmContext) {
	var errors = _vmContext.errors.pop();
	var errorObj = {
		wasExceptionHandled: false,
		_errorOryIndex: _vmContext.yIndex,
		_xIndex: _vmContext.xIndex
	};
	_vmContext.errorTracker.push(errorObj);
	_vmContext.yIndex = errors._yIndex;
	_vmContext.xIndex = errors._xIndex;
}

errorTrackerPopWithThrow()

//ORIGINAL
var q = w.x.N();
if (q.d) {
		throw q.Q
}
w.f = q.Q;
w.r = q.r

//CONVERTED
function errorTrackerPopWithThrow(_vmContext) {
	var errorTrack = _vmContext.errorTracker.pop();
	if (errorTrack.wasExceptionHandled) {
		throw errorTrack._errorOryIndex;
	}
	_vmContext.yIndex = errorTrack._errorOryIndex;
	_vmContext.xIndex = errorTrack._xIndex;
}

The only argument that each op inside the array OPS_FUNCTIONS takes is an instance of a vmContext(). Each op function uses the vmContext() instance to do one or more of these actions:

    1. Access items from the HEAP, OBFUSCATED, ARRAY_OF_NUMBERS and/or ARRAY_OF_MAP_FILTER_FOR_EACH
    1. Increments the yIndex value by the number of uniquely referenced yIndex instances
    1. Modifies the stack array by adding and/or removing items from it
    1. Writes new keys into _vmMemory and/or reads keys from _vmMemory by using getKey() and setKey() methods
    1. Makes conditional jumps by setting yIndex and xIndex inside the consequent side of an IfStatement
    1. Creates new threads by calling createThread()
    1. Creates a new string using getXorValue()
    1. Starts a new "try catch mode" and/or ends a current "try catch mode"
    1. Throws an exception
    1. Defines a new value on an object and/or array by using Object.defineProperty
    1. Sets a new xyIndex from the current yIndex and xIndex values and/or sets yIndex to xyIndex.yIndex and xIndex to xyIndex.xIndex
    1. Ends the thread execution by setting the returnValue to something other than explicitReturn
    1. Decreases the stack length

We can break these actions into three groups:

  • Actions 1 thru 2 are always done first
  • Actions 3 thru 12 are done second
  • Action 13 is done at the end

It is important to understand that not all ops will contain all of these types of actions. Some ops will only contain actions 3 thru 12, some might only contain actions 1 thru 2, and some might contain all of them.
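
To make the three groups concrete, here is a toy op written in the same shape. It is a hypothetical op, not one taken from the real OPS_FUNCTIONS, and the HEAP and ARRAY_OF_NUMBERS contents are made up. It assumes the vmContext exposes yIndex and a stack array, as described above.

```javascript
// Toy data, made up for this example.
const HEAP = [1];                     // a single operand byte
const ARRAY_OF_NUMBERS = [100, 1000];

// Hypothetical op: adds the top two stack values plus a constant.
function addWithConstant(vmContext) {
  // Actions 1-2: read the operand from HEAP, then advance yIndex past it.
  const _$A = HEAP[vmContext.yIndex];
  vmContext.yIndex += 1;

  // Actions 3-12: the op's real work, here a single stack modification.
  const stack = vmContext.stack;
  stack[stack.length - 2] =
    stack[stack.length - 2] + stack[stack.length - 1] + ARRAY_OF_NUMBERS[_$A];

  // Action 13: drop the consumed value by decreasing the stack length.
  stack.length -= 1;
}

const vmContext = { xIndex: 0, yIndex: 0, stack: [10, 5] };
addWithConstant(vmContext);
// The stack now holds the single value 1015 (10 + 5 + 1000) and yIndex is 1.
console.log(vmContext.stack, vmContext.yIndex);
```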

Note: There are too many permutations of function ops to identify, but there is a limited number of distinct lines used inside each function op.

In a later part, we will talk about ShapeSecurity's VM ops in more detail, since they require a full part to fully explain their inner workings.

3. OPS_SEQUENCE

ShapeSecurity's VM uses a dynamic mapping to determine two things:

  • The function op to execute, based on its index in OPS_FUNCTIONS
  • The xIndex value for the next op (unless it is set inside the op by some action)

OPS_SEQUENCE is a two-dimensional array; you can think of the first dimension as rows and the second dimension as columns.

When the method next() on a vmContext instance is called it does 4 things:

    1. Uses the current xIndex value as the index into the first dimension of OPS_SEQUENCE
    1. Uses the yIndex value to read a byte (0-255) from the HEAP array, uses that byte as the index into the second dimension, and increments yIndex
    1. Sets the next xIndex value to the first element of the resulting array
    1. Returns the second element of the resulting array, which corresponds to an index in OPS_FUNCTIONS

Object.defineProperty(L, "next", {
	value: function () {
		{
			var w = OPS_SEQUENCE[this.xIndex][HEAP[this.yIndex++]];
			this.xIndex = w[0];
			return w[1];
		}
	}
});

Apart from the update that next() makes on every call, the xIndex value is not changed unless a specific action inside the executed op sets it to another value. This only occurs in the actions that perform a conditional jump.
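
A tiny table makes the lookup easier to follow. The sketch below reuses the next() logic shown above on made-up data: the OPS_SEQUENCE rows, the HEAP bytes and the op indexes (7, 9, 3) are all invented, and createContext() is a hypothetical helper that stands in for a real vmContext instance.

```javascript
// Made-up bytecode: three bytes, each selects a column in OPS_SEQUENCE.
const HEAP = [0, 1, 0];

// OPS_SEQUENCE[xIndex][byte] -> [next xIndex, index into OPS_FUNCTIONS]
const OPS_SEQUENCE = [
  [[1, 7], [0, 3]], // row used while xIndex === 0
  [[0, 3], [1, 9]], // row used while xIndex === 1
];

// Hypothetical stand-in for a vmContext instance.
function createContext() {
  return {
    xIndex: 0,
    yIndex: 0,
    next() {
      const entry = OPS_SEQUENCE[this.xIndex][HEAP[this.yIndex++]];
      this.xIndex = entry[0];
      return entry[1];
    },
  };
}

const ctx = createContext();
const trace = [ctx.next(), ctx.next(), ctx.next()];
console.log(trace.join(',')); // 7,9,3
```

Note how the byte 0 appears twice in this HEAP but selects op 7 the first time and op 3 the second time, because xIndex was different on each call. The same byte does not always mean the same op.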

4. HEAP

In the previous part I kept calling the HEAP the bytecode. It took me a while to learn that what I was calling the HEAP truly represents the bytecode in a VM.

The HEAP is decoded into an array of bytes (via base64Decoder()), and in previous versions of ShapeSecurity's VM the HEAP remained immutable. Later on, they introduced a new action into their function ops that allowed them to change any byte in the HEAP.

A lot of the configuration values that are used for setting a memory key, reading a memory key, getting the index of an item in OBFUSCATED, etc. come from the values in the HEAP. These are the values that are first read and stored in a temporary variable (for example, _$A) for later use.
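
As a rough sketch of what this looks like in practice (Node.js, with made-up bytes): the encoded string is generated on the spot so the example is self-contained, base64Decoder() is a stand-in built on Buffer, and patchHeap() is a hypothetical helper that only illustrates the idea of an action overwriting a HEAP byte. The real action may look quite different.

```javascript
// Made-up bytecode, encoded here so the example is self-contained.
const encoded = Buffer.from([3, 0, 255, 16]).toString('base64');

// Stand-in for base64Decoder(): base64 string -> plain array of bytes (0-255).
function base64Decoder(str) {
  return Array.from(Buffer.from(str, 'base64'));
}

const HEAP = base64Decoder(encoded);

// Reading operands: copy bytes into temporaries, then advance yIndex.
const vmContext = { yIndex: 0 };
const _$A = HEAP[vmContext.yIndex];
const _$B = HEAP[vmContext.yIndex + 1];
vmContext.yIndex += 2;

// Hypothetical "patch" action: overwrite one byte, kept in the 0-255 range.
function patchHeap(index, value) {
  HEAP[index] = value & 255;
}

patchHeap(2, 300); // 300 & 255 === 44
console.log(_$A, _$B, HEAP.join(',')); // 3 0 3,0,44,16
```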

5. explicitReturn and 6. returnValue

As mentioned in the previous part, ShapeSecurity's VM runs until returnValue changes. In other words, while returnValue is set to explicitReturn, the function vmRunner() keeps running until an op sets it to a value other than explicitReturn.

Inside the ops, returnValue is changed to 3 different types of values:

    1. returnEmptyObject, which is basically an empty object ({})
    1. The last value in the stack array
    1. void 0 also known as undefined

The returnValue acts as a sort of ReturnStatement from a regular function.
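
The run loop can be pictured with a heavily simplified sketch. It assumes explicitReturn is a unique sentinel object and that returnValue lives on the context; the three toy ops, the program array and the pc counter are all hypothetical and replace the real next() / OPS_SEQUENCE dispatch to keep the example short.

```javascript
// Sentinel meaning "keep running" (assumed to be a unique object).
const explicitReturn = {};

// Three toy ops, keyed by index like the pretty version of OPS_FUNCTIONS.
const OPS_FUNCTIONS = {
  0: (ctx) => { ctx.stack.push(41); },
  1: (ctx) => { ctx.stack[ctx.stack.length - 1] += 1; },
  // "Return" op: hands the last stack value back to the caller.
  2: (ctx) => { ctx.returnValue = ctx.stack[ctx.stack.length - 1]; },
};

// Hypothetical op order, standing in for the real bytecode dispatch.
const program = [0, 1, 2];

function vmRunner(ctx) {
  // Keep executing ops until one of them replaces the sentinel.
  while (ctx.returnValue === explicitReturn) {
    const opIndex = program[ctx.pc++];
    OPS_FUNCTIONS[opIndex](ctx);
  }
  return ctx.returnValue;
}

const result = vmRunner({ pc: 0, stack: [], returnValue: explicitReturn });
console.log(result); // 42
```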

7. ARRAY_OF_NUMBERS and 8. OBFUSCATED

ARRAY_OF_NUMBERS, as the name suggests, is simply an array that holds nothing but numbers in different formats. They are used for things such as xoring bytes for encoding some values, encryption numbers, and numbers used for some of their signals.

OBFUSCATED should really have been called ARRAY_OF_STRINGS, since it complements ARRAY_OF_NUMBERS and contains nothing but strings. However, when I first started labeling ShapeSecurity's VM internals I came across the getXorValue() function and noticed how this array was always used for decoding strings inside that function, so the name stuck.

Nothing is deleted or added to these arrays as they both remain immutable the whole time.

9. NATIVE_FUNCTIONS

NATIVE_FUNCTIONS is an array of prototypes and functions that ShapeSecurity's VM uses to reference native code. ShapeSecurity's VM never adds any new items to this array and it remains immutable.

Inside one of the function ops, NATIVE_FUNCTIONS is pushed to the stack array, and this remains the only way NATIVE_FUNCTIONS is accessed.

10. ARRAY_OF_MAP_FILTER_FOR_EACH

This array contains 3 prototypes:

    1. The prototype of Array.prototype.map
    1. The prototype of Array.prototype.filter
    1. The prototype of Array.prototype.forEach

The prototypes are later used to check whether the current object has those methods defined. If it does not, the compiled code inside ShapeSecurity's VM implements an alternative function. This is not evident until later, when going thru the disassembled code.
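
One way such a check could look is sketched below. This is a guess at the general pattern, not the VM's actual compiled code: it assumes the array simply holds the native map, filter and forEach functions, and mapWithFallback() is a hypothetical helper name.

```javascript
// Assumed contents: the three native array methods.
const ARRAY_OF_MAP_FILTER_FOR_EACH = [
  Array.prototype.map,
  Array.prototype.filter,
  Array.prototype.forEach,
];

// Hypothetical helper: use the native map when the object has it,
// otherwise fall back to a manual loop.
function mapWithFallback(list, fn) {
  const nativeMap = ARRAY_OF_MAP_FILTER_FOR_EACH[0];
  if (list.map === nativeMap) {
    return list.map(fn);
  }
  const out = [];
  for (let i = 0; i < list.length; i++) {
    out.push(fn(list[i], i, list));
  }
  return out;
}

// A real array takes the native path.
const doubled = mapWithFallback([1, 2, 3], (x) => x * 2);
// An array-like object has no map method, so the fallback loop runs.
const bumped = mapWithFallback({ 0: 1, 1: 2, length: 2 }, (x) => x + 1);
console.log(doubled.join(','), bumped.join(',')); // 2,4,6 2,3
```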

11. THREAD_CONFIG

Every time a new thread is created using createThread() inside one of the ops, the first parameter always corresponds to an index into THREAD_CONFIG. Therefore, the array THREAD_CONFIG holds all the thread configurations used to set the keys for each newly created thread.

Each item in that array is always an object with at least 3 keys:

    1. keysFromArgs: The keys, in order, that will be set for each item in arguments
    1. initializeKeys: All the keys that will be set to void 0, including the keys from keysFromArgs
    1. transferredKeys: The keys transferred from the parent's vmMemory() instance

Additionally, two other keys are sometimes set:

    1. arksKey: The key used to set arguments to.
    1. workResultKey: The key used to set itself, aka the this value.

With the exception of the first thread created, all future threads are created using the createThread() function with a value from the THREAD_CONFIG array as a configuring object.
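
To show how one of these configuration objects could drive the setup of a new thread's memory, here is a minimal sketch. Only the five key names come from the description above. The numeric keys, the createThreadMemory() helper, its parameters, and the use of a Map in place of a vmMemory() instance are all assumptions, and the real VM may share transferred keys with the parent rather than copy them.

```javascript
// Made-up configuration; only the property names come from the article.
const THREAD_CONFIG = [
  {
    keysFromArgs: [10, 11],
    initializeKeys: [10, 11, 12],
    transferredKeys: [5],
    arksKey: 20,
    workResultKey: 21,
  },
];

// Hypothetical helper: builds the memory for a new thread from a config.
function createThreadMemory(configIndex, parentMemory, thisValue, args) {
  const config = THREAD_CONFIG[configIndex];
  const memory = new Map();

  // Every declared key starts out as void 0.
  for (const key of config.initializeKeys) {
    memory.set(key, void 0);
  }
  // Arguments are assigned to keysFromArgs in order.
  config.keysFromArgs.forEach((key, i) => memory.set(key, args[i]));
  // Transferred keys are copied over from the parent here.
  for (const key of config.transferredKeys) {
    memory.set(key, parentMemory.get(key));
  }
  // The two optional keys.
  if ('arksKey' in config) memory.set(config.arksKey, args);
  if ('workResultKey' in config) memory.set(config.workResultKey, thisValue);
  return memory;
}

const parent = new Map([[5, 'shared']]);
const thisValue = { name: 'caller' };
const memory = createThreadMemory(0, parent, thisValue, ['a']);
console.log(memory.get(10), memory.get(11), memory.get(5)); // a undefined shared
```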

Conclusion

We have barely scratched the surface of the inner details of ShapeSecurity's VM. In the next part we will dive into all the actions that make up the individual ops from OPS_FUNCTIONS and what kind of structures they produce. Make sure you tune in, as there will be many more parts to come after Part 3 is done.