Best practices in creating performant formulas in Polaris.
2.02-01 Polaris - Understanding Calculation Complexity
Polaris achieves calculation efficiency by minimizing the number of cells that the engine has to do work for. Some formulas can be calculated very efficiently by only doing work for 'populated' cells. The Calculation Complexity column will tell the modeler how the engine is interpreting a formula and how much work will be required for each populated source cell.
There are three types of Calculation Complexity: One-to-One, One-to-Many, and All Cells.
One-to-One
This is the most efficient type of formula. A One-to-One formula is a formula where the engine can ‘drive’ the calculation by only iterating over the populated cells in one or more of the source line items. In the example of “Revenue = Units * Price”, it's possible to drive this calculation in several ways:
- You could calculate every target cell in Revenue.
- You could multiply each non-zero value in Units to get the result, since the formula is a straight multiplication and anything multiplied by zero is zero.
- You could also do a multiplication of every item in Price, but this would be less efficient.
In this case, the Polaris engine chooses the most efficient way to calculate the result. It takes only three real calculations instead of 15. This is key to efficiently calculating at high dimensionality.
One-to-Many
One-to-Many formulas require a multiple of work for every populated source cell. This is displayed in the Calculation Complexity column as One-to-Many(x) where x is the 'Fan-out factor' or the number of calculations that has to be done for every source cell.
Having a formula with One-To-Many(x) can be a sign that data is being spread out over a dimension.
In this example, a simple reference between two line items where the target has a finer granularity than the source will have a Calculation Complexity of One-to-Many(3). Note that not all cells are having to be calculated (or populated) here, but a multiple of the number of source cells are having work done.
All Cells
All Cells is the least efficient when modeling with significant dimensionality. This is the simplest to explain, in that the only mathematically valid way for Polaris to calculate the result of the formula is to do a calculation for every target cell. This is how Classic calculates line items in almost all cases. This becomes very inefficient with significant dimensionality due to the sheer number of cells.
2.02-02 Polaris: Avoid formulas with high Fan-Out One-to-Many values
Having a large Fan-Out in the Complexity Column can lead to performance issues with line items with significant dimensionality.
relatedToRule
2.02-03 Polaris: Avoid All Cells
Having All Cells in the Complexity Column can lead to performance issues with line items having significant dimensionality.
relatedToRule
2.02-04 Polaris: Boolean is roughly the same size in Polaris as other formats
Unlike Classic, using Booleans instead of Number format isn't a big benefit because Boolean are about the same size as other formats.
2.02-05 Polaris: Early Exits in Polaris don’t matter as much as they do in Classic
In general, Polaris will interpret the meaning of the formula and attempt to identify the most efficient way to drive the calculation. The order of cases in an IF clause doesn't matter. For example, IF X THEN Y ELSE Z is exactly equivalent to IF NOT X THEN Z ELSE Y.
2.02-06 Polaris: Create logic to render the results in the Default value
Polaris only uses memory when storing populated (non-default) values. This is why it's important to ensure that the logic of a model maintains default values (blank, false, and zero) as often as possible. For example, if the common scenario is for a value to be TRUE, consider reversing the logic and making the common result FALSE which is the default value.
2.02-07 Polaris: Avoid creating logic that results in dense modules with significant dimensionality
Using constants and item(list) will create a 100% dense module as every cell will be populated.
2.02-08 Polaris: Avoid using intensive functions at any significant dimensionality
A few functions are inherently computationally intensive. Used in significant dimensionality, they can be very slow to calculate. Examples of these are:
- ISFIRSTOCCURRENCE()
- RANK()
- CUMULATE()
2.02-09 Polaris: Avoid using SUM/LOOKUP combination
While SUM/LOOKUP combination should be avoided in Classic, it's paramount to avoid using it in Polaris. The net result of a SUM and LOOKUP in the same formula is that the engine has to do a very large number of calculations for every target cell in a line item. Although the logic will take up a smaller amount of space, it's MUCH more efficient to split the LOOKUP into a different line item and then do the SUMs separately.
2.02-10 Polaris: Avoid unnecessary LOOKUPs
LOOKUPs in Polaris are computationally quite intensive, especially with significant dimensionality. The engine must iterate over every item in the dimension being LOOKED UP over. Consider making LOOKUPs their own line item. This might make the system work better by reducing how many times a LOOKUP needs to be calculated.