Home > Hadrian > Analyzing a PFA engine

## Analyzing a PFA engine

After loading a PFA file, Hadrian provides a list of `com.opendatagroup.hadrian.jvmcompiler.PFAEngine` instances. Because of intentional restrictions in the PFA language, a lot of useful information is available about the algorithm it represents without running it.

The Hadrian Basic Use page describes how to create a `PFAEngine`, execute it, take snapshots and revert its state, and look at or convert its Abstract Syntax Tree (AST). This page explains how to statically analyze the `PFAEngine` without executing it.

### Call graph

A call graph is a directed graph in which functions are nodes and if function A calls function B, there is a directed edge from A to B. This graph can be used to determine if a PFA file calls any functions that are deemed unsafe or otherwise undesirable, if they’re recursive, if the scoring engine has mutable state, etc.

The call graph is stored in `PFAEngine.callGraph`, which is a `Map[String, Set[String]]` from a string-based function name to the set of all functions it calls. Special forms and methods like `begin`, `action`, and `end` are given in parentheses.

`PFAEngine` has several convenience functions for common applications:

• `calledBy`: determine which functions call a given function (reverse lookup, opposite of the `callGraph`’s `apply` method).
• `callDepth`: determine how many levels deep this function calls other functions, terminating on library functions and special forms. The return type is `Double` so that recursive functions can be expressed as `Infinity`.
• `isRecursive`: determine if this function directly calls itself.
• `hasRecursive`: determine if this function has any recursive loops (`isRecursive` implies `hasRecursive`, but not the other way around).
• `hasSideEffects`: determine if a function has `"(cell-to)"` or `"(pool-to)"` anywhere in its subgraph. These are the only ways to modify the persistent state of a PFA engine.

## Mutability check

To determine if `engine`, an instance of `PFAEngine`, is mutable, call

``````engine.hasSideEffects("(action)")
``````

Although Hadrian has transparent, built-in support for concurrency within a JVM process, scoring engines running in different processes or on different computers cannot take advantage of this mechanism. An execution environment may want to forbid or handle mutable PFA differently from immutable PFA.

## Analyzing cells and pools

All of the persistent state of a PFA scoring engine (mutable or immutable) is contained in its cells (single, global variables) and pools (R-like environments, containing global variables that can be created or destroyed at runtime). If the scoring engine is mutable, its changing state can be monitored without creating snapshots of its entire state.

• `analyzeCell` takes a cell name and an analysis, an arbitrary Scala function, and returns the result of that function when applied to the given cell.
• `analyzePool` takes a pool name and an analysis, an arbitrary Scala function, and returns a `Map[String, X]` of that function’s results when applied to each element of the pool.

## Memory use

Hadrian can also track memory use of a running PFA engine. General Java tools exist for tracking the total memory used by a JVM or the memory used by specific classes, but neither of these provide granularity at the level of a PFA engine, which consists of many classes and not the whole JVM.

`com.opendatagroup.hadrian.memory.EngineReport` provides an analysis of the memory used by one `PFAEngine`’s persistent state, and `com.opendatagroup.hadrian.memory.EnginesReport` provides an analysis of memory used by a collection of `PFAEngine` instances that may share memory.

Here is how to use them on `engine` (an instance of `PFAEngine`) and `engines` (a Scala collection of `PFAEngine` instances):

``````val report1 = EngineReport(engine)
println(report1.total)
println(report)

val report2 = EnginesReport(engines)
println(report2.sharedCells)
println(report2)
``````

`EngineReport` and `EnginesReport` are both immutable case classes with meaningful fields and a convenient `toString` method for generating a tabular report.

The analysis is based on counting existing data elements and multiplying by appropriate factors for each type of data element (including Java’s object padding, distinctions between boxed and unboxed primitives, etc.). For large collections (arrays and maps), approximations are used, since the underlying data is represented as a tree of unknown balancing. This method does not account for structural sharing of immutable data and is therefore an overestimate. It also does not count data that is beyond scope but not yet garbage collected, so it is a more stable measure than total heap space used.

Some data types are fast to compute, such as an array of integers, since one only needs to know the size of an integer and the number of elements in the array. Others are slow, such as an array of strings, since one must traverse the array and add memory usage of each variable-length string.