The simulator core provides simulation configuration and startup, the parallel model of computation, and a common interface to the technology models.
The SST is configured in one of two ways: with an XML file, or through a custom programmatic generator. Both methods describe the components instantiated in the simulation, any component parameters that must be passed in, the links between the components, and the latency on the component links. This configuration is processed as a graph, with the component instances as nodes, and the links between them as edges. The graph is then partitioned for multi-rank runs using either one of the built-in partitioning libraries or a custom partitioner found in an element library.
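As a sketch of the XML configuration style described above, a two-component simulation might look like the following. The element and attribute names here are illustrative only, not SST's actual schema; the component types, ports, and parameters are hypothetical.

```xml
<!-- Hypothetical SST-style configuration: two components, their
     parameters, and one link with an explicit latency. -->
<simulation>
  <component name="cpu0" type="simpleCPU">
    <params>
      <clock>1 GHz</clock>
    </params>
  </component>
  <component name="mem0" type="simpleMemory"/>
  <!-- The link's two endpoints and its latency become an edge in the
       configuration graph used for partitioning. -->
  <link name="cpu0_mem0" latency="10 ns">
    <port>cpu0.memPort</port>
    <port>mem0.cpuPort</port>
  </link>
</simulation>
```

In graph terms, the two component elements become nodes and the link element becomes an edge weighted by its latency, which is the structure handed to the partitioner for multi-rank runs.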
The simulation is carried out in a component-based discrete event model of computation. Each component can assign a clock to itself, to be triggered at regular intervals. Components can also send events to other components along links, each of which has a preset latency. When an event arrives at a component, it triggers an event handler function, in which the component can process and respond to the event. Alternatively, the component can poll the link to receive and process any outstanding messages.
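The clock-and-event-handler pattern above can be sketched as a toy core in C++. This is not the actual SST API; the class and member names are illustrative, and a real discrete event core would jump directly to the next pending event time rather than stepping through every timestep as this simplification does.

```cpp
// Toy sketch (not the actual SST API; names are illustrative) of the
// component model described above: a component registers a clock callback
// and an event handler, and the core delivers events from a time-ordered
// queue while firing the clock at regular intervals.
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

struct Event { uint64_t deliveryTime; int payload; };

struct LaterFirst {
    bool operator()(const Event& a, const Event& b) const {
        return a.deliveryTime > b.deliveryTime;   // min-heap on delivery time
    }
};

class Core {
    std::priority_queue<Event, std::vector<Event>, LaterFirst> queue_;
public:
    std::function<void(uint64_t)> clockHandler;     // called once per clock period
    std::function<void(const Event&)> eventHandler; // called when an event arrives
    uint64_t clockPeriod = 0;                       // 0 = no clock registered

    void send(const Event& ev) { queue_.push(ev); }

    // Simplification: steps every timestep; a real core would skip ahead
    // to the next scheduled clock tick or event delivery time.
    void run(uint64_t endTime) {
        for (uint64_t now = 0; now <= endTime; ++now) {
            if (clockPeriod != 0 && now % clockPeriod == 0 && clockHandler)
                clockHandler(now);
            while (!queue_.empty() && queue_.top().deliveryTime == now) {
                if (eventHandler) eventHandler(queue_.top());
                queue_.pop();
            }
        }
    }
};
```

With a clock period of 10 and an event sent for timestep 15, a run to timestep 20 fires the clock at 0, 10, and 20, and delivers the event at 15.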
Parallelism is transparent to the component writer. Components interact by sending events to each other through link objects. All events inherit from a common base class, which includes a time tag indicating when the event should be delivered. All events must also be serializable (using the Boost Serialization Library), which transforms the event structure into a compact binary representation.
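The idea behind event serialization can be illustrated with a hand-rolled sketch. SST itself uses the Boost Serialization Library to do this generically over arbitrary event classes; the manual packing below only shows the concept, and it assumes both ends share the same endianness and field layout.

```cpp
// Hand-rolled illustration of event serialization (SST actually uses
// Boost.Serialization): pack an event's fields into a compact byte
// buffer for transmission, and unpack them on the receiving side.
// Assumes identical endianness on both ranks.
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

struct Event {
    uint64_t deliveryTime;  // time tag: when the event should be delivered
    uint32_t payload;       // hypothetical event data
};

std::vector<uint8_t> serialize(const Event& ev) {
    std::vector<uint8_t> buf(sizeof ev.deliveryTime + sizeof ev.payload);
    std::memcpy(buf.data(), &ev.deliveryTime, sizeof ev.deliveryTime);
    std::memcpy(buf.data() + sizeof ev.deliveryTime, &ev.payload, sizeof ev.payload);
    return buf;
}

Event deserialize(const std::vector<uint8_t>& buf) {
    Event ev;
    std::memcpy(&ev.deliveryTime, buf.data(), sizeof ev.deliveryTime);
    std::memcpy(&ev.payload, buf.data() + sizeof ev.deliveryTime, sizeof ev.payload);
    return ev;
}
```

The 12-byte buffer here (8 bytes of time tag, 4 of payload) is the kind of compact binary representation the text refers to.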
Whenever an event is sent, the SST core determines whether the destination of the event is local (i.e., on the same MPI rank) or remote. Remote events are queued up for future delivery the next time the given ranks are due to synchronize. This occurs only as often as needed, based upon the latency of links that cross partitions: if the components on two ranks are connected by a link with a minimum latency of 1000 ns, those ranks only need to synchronize every 1000 ns of simulated time.
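The synchronization rule above amounts to taking the minimum latency over all links that cross a partition boundary. A minimal sketch, with illustrative names not drawn from SST:

```cpp
// Sketch of the conservative synchronization interval described above:
// ranks must exchange pending events at least as often as the minimum
// latency of any link crossing the partition boundary, because a remote
// event sent at time t cannot be delivered before t + that latency.
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Link {
    int rankA, rankB;   // MPI ranks of the link's two endpoint components
    uint64_t latency;   // minimum link latency, in atomic timesteps
};

uint64_t syncInterval(const std::vector<Link>& links) {
    uint64_t interval = UINT64_MAX;
    for (const Link& l : links)
        if (l.rankA != l.rankB)  // only cross-partition links constrain syncs
            interval = std::min(interval, l.latency);
    return interval;             // UINT64_MAX if no link crosses a boundary
}
```

A link wholly inside one rank never forces a synchronization, which is why good partitioning (cutting only high-latency links) directly reduces communication frequency.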
The Boost MPI library is then used to perform the actual communication. When two ranks synchronize, they each serialize and send the pending events to each other. When the events are received, they are integrated with the local event queues, where they wait for delivery to their target components.
Time in the simulator is represented using a single 64-bit unsigned integer that counts the number of atomic timesteps that have passed since the beginning of the simulation. The actual atomic timebase (the time increment represented by each atomic timestep) is user programmable and has a default of 1 ps (10⁻¹² seconds), which provides for over 200 days of simulated time. All times used by components and links are specified using strings (for example, "1.5 ns" or "1.73 GHz"), and are resolved at build time into a TimeConverter object. The TimeConverter object essentially represents a component's view of time and provides functions for converting between the component's timebase and the atomic timebase. The TimeConverter simply stores the number of atomic timesteps (referred to as its factor) in the desired time interval. In the case of a specified clock frequency, the factor represents the number of atomic timesteps in the clock period. For example, a component with a 1 GHz clock would get a TimeConverter object with a factor of 1000 (assuming the default atomic timebase of 1 ps), which would also be the factor for 1 ns.
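The factor computation can be sketched as follows. This is not SST's TimeConverter implementation; the function name and the deliberately minimal unit handling are assumptions for illustration, using the default 1 ps atomic timebase from the text.

```cpp
// Sketch of resolving a time or frequency specification into a
// TimeConverter-style factor: the number of atomic timesteps in the
// given interval (or clock period), assuming the default 1 ps timebase.
#include <cassert>
#include <cstdint>
#include <string>

constexpr double ATOMIC_TIMEBASE_PS = 1.0;  // default: 1 ps per atomic timestep

// Hypothetical helper: handles only a few illustrative units.
uint64_t resolveFactor(double value, const std::string& unit) {
    double picoseconds;
    if (unit == "ns")       picoseconds = value * 1000.0;   // 1 ns = 1000 ps
    else if (unit == "ps")  picoseconds = value;
    else if (unit == "GHz") picoseconds = 1000.0 / value;   // clock period of a GHz rate
    else                    return 0;                        // unsupported unit
    return static_cast<uint64_t>(picoseconds / ATOMIC_TIMEBASE_PS);
}
```

This reproduces the example from the text: both "1 GHz" (a 1 ns period) and "1 ns" resolve to a factor of 1000.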
The component has two options when creating a TimeConverter. The first is to register a clock handler, in which case the handler is called once per clock period. The second is to simply register a timebase with the simulator, which can be used with the event-driven interface. In either case, the returned TimeConverter object is registered with the component's links, where it is used to convert latencies from the component's view of time to the atomic timebase. The use of TimeConverters insulates components from needing to know either the value of the atomic timestep or their own operating frequency, allowing a component to be written with a generic timebase that can be set at runtime.