Enhancing a Package: Linking Models
Follow these steps to use pipelines to link models in a Package.
In the previous article, we added iterations to our Package to represent uncertainty in our model. In this article, we will cover the use of pipelines to perform multiple sequential calculations, such that the output of one Transformer becomes the input of the next Transformer.
These enhancements will be part of a new Package, called helloworldPipeline, within the helloworldEnhanced suite of Packages. This Package builds on the previous Package in the suite, helloworldUncertainty. Follow these steps to set up this new Package:
- Create a new subfolder within your working folder (e.g. c:\temp\helloworldPipeline).
- If creating a model in R, copy the package.xml and model.R files from the helloworldUncertainty subfolder into the new helloworldPipeline subfolder.
- If creating a model in Python, copy the package.xml and model.py files from the helloworldUncertainty subfolder into the new helloworldPipeline subfolder.
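If you set up Packages often, the folder steps above can be scripted. The Python sketch below is one way to do it; the helper name setup_pipeline_folder and its language parameter are hypothetical, and it assumes the helloworldUncertainty subfolder sits in the same working folder as the new helloworldPipeline subfolder.

```python
import shutil
from pathlib import Path

# Script copied for each model language, per the setup steps above
MODEL_SCRIPTS = {"R": "model.R", "Python": "model.py"}


def setup_pipeline_folder(working_dir, language="R"):
    """Create <working_dir>/helloworldPipeline and copy package.xml plus the
    model script from <working_dir>/helloworldUncertainty (hypothetical helper)."""
    working = Path(working_dir)
    source = working / "helloworldUncertainty"
    target = working / "helloworldPipeline"

    # Create the new subfolder; do nothing if it already exists
    target.mkdir(parents=True, exist_ok=True)

    # Copy the two starting files, preserving timestamps
    for name in ("package.xml", MODEL_SCRIPTS[language]):
        shutil.copy2(source / name, target / name)
    return target
```

For example, setup_pipeline_folder(r"c:\temp", language="R") would produce the folder layout used in the rest of this article.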
Note
The final materials for this article can be downloaded using the following links:
- R
- Python
Step 1 - Add Pipelines to the XML Configuration File
Note
The SyncroSim XML Package has the following structure:
- XML version and Package name
- Primary Transformer and Chart Transformer
- Run Control
- First Model Inputs
- First Model Outputs / Second Model Inputs
- Second Model Outputs
- First Transformer and Second Transformer
- Layouts (Results Transformer, Library Datafeeds, Scenario Datafeeds, Charts)
Transformers
The first step in using pipelines is to list all of the Transformers in the pipeline within the primary Transformer, using the <include> tag. We will name the first Transformer in our pipeline firstModel and the second Transformer secondModel. The updated <include> tag in our package.xml is shown below:
<!--Chart Transformer-->
<include>
  <transformer name="corestime_Runtime"/>
  <transformer name="firstModel"/>
  <transformer name="secondModel"/>
</include>
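Every Transformer named in the <include> tag must also be defined elsewhere in package.xml. As an illustration, the Python sketch below uses the standard library's xml.etree.ElementTree to list included Transformers that have no definition. The trimmed XML, the Primary Transformer name, and the rule that names starting with core belong to SyncroSim's base packages (such as corestime_Runtime) are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed package.xml with only the elements this check needs (hypothetical)
PACKAGE_XML = """
<package name="helloworldEnhanced">
  <transformer name="Primary" isPrimary="True">
    <include>
      <transformer name="corestime_Runtime"/>
      <transformer name="firstModel"/>
      <transformer name="secondModel"/>
    </include>
  </transformer>
  <transformer name="firstModel"/>
  <transformer name="secondModel"/>
</package>
"""


def undefined_included_transformers(xml_text, external_prefixes=("core",)):
    """Return names in <include> with no matching top-level <transformer>.
    Names starting with an external prefix are assumed to come from base packages."""
    root = ET.fromstring(xml_text.strip())
    included = [t.get("name") for t in root.findall("./transformer/include/transformer")]
    defined = {t.get("name") for t in root.findall("./transformer")}
    return [name for name in included
            if name not in defined and not name.startswith(external_prefixes)]
```

Deleting the secondModel definition from PACKAGE_XML would make the function return ['secondModel'].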
After the primary Transformer, we also need to specify which datafeeds and model script each Transformer in the pipeline will use. The first Transformer, named firstModel, will run the original model.R script, so we specify this in its programArguments attribute. Add a <pipeline> tag as a child element of this Transformer, then list beneath it every datafeed that the model.R script uses. The type attribute specifies whether each datafeed is an input or an output of the Transformer.
<!--First Transformer-->
<transformer
  name="firstModel"
  displayName="First Model"
  isRunnable="True"
  programName="Rscript"
  programArguments="model.R"
  configurationSheet="RunControl">
  <pipeline>
    <datafeed name="RunControl" type="Input"/>
    <datafeed name="InputDatafeed" type="Input"/>
    <datafeed name="IntermediateDatafeed" type="Output"/>
  </pipeline>
</transformer>
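The type attribute is what separates a Transformer's inputs from its outputs. The Python sketch below, using only the standard library, reads the <pipeline> element of the firstModel definition and returns the two lists; the function name pipeline_io is hypothetical.

```python
import xml.etree.ElementTree as ET

# firstModel definition from package.xml, as shown above
FIRST_MODEL_XML = """
<transformer
  name="firstModel"
  displayName="First Model"
  isRunnable="True"
  programName="Rscript"
  programArguments="model.R"
  configurationSheet="RunControl">
  <pipeline>
    <datafeed name="RunControl" type="Input"/>
    <datafeed name="InputDatafeed" type="Input"/>
    <datafeed name="IntermediateDatafeed" type="Output"/>
  </pipeline>
</transformer>
"""


def pipeline_io(transformer_xml):
    """Split the datafeeds under <pipeline> into (inputs, outputs) by their type attribute."""
    transformer = ET.fromstring(transformer_xml.strip())
    inputs, outputs = [], []
    for feed in transformer.findall("./pipeline/datafeed"):
        kind = feed.get("type")
        if kind == "Input":
            inputs.append(feed.get("name"))
        elif kind == "Output":
            outputs.append(feed.get("name"))
        else:
            raise ValueError(f"unexpected type {kind!r} on datafeed {feed.get('name')!r}")
    return inputs, outputs
```

Here pipeline_io(FIRST_MODEL_XML) returns (['RunControl', 'InputDatafeed'], ['IntermediateDatafeed']).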
Below the first Transformer, add the second Transformer, named secondModel. It has the same structure as the first, but instead of model.R it runs a new script called model2.R. The IntermediateDatafeed output by the first Transformer is now an input of this second Transformer, and OutputDatafeed is its output.
<!--Second Transformer-->
<transformer
  name="secondModel"
  displayName="Second Model"
  isRunnable="True"
  programName="Rscript"
  programArguments="model2.R"
  configurationSheet="RunControl">
  <pipeline>
    <datafeed name="RunControl" type="Input"/>
    <datafeed name="IntermediateDatafeed" type="Input"/>
    <datafeed name="OutputDatafeed" type="Output"/>
  </pipeline>
</transformer>
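Because IntermediateDatafeed is an output of firstModel and an input of secondModel, the datafeed declarations alone imply which Transformer must run first. The Python sketch below derives that order with graphlib.TopologicalSorter (Python 3.9 or later). The <transformers> wrapper element and the function name run_order are hypothetical, and datafeeds that no Transformer outputs (RunControl, InputDatafeed) are treated as user-supplied. SyncroSim itself takes the order from the Run Order set on the Pipeline page (Step 4), so this is only a consistency check.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter  # Python 3.9+

# Both pipeline definitions, deliberately listed out of order, in a hypothetical wrapper
PIPELINE_XML = """
<transformers>
  <transformer name="secondModel">
    <pipeline>
      <datafeed name="RunControl" type="Input"/>
      <datafeed name="IntermediateDatafeed" type="Input"/>
      <datafeed name="OutputDatafeed" type="Output"/>
    </pipeline>
  </transformer>
  <transformer name="firstModel">
    <pipeline>
      <datafeed name="RunControl" type="Input"/>
      <datafeed name="InputDatafeed" type="Input"/>
      <datafeed name="IntermediateDatafeed" type="Output"/>
    </pipeline>
  </transformer>
</transformers>
"""


def run_order(xml_text):
    """Order Transformers so each runs after whichever Transformer outputs its inputs."""
    root = ET.fromstring(xml_text.strip())
    producers = {}  # datafeed name -> Transformer that outputs it
    consumers = {}  # Transformer name -> datafeeds it reads
    for t in root.iter("transformer"):
        name = t.get("name")
        consumers[name] = []
        for feed in t.findall("./pipeline/datafeed"):
            if feed.get("type") == "Output":
                producers[feed.get("name")] = name
            else:
                consumers[name].append(feed.get("name"))
    # Inputs that no Transformer produces come from the user, so they add no edge
    graph = {name: {producers[f] for f in feeds if f in producers}
             for name, feeds in consumers.items()}
    return list(TopologicalSorter(graph).static_order())
```

Here run_order(PIPELINE_XML) returns ['firstModel', 'secondModel']. If two Transformers ever fed each other, static_order() would raise graphlib.CycleError instead of returning an order.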
Datafeeds and Datasheets
Next, we need to add an output datafeed for the second Transformer. The InputDatafeed stays the same. The original OutputDatafeed becomes the IntermediateDatafeed, since it is now the output of the first Transformer and the input of the second. Rename it accordingly: set the datafeed name to IntermediateDatafeed (with displayName "Intermediate Outputs"), the datasheet name to IntermediateDatasheet, and the primary key column to IntermediateDatasheetID. Then copy the IntermediateDatafeed, paste it directly underneath, and rename the copy OutputDatafeed, with datasheet OutputDatasheet and primary key column OutputDatasheetID. Because our second model calculates the cumulative sum of y, replace the y column in OutputDatasheet with a new column called yCum. The IntermediateDatafeed and OutputDatafeed in our package.xml are shown below:
<!--First Model Outputs / Second Model Inputs-->
<datafeed
  name="IntermediateDatafeed"
  displayName="Intermediate Outputs"
  dataScope="Scenario">
  <datasheets>
    <datasheet name="IntermediateDatasheet">
      <columns>
        <column
          name="IntermediateDatasheetID"
          dataType="Integer"
          isPrimary="True"/>
        <column name="ScenarioID" dataType="Integer"/>
        <column name="Iteration" dataType="Integer"/>
        <column name="Timestep" dataType="Integer" displayName="Timestep"/>
        <column name="y" dataType="Double" displayName="Value for y"/>
      </columns>
    </datasheet>
  </datasheets>
</datafeed>
<!--Second Model Outputs-->
<datafeed name="OutputDatafeed" displayName="Outputs" dataScope="Scenario">
  <datasheets>
    <datasheet name="OutputDatasheet">
      <columns>
        <column name="OutputDatasheetID" dataType="Integer" isPrimary="True"/>
        <column name="ScenarioID" dataType="Integer"/>
        <column name="Iteration" dataType="Integer"/>
        <column name="Timestep" dataType="Integer" displayName="Timestep"/>
        <column name="yCum" dataType="Double" displayName="Cumulative y"/>
      </columns>
    </datasheet>
  </datasheets>
</datafeed>
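Since OutputDatasheet is a copy of IntermediateDatasheet with y swapped for yCum, the two should share every key column. The Python sketch below compares their non-primary columns using xml.etree.ElementTree; the <datafeeds> wrapper element and the function name data_columns are hypothetical.

```python
import xml.etree.ElementTree as ET

# Both datafeeds from above, trimmed and wrapped in a hypothetical <datafeeds> root
DATAFEEDS_XML = """
<datafeeds>
  <datafeed name="IntermediateDatafeed" dataScope="Scenario">
    <datasheets>
      <datasheet name="IntermediateDatasheet">
        <columns>
          <column name="IntermediateDatasheetID" dataType="Integer" isPrimary="True"/>
          <column name="ScenarioID" dataType="Integer"/>
          <column name="Iteration" dataType="Integer"/>
          <column name="Timestep" dataType="Integer"/>
          <column name="y" dataType="Double"/>
        </columns>
      </datasheet>
    </datasheets>
  </datafeed>
  <datafeed name="OutputDatafeed" dataScope="Scenario">
    <datasheets>
      <datasheet name="OutputDatasheet">
        <columns>
          <column name="OutputDatasheetID" dataType="Integer" isPrimary="True"/>
          <column name="ScenarioID" dataType="Integer"/>
          <column name="Iteration" dataType="Integer"/>
          <column name="Timestep" dataType="Integer"/>
          <column name="yCum" dataType="Double"/>
        </columns>
      </datasheet>
    </datasheets>
  </datafeed>
</datafeeds>
"""


def data_columns(xml_text, datasheet_name):
    """Map column name -> dataType for a datasheet, skipping its primary key column."""
    root = ET.fromstring(xml_text.strip())
    sheet = root.find(f".//datasheet[@name='{datasheet_name}']")
    if sheet is None:
        raise KeyError(datasheet_name)
    return {c.get("name"): c.get("dataType")
            for c in sheet.iter("column") if c.get("isPrimary") != "True"}


intermediate = data_columns(DATAFEEDS_XML, "IntermediateDatasheet")
output = data_columns(DATAFEEDS_XML, "OutputDatasheet")
print(sorted(intermediate.keys() & output.keys()))  # ['Iteration', 'ScenarioID', 'Timestep']
print(sorted(intermediate.keys() ^ output.keys()))  # ['y', 'yCum']
```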
Layouts
To let users set the order of Transformers in a pipeline within the SyncroSim Windows Interface, we will add a Pipeline page to the Run Control tab through the Scenario Datafeeds layout. In the Scenario Datafeeds layout, remove the RunControl item and replace it with a group named RunControl. Add two items to this group: one for the original RunControl information (for configuring timesteps and iterations) and one for the built-in SyncroSim Core Pipeline datafeed (core_Pipeline). Also add the IntermediateDatafeed as another item in this layout.
<!--Scenario Datafeeds Layout-->
<layout name="coreforms_ScenarioDatafeeds">
  <group name="RunControl" displayName="Run Control">
    <item name="RunControl" displayName="General"/>
    <item name="core_Pipeline"/>
  </group>
  <item name="InputDatafeed"/>
  <item name="IntermediateDatafeed"/>
  <item name="OutputDatafeed"/>
</layout>
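Each item in the Scenario Datafeeds layout should name a datafeed defined in the Package, except built-in entries such as core_Pipeline. The Python sketch below lists items that match neither; the trimmed XML, the BUILTIN_ITEMS set, and the function name undefined_layout_items are assumptions for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed package.xml: datafeed names plus the Scenario Datafeeds layout (hypothetical)
LAYOUT_XML = """
<package>
  <datafeed name="RunControl"/>
  <datafeed name="InputDatafeed"/>
  <datafeed name="IntermediateDatafeed"/>
  <datafeed name="OutputDatafeed"/>
  <layout name="coreforms_ScenarioDatafeeds">
    <group name="RunControl" displayName="Run Control">
      <item name="RunControl" displayName="General"/>
      <item name="core_Pipeline"/>
    </group>
    <item name="InputDatafeed"/>
    <item name="IntermediateDatafeed"/>
    <item name="OutputDatafeed"/>
  </layout>
</package>
"""

# Items supplied by SyncroSim Core rather than this Package (assumed list)
BUILTIN_ITEMS = {"core_Pipeline"}


def undefined_layout_items(xml_text, layout_name="coreforms_ScenarioDatafeeds"):
    """Return layout items that name neither a Package datafeed nor a built-in item."""
    root = ET.fromstring(xml_text.strip())
    known = {d.get("name") for d in root.iter("datafeed")} | BUILTIN_ITEMS
    layout = root.find(f".//layout[@name='{layout_name}']")
    if layout is None:
        raise KeyError(layout_name)
    # iter() also walks items nested inside <group> elements
    return [item.get("name") for item in layout.iter("item")
            if item.get("name") not in known]
```

Forgetting to define IntermediateDatafeed while still listing it in the layout would make the function return ['IntermediateDatafeed'].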
Finally, to view the results from both the first and second Transformers, update the Charts layout. The existing y item should now point to IntermediateDatasheet through its dataSheet attribute, and a new item called yCum should point to the yCum column of OutputDatasheet.
<!--Charts Layout-->
<layout name="corestimeforms_Charts" configurationSheet="RunControl">
  <item name="y" displayName="y" dataSheet="IntermediateDatasheet" column="y"/>
  <item
    name="yCum"
    displayName="Cumulative y"
    dataSheet="OutputDatasheet"
    column="yCum"/>
</layout>
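After the datasheets are renamed, a chart item that still points to the old datasheet will not find its column. The Python sketch below checks each item in the Charts layout against the declared (datasheet, column) pairs; the trimmed XML and the function name broken_chart_items are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed package.xml: datasheet columns plus the Charts layout (hypothetical)
CHARTS_XML = """
<package>
  <datasheet name="IntermediateDatasheet">
    <columns>
      <column name="Iteration"/>
      <column name="Timestep"/>
      <column name="y"/>
    </columns>
  </datasheet>
  <datasheet name="OutputDatasheet">
    <columns>
      <column name="Iteration"/>
      <column name="Timestep"/>
      <column name="yCum"/>
    </columns>
  </datasheet>
  <layout name="corestimeforms_Charts" configurationSheet="RunControl">
    <item name="y" displayName="y" dataSheet="IntermediateDatasheet" column="y"/>
    <item name="yCum" displayName="Cumulative y" dataSheet="OutputDatasheet" column="yCum"/>
  </layout>
</package>
"""


def broken_chart_items(xml_text):
    """Return chart items whose (dataSheet, column) pair is not declared in the Package."""
    root = ET.fromstring(xml_text.strip())
    declared = {(sheet.get("name"), col.get("name"))
                for sheet in root.iter("datasheet") for col in sheet.iter("column")}
    charts = root.find(".//layout[@name='corestimeforms_Charts']")
    return [item.get("name") for item in charts.iter("item")
            if (item.get("dataSheet"), item.get("column")) not in declared]
```

If the y item were left with its old dataSheet="OutputDatasheet", the function would return ['y'], because OutputDatasheet no longer has a y column.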
Step 2 - Modify the Model Scripts
In model.R, the calculations stay the same; we only need to change the datasheet name so that results are saved to the IntermediateDatasheet rather than the OutputDatasheet. Modify the following two statements in the original model.R file:
# Set up empty R dataframe ready to accept output in SyncroSim datasheet format
myOutputDataframe <- datasheet(
  myScenario,
  name = "helloworldEnhanced_IntermediateDatasheet"
)
# Save this R dataframe back to the SyncroSim library's intermediate datasheet
saveDatasheet(myScenario,
              data = myOutputDataframe,
              name = "helloworldEnhanced_IntermediateDatasheet")
Next, create a new R script called model2.R, which performs the calculations for the second Transformer. It reads the y values from the IntermediateDatasheet and calculates the cumulative sum of y over timesteps, separately for each iteration.
Here is the new model2.R script:
library(rsyncrosim)      # Load SyncroSim R package
myScenario <- scenario() # Get the SyncroSim scenario that is currently running
# Load RunControl datasheet to get the range of iterations
runSettings <- datasheet(myScenario, name = "helloworldEnhanced_RunControl")
# Load the first Transformer's output (IntermediateDatasheet) as this script's input
myInputDataframe <- datasheet(myScenario,
                              name = "helloworldEnhanced_IntermediateDatasheet")
# Set up empty R dataframe ready to accept output in SyncroSim datasheet format
myOutputDataframe <- datasheet(myScenario,
                               name = "helloworldEnhanced_OutputDatasheet")
# For loop through iterations
for (iter in runSettings$MinimumIteration:runSettings$MaximumIteration) {
  # Only keep rows for this iteration, sorted by timestep so cumsum runs in time order
  iterData <- myInputDataframe[myInputDataframe$Iteration == iter, ]
  iterData <- iterData[order(iterData$Timestep), ]
  # Do calculations: running total of y over timesteps
  yCum <- cumsum(iterData$y)
  # Store the relevant outputs in a temporary dataframe
  tempDataframe <- data.frame(Iteration = iter,
                              Timestep = iterData$Timestep,
                              yCum = yCum)
  # Append this iteration's rows to the output dataframe
  myOutputDataframe <- addRow(myOutputDataframe, tempDataframe)
}
# Save this R dataframe back to the SyncroSim library's output datasheet
saveDatasheet(myScenario,
              data = myOutputDataframe,
              name = "helloworldEnhanced_OutputDatasheet")
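For readers building the Python version of this Package, the core calculation of model2.R can be expressed with the standard library's itertools.accumulate. This sketch covers only the calculation: rows are plain dictionaries standing in for IntermediateDatasheet records (an assumption), and loading and saving datasheets, which a real model2.py would do through the pysyncrosim package, is left out. The function name cumulative_y is hypothetical.

```python
from itertools import accumulate


def cumulative_y(intermediate_rows):
    """Running total of y over timesteps, computed separately for each iteration.
    Each row is a dict with Iteration, Timestep and y keys (stand-in for a datasheet row)."""
    by_iteration = {}
    for row in intermediate_rows:
        by_iteration.setdefault(row["Iteration"], []).append(row)

    output = []
    for iteration in sorted(by_iteration):
        # Sort by timestep so the running total follows time order
        rows = sorted(by_iteration[iteration], key=lambda r: r["Timestep"])
        totals = accumulate(r["y"] for r in rows)
        for row, total in zip(rows, totals):
            output.append({"Iteration": iteration, "Timestep": row["Timestep"], "yCum": total})
    return output
```

For one iteration with y = 2.0 at timestep 1 and y = 0.5 at timestep 2, the function returns yCum values of 2.0 and 2.5.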
Step 3 - Rebuild the Package
Use the Package Manager to rebuild the Package with the updated package.xml, the modified model.R, and the new model2.R, or the equivalent Python scripts (see Step 4 in the Building a Package article).
Step 4 - Set Model Inputs
- Start SyncroSim.
- Create a new library based on the helloworldEnhanced Package.
- Within the Library Properties, set the location of the R or Python executable in the R Configuration or Python Configuration tab.
- Create a new Scenario - Right-click in the Library Explorer and select New > Scenario from the context menu. Rename this scenario to Pipeline.
- Edit the Scenario Run Control - General - Within the Pipeline Scenario, navigate to the Run Control tab. There should now be two pages on this tab: General and Pipeline. On the General page, set the Number of Iterations, Minimum Timestep, and Maximum Timestep values for your model.
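- Edit the Scenario Run Control - Pipeline - Now, on the Pipeline page, add First Model and Second Model as Stages and give First Model the lower Run Order (e.g. 1 for First Model and 2 for Second Model), so that the intermediate outputs exist before the second model runs.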
- Edit the Scenario Inputs - Choose values for mMean, mSD, and b.
- Save the Scenario inputs - Save changes made to the Scenario inputs.
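The Run Order matters because each Transformer can only read datafeeds that an earlier stage has already written. The Python sketch below simulates that behaviour with a dictionary standing in for the Library; the stage tuple layout, the function run_pipeline, and the placeholder calculations are all hypothetical and do not reproduce the real model.R.

```python
from itertools import accumulate


def run_pipeline(stages, store):
    """Run stages in ascending run order; each reads its inputs from `store` and
    writes its outputs back. Raises if an input has not been produced yet."""
    executed = []
    for run_order, name, inputs, outputs, func in sorted(stages, key=lambda s: s[0]):
        missing = [feed for feed in inputs if feed not in store]
        if missing:
            raise RuntimeError(f"{name} (run order {run_order}) is missing {missing}")
        results = func(*(store[feed] for feed in inputs))
        store.update(zip(outputs, results))
        executed.append(name)
    return executed


# Placeholder calculations (not the real model.R / model2.R)
def first_model(x):
    return ([2 * v for v in x],)  # one output: the intermediate values


def second_model(y):
    return (list(accumulate(y)),)  # one output: running total of the intermediate values


# (run order, stage, input datafeeds, output datafeeds, function)
stages = [
    (2, "Second Model", ["IntermediateDatafeed"], ["OutputDatafeed"], second_model),
    (1, "First Model", ["InputDatafeed"], ["IntermediateDatafeed"], first_model),
]

store = {"InputDatafeed": [1, 2, 3]}
print(run_pipeline(stages, store))  # ['First Model', 'Second Model']
print(store["OutputDatafeed"])      # [2, 6, 12]
```

Swapping the two Run Order values makes Second Model run first, and the sketch stops with a missing-input error instead of producing results.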
Step 5 - Run the Model
Right-click on this Pipeline Scenario in the Library Explorer and select Run to run this Scenario.
Step 6 - View the Results
- Once the run is complete, return to the Library Explorer. Expand the node beside the Pipeline scenario to reveal a Results folder containing your results, then expand the node beside the Results folder to show the newly generated date/time stamped Results Scenario.
- Right-click on this Results Scenario and select Properties to view the details of this Results Scenario; you will find the results of the calculations from the first Transformer under the Intermediate Outputs tab. You will also find a second output tab called Outputs that contains the results of the calculation carried out by the second Transformer.
- To view the results from each Transformer in the pipeline, select the Create a new chart button in the Results Viewer. In the left-hand column, you can choose y and/or Cumulative y. Select both of these options and click Apply to view both outputs. See the Customizing a Chart tutorial for instructions on how to make further modifications to your charts, such as modifying the X and Y axes.