identifying contributors to large file size and slow run time
Hi All,
Thanks for past help and thanks in advance for future help!
I'm looking to improve my model's run time and reduce the output file size. Are there any tools or methods to identify which variables contribute most to slowing the model down and/or to the output file size?
I'm already addressing the usual suspects (reducing the number of subscripts on variables, using VECTOR SELECT instead of SUM for sumproduct-style calculations), but it would be good to have a more targeted approach across the thousands of variables in the model.
Any other advice is appreciated!
Thanks very much,
Tim
Super Administrator
Re: identifying contributors to large file size and slow run time
There's a post on Tom's blog about how to optimize models; following those steps can really speed up model execution.
https://metasd.com/2011/01/optimizing-vensim-models/
For saving, all you can really do is use a savelist and play with saveper.
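For example, a savelist is just a plain-text .lst file with one variable name per line, and SAVEPER is a built-in control variable you can set in the model. The names below are hypothetical:
outputs.lst:
Population[age,region]
Total Cost
Net Revenue
In the model, saving yearly instead of at every TIME STEP:
SAVEPER = 1
~ Year
~ The frequency with which output is stored. |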
Advice to posters seeking help (it really helps us to help you)
http://www.ventanasystems.co.uk/forum/v ... f=2&t=4391
Units are important!
http://www.bbc.co.uk/news/magazine-27509559
Re: identifying contributors to large file size and slow run time
I probably need to update that post.
Compiled simulation is definitely the most reliable big win. The mdl.bat installed with Vensim now handles most versions of Visual Studio without editing, so it's easy.
File size is strictly proportional to variable count, which tends to be driven mostly by subscript complexity. It's also directly influenced by SAVEPER, so if you can get away with a longer period, you can save a lot of space.
I find VECTOR SELECT gains to be spotty.
One thing that's really expensive is GET DATA functions. These should be wrapped in an Initial() if at all possible.
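For instance (variable names hypothetical; check the exact argument list for your Vensim version):
peak demand = INITIAL( GET DATA MAX( demand data , INITIAL TIME , FINAL TIME ) )
Without the INITIAL(), the scan over the data series would be repeated at every time step; wrapped, it runs once at initialization.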
/*
Advice to posters (it really helps us to help you)
http://www.ventanasystems.co.uk/forum/v ... f=2&t=4391
Blog: http://blog.metasd.com
Model library: http://models.metasd.com
Bookmarks: http://delicious.com/tomfid/SystemDynamics
*/
Junior Member (Vensim PRO)
Re: identifying contributors to large file size and slow run time
1.1
So, data loaded through data import >> and then loaded as data.vdf for a data-input-type variable is not expensive?
______
1.2
What if there are multiple data inputs? Is it better to have data1.vdf, data2.vdf, ... dataN.vdf imported via a Data Model, run separately in a model designed only to import data variables, and then load the datavariables.vdf from that model into the Main Model?
*I currently have a Data Model which loads my population data and then feeds the .vdf from this model into all my Main Models.
This model has variables like population[1yr ageband, ***, region], population[age, ***], population[region], etc., so that it's all ready to go for different applications or client problems.
**I also try to do vector select operations in the Data Model, mapping population[1yr ageband, ***, region] to population[5 year ageband, ***, region], and then only use the array required in a given model.
______
1.3
Presumably if you load the datavariables.vdf from the Data Model, you wouldn't want that model to have a number of other data variables not relevant to the Main Model.
Is a .vdf with only one variable faster to run than a .vdf with many variables, not all of which are used in the model? E.g. the Main Model has population data, but population.vdf has variables for births, deaths, total_population. Do the births and deaths variables in the .vdf slow down the Main Model, so that it's better to load a .vdf with only the variables required?
______
1.4
In what situation would I wish to use a .CIN file for constants vs. just have different .vdfs loaded? E.g. I could create .cin files:
scenario1.cin
scenario_selector_variable = 1
scenario2.cin
scenario_selector_variable = 2
Or, I could run these scenarios in a sub-model and then just load the results as a scenarioN.vdf, especially if the scenarios have a number of different variables and variable outcomes within them....
Maybe this is a different question about model structures and running modules in different .mdl files....
Fundamentally, for this post: are .cin files computationally heavier than .vdf inputs for data variables?
(a) For constants?
(b) I guess one advantage of a .vdf input is that you could change from scenario1 to scenario2 at time=x whereas .cin files only load constants, right?
_____
1.5
Can you use Initial with subscripts? Is the only way to do this to write a separate equation for each element?
_____
1.6
Similar question to 1.1 and 1.2 above:
a) Why would I export data as data.dat instead of creating data.vdf to use in my main model?
b) Is there a setting I can select so that the simulation .vdf file only saves a specified list of variables? This feels more efficient than exporting a list of variables and then importing it into another model.
______
1.7
Am I missing something about running 2 models at once, communicating with one another? Was this a feature added in newer versions of Vensim? Possibly I need to post a different question about nested models, file structure, and running modules/simulations of part of a model at a time....
Thank you,
Sarah
Junior Member (Vensim PRO)
Re: identifying contributors to large file size and slow run time
Maybe a bottom-line question: if I have a model with a lot of summed arrays, with many dimensions (2x106x9x5, etc.), would it be better to sum the variables in a separate model file which simulates the sums as outputs? Given that, for the most part, the sums aren't used as inputs, and VECTOR SELECT doesn't seem (according to Tom) to offer much in terms of computational savings?
That makes model analysis as you go a little more difficult, if you want to see aggregate output just to get a sense of things.
Actually, this particular model isn't that large, so I am wondering why it's so slow! It was made several years ago; maybe there are some relics I'm not seeing and I ought to check it in the text editor.
Junior Member (Vensim PRO)
Re: identifying contributors to large file size and slow run time
And do you know if there's any value in putting an arrayed initial stock value in a .cin instead of data?
Again, my data is imported from a text file and loaded as a .vdf. Maybe I would be better off importing .dat? Does this still need to be loaded as a .vdf? I couldn't work out how to do this for some reason (importing from formats other than tab-delimited text).
Super Administrator
Re: identifying contributors to large file size and slow run time
sarahboyar@gmail.com wrote: ↑Mon Mar 10, 2025 12:16 pm
So, data loaded through data import >> is not expensive?

Correct.
sarahboyar@gmail.com wrote: ↑Mon Mar 10, 2025 12:16 pm
What if there are multiple data inputs? Is it better to have data1.vdf, data2.vdf, ... dataN.vdf imported via a Data Model, run separately in a model designed only to import data variables, and then load the datavariables.vdf from that model into the Main Model?
*I currently have a Data Model which loads my population data and then feeds the .vdf from this model into all my Main Models. This model has variables like population[1yr ageband, ***, region], population[age, ***], population[region], etc., so that it's all ready to go for different applications or client problems.
**I also try to do vector select operations in the Data Model, mapping population[1yr ageband, ***, region] to population[5 year ageband, ***, region], and then only use the array required in a given model.

Difficult to say without seeing the model.
sarahboyar@gmail.com wrote: ↑Mon Mar 10, 2025 12:16 pm
Presumably if you load the datavariables.vdf from the Data Model, you wouldn't want that model to have a number of other data variables not relevant to the Main Model. Is a .vdf with only one variable faster to run than a .vdf with many variables, not all of which are used in the model? E.g. the Main Model has population data, but population.vdf has variables for births, deaths, total_population. Do the births and deaths variables in the .vdf slow down the Main Model, so that it's better to load a .vdf with only the variables required?

It's fine to have different variables in the VDFX file.
sarahboyar@gmail.com wrote: ↑Mon Mar 10, 2025 12:16 pm
In what situation would I wish to use a .CIN file for constants vs. just have different .vdfs loaded? E.g. I could create .cin files:

Purely up to you. Both CIN and data are fast, so it's really a question of how easy/difficult it is to keep track of multiple files.
sarahboyar@gmail.com wrote: ↑Mon Mar 10, 2025 12:16 pm
Can you use Initial with subscripts? Is the only way to do this to write a separate equation for each element?

Yes, you can use it with subscripts.
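A minimal sketch (hypothetical names): a single subscripted equation covers every element, so there's no need to write one equation per element.
initial population[age,region] = INITIAL( population data[age,region] )
Population[age,region] = INTEG( net change[age,region] , initial population[age,region] )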
sarahboyar@gmail.com wrote: ↑Mon Mar 10, 2025 12:16 pm
a) Why would I export data as data.dat instead of creating data.vdf to use in my main model?

You would not need to bother doing this.
sarahboyar@gmail.com wrote: ↑Mon Mar 10, 2025 12:16 pm
b) Is there a setting I can select so that the simulation .vdf file only saves a specified list of variables? This feels more efficient than exporting a list of variables and then importing it into another model.

Savelists have been in Vensim for many years.
sarahboyar@gmail.com wrote: ↑Mon Mar 10, 2025 12:16 pm
Am I missing something about running 2 models at once, communicating with one another?

No. This is not currently possible.
sarahboyar@gmail.com wrote: ↑Mon Mar 10, 2025 12:16 pm
Was this a feature added in newer versions of Vensim? Possibly I need to post a different question about nested models, file structure, and running modules/simulations of part of a model at a time....

You can have sub-models in the latest version of Vensim. Essentially, this allows you to construct a model from a number of existing models.
Advice to posters seeking help (it really helps us to help you)
http://www.ventanasystems.co.uk/forum/v ... f=2&t=4391
Units are important!
http://www.bbc.co.uk/news/magazine-27509559
Super Administrator
Re: identifying contributors to large file size and slow run time
sarahboyar@gmail.com wrote: ↑Mon Mar 10, 2025 3:04 pm
Maybe a bottom-line question: if I have a model with a lot of summed arrays, with many dimensions (2x106x9x5, etc.), would it be better to sum the variables in a separate model file which simulates the sums as outputs? Given that, for the most part, the sums aren't used as inputs, and VECTOR SELECT doesn't seem (according to Tom) to offer much in terms of computational savings?
That makes model analysis as you go a little more difficult, if you want to see aggregate output just to get a sense of things.
Actually, this particular model isn't that large, so I am wondering why it's so slow! It was made several years ago; maybe there are some relics I'm not seeing and I ought to check it in the text editor.

If you are summing arrays, make sure you only do the sum once, and calculate the sum in a separate variable. For example,
replace
region proportion[region] = allocation[region] / sum(allocation[region!] )
with
total allocation = sum(allocation[region!] )
region proportion[region] = allocation[region] / total allocation
And if summing constants,
total initial allocation = initial(sum(initial allocation[region!] ))
Advice to posters seeking help (it really helps us to help you)
http://www.ventanasystems.co.uk/forum/v ... f=2&t=4391
Units are important!
http://www.bbc.co.uk/news/magazine-27509559
Super Administrator
Re: identifying contributors to large file size and slow run time
sarahboyar@gmail.com wrote: ↑Mon Mar 10, 2025 3:26 pm
And do you know if there's any value in putting an arrayed initial stock value in a .cin instead of data?
Again, my data is imported from a text file and loaded as a .vdf. Maybe I would be better off importing .dat? Does this still need to be loaded as a .vdf? I couldn't work out how to do this for some reason (importing from formats other than tab-delimited text).

No benefit in using a CIN rather than data. Do whatever is easiest.
Advice to posters seeking help (it really helps us to help you)
http://www.ventanasystems.co.uk/forum/v ... f=2&t=4391
Units are important!
http://www.bbc.co.uk/news/magazine-27509559
Junior Member (Vensim PRO)
Re: identifying contributors to large file size and slow run time
Thank you for all of the responses. I cleared up a lot in my file structure, most of which was housekeeping and consistency rather than operations to make the model run more efficiently. I did change all the summed arrays to calculate in separate variables, per your comment above and the notes from Tom's MetaSD page on slow models. It was very helpful to post here to work all that through.
Ultimately, the model was still very slow. I worked out why, and I had this same problem about 10 years ago. It's not obvious, so I'll post it:
I had a stock with a large number of elements across several subscript ranges, e.g. 106 age elements, 3800 geographies, gender, etc.
As I was building the model I set the initial value of the stock to 1, planning to go back later and import the dataset for initial values. This is what caused the excessively slow sim speed. Changing this to an initial value from imported data solved the slowness problem. This has happened to me several times; it's one of those little shortcuts you take while building just to get to simulation faster. Anyway, for what it's worth. Maybe there's a simpler way to state this for the rest of the forum.
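In Vensim terms, the change was roughly this (the subscript and variable names here are placeholders, not my actual names):
Population[age, geo, gender] = INTEG( net change[age, geo, gender] , 1 )
became
Population[age, geo, gender] = INTEG( net change[age, geo, gender] , initial population data[age, geo, gender] )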
Thank you,
Sarah
Re: identifying contributors to large file size and slow run time
Administrator wrote: ↑Mon Mar 10, 2025 3:52 pm
No benefit in using a CIN rather than data. Do whatever is easiest.

I'm not convinced that's true. A data equation has a little overhead at every time step. A constant does not. I only use data for things that actually have time variation.
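To illustrate (hypothetical names): a constant set in a changes file, e.g.
scenario1.cin:
base demand = 100
is evaluated once at initialization, whereas an exogenous data variable driven from a .vdfx is looked up and interpolated at every time step. So it's worth reserving data variables for inputs that actually vary over time.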
/*
Advice to posters (it really helps us to help you)
http://www.ventanasystems.co.uk/forum/v ... f=2&t=4391
Blog: http://blog.metasd.com
Model library: http://models.metasd.com
Bookmarks: http://delicious.com/tomfid/SystemDynamics
*/
Re: identifying contributors to large file size and slow run time
sarahboyar@gmail.com wrote: ↑Sun Mar 23, 2025 8:12 pm
Ultimately, the model was still very slow. I worked out why, and I had this same problem about 10 years ago. It's not obvious, so I'll post it:
I had a stock with a large number of elements across several subscript ranges, e.g. 106 age elements, 3800 geographies, gender, etc.
As I was building the model I set the initial value of the stock to 1, planning to go back later and import the dataset for initial values. This is what caused the excessively slow sim speed. Changing this to an initial value from imported data solved the slowness problem.

This is kind of puzzling, because a stock initial value equation shouldn't have much impact - even if it has a lot of elements, it only executes once.
The worst speed-hog I'm aware of is using the GET DATA MIN/MAX/... functions without wrapping them in an INITIAL.
/*
Advice to posters (it really helps us to help you)
http://www.ventanasystems.co.uk/forum/v ... f=2&t=4391
Blog: http://blog.metasd.com
Model library: http://models.metasd.com
Bookmarks: http://delicious.com/tomfid/SystemDynamics
*/
Re: identifying contributors to large file size and slow run time
Tom – Could you post an example of a GET DATA request that is and is not wrapped in INITIAL? Maybe just screenshots of the two equation editor windows showing the difference? I think the model I'm working on might benefit from this.
Thanks in advance.
Super Administrator
Re: identifying contributors to large file size and slow run time
I think Tom is referring to GET DATA MIN/GET DATA MAX (https://www.vensim.com/documentation/fn ... T+DATA+MIN).
The GET DATA XLS (etc.) functions execute before the rest of the model calculations, and only execute once. The data structures are then populated and used by the simulation/sensitivity/optimization.
Attachment: EqEdGetDataMin.jpg (equation editor screenshot showing a GET DATA MIN call entered with the equation type set to Initial)
Advice to posters seeking help (it really helps us to help you)
http://www.ventanasystems.co.uk/forum/v ... f=2&t=4391
Units are important!
http://www.bbc.co.uk/news/magazine-27509559
Re: identifying contributors to large file size and slow run time
Right - the key (in the screenshot above) is that the Equation type is set to Initial, which really just wraps the right side of the equation in INITIAL, i.e.
var = INITIAL( GET DATA MIN( blabla ) )
INITIAL is restricted, such that it has to be the first thing on the right side of the equation. This is sometimes inconvenient, so I often use a macro:
:MACRO: INIT( x )
INIT = INITIAL(x)~x~|
:END OF MACRO:
Then you can write things like
var range = INIT( GET DATA MAX( blabla ) ) - INIT( GET DATA MIN( blabla ) )
/*
Advice to posters (it really helps us to help you)
http://www.ventanasystems.co.uk/forum/v ... f=2&t=4391
Blog: http://blog.metasd.com
Model library: http://models.metasd.com
Bookmarks: http://delicious.com/tomfid/SystemDynamics
*/
Re: identifying contributors to large file size and slow run time
Great. Thank you, both, for the clarifications.
Re: identifying contributors to large file size and slow run time
For file size, the most likely candidate is heavily-subscripted variables. We don't have a way to identify these built in, but here's a trick:
- export a run to a tab or csv file, with subscripts in their own columns
- to minimize the file size, you can set the time range to capture just the first 2 time steps
- load the result into Excel
- create a pivot table, with
Filter = 2nd time step column (exclude all the blanks, which are constants)
Row = variable name
Target variable = count of 1st time step column
- sort by descending count
You can now see which variables have the most instances, and therefore occupy the most space in your vdfx. Exclude any that aren't needed with a savelist (.lst).
/*
Advice to posters (it really helps us to help you)
http://www.ventanasystems.co.uk/forum/v ... f=2&t=4391
Blog: http://blog.metasd.com
Model library: http://models.metasd.com
Bookmarks: http://delicious.com/tomfid/SystemDynamics
*/