Combining SD and statistics

Post by **RutgerMooy** » Wed Oct 06, 2004 12:31 pm

(this was also posted to the SD mailing list)

Dear all,

In a project dealing with the effects of (among other things) ICT services on the interconnectedness of society, we've come to the point at which we would like to quantify an SD model. To do this, we hope to use statistical techniques. However, my knowledge of statistics is very basic and I'm struggling with the process of quantification.

I'd appreciate it if anyone could help me out with this! Details of the issue are below.

Thanks!

Rutger Mooy
TNO Telecom
the Netherlands

----------------------------------------------

BACKGROUND
We have available a fairly large data set with survey results across a population. These data are non-longitudinal, that is, they do not follow specific individuals over time. Survey questions range from (for example) "do you have a mobile phone?" to "are you satisfied with the amount of communication you have with your close friends?".
We hope to structure this information into a model that can help us deal with policy questions. The focus of the model would be the mutual interaction between Information and Communication Technologies (ICTs) and certain Social Processes and phenomena (also known as Social Capital).

APPROACH
From theory, literature, expert opinion and intuition, we have come up with a qualitative model that maps out a number of variables relating to each other, showing some very interesting feedback structures. To explore this further, we'd like to use statistical techniques, probably regression and path analysis, to locate the direction and strength of the various effects we'd expect to be present. We then plan to feed these quantified effects into the model, which can then be used in simulation runs.

TIME
One of the main issues is that it's very hard to specify the time component in the modeled processes. That is to say, we cannot validate the speed at which things are happening, and we will have to make assumptions about that.

COMBINING STATISTICS AND SD
Let's say we have questioned the individuals in the population on variables X and Y, and that we (from theory, expert opinion, intuition, etc.) hypothesize that X is actually causing Y. We can run a statistical regression analysis estimating the effect of X on Y, all other variables held constant. We would be, in fact, estimating the coefficients a and b in the formula: Y = a + b*X.
The main question is: how do we put those coefficients into an appropriate formula in the SD Model? How does one 'translate' the statistical output into input for the system dynamics model? This is the process that I'm most unsure about. Additional questions may be: do we have enough information to be able to estimate the relationships in the model? Can we include the concept of change-over-time working with this data?
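To make the regression step concrete, here is a minimal sketch of estimating a and b in Y = a + b*X by ordinary least squares. The data are synthetic stand-ins generated inside the script, not survey results, and the helper name `fit_line` is made up for illustration. The residual standard deviation is returned as well, since it describes how much of Y the line does not explain.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with a column of ones for the intercept a
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    a, b = coef
    residuals = y - (a + b * x)
    # Two estimated parameters, hence n - 2 degrees of freedom
    residual_sd = np.sqrt(residuals @ residuals / (len(x) - 2))
    return a, b, residual_sd

# Synthetic stand-in for cross-sectional survey data (assumed values, not real results)
rng = np.random.default_rng(0)
x_obs = rng.uniform(0.0, 10.0, size=500)
y_obs = 2.0 + 0.5 * x_obs + rng.normal(0.0, 1.0, size=500)

a_hat, b_hat, s_hat = fit_line(x_obs, y_obs)
print(f"a = {a_hat:.2f}, b = {b_hat:.2f}, residual sd = {s_hat:.2f}")
```

Note that a fit like this on cross-sectional data says nothing about how fast Y responds to X; that timing still has to be assumed separately.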

Any help is appreciated!

Post by **LAUJJL** » Sun Oct 10, 2004 4:51 pm

Hi Rutger.

I am not at all sure I can be of any help, but trying costs nothing except my time.

Let me restate your problem as I have understood it. Maybe I am wrong.

So far you have built a qualitative model that lays out the causes, effects and feedback in your problem.

Some of the effects are not deterministic and are determined by regression. If the variables are determined by path analysis, they will be more complicated to formulate, since path analysis is already a sort of statistical modelling.

In an SD model the equations are generally deterministic, which means that one input value determines exactly one output value.

But in your case an input value gives a stochastic output, which means that you know the mean of the result, possibly the variance, and possibly the kind of distribution.

One way to deal with variability is to use sensitivity analysis, making constants vary. But that does not help you here, because it is not the input that varies but the function itself, and therefore the result.

One solution to the problem would be to represent the function completely.

For example: if the relation is a simple linear regression, you will write Y = A * X + B + Epsilon.

You can consider A and B to be constants and Epsilon to be stochastic, with a mean of zero, and commonly treated as Gaussian when there are enough observations.

In fact A is stochastic too, since it is itself an estimate, and its uncertainty should also be evaluated using statistical functions.
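As a rough illustration of representing the function completely, the sketch below draws Y = A * X + B + Epsilon with a Gaussian Epsilon and, optionally, an uncertain slope A. Everything numeric here is an assumed placeholder, and `sample_y` and its `a_se` parameter (the standard error of A) are hypothetical names. For simplicity it treats the uncertainty in A as independent of B, which a real regression output would not guarantee.

```python
import numpy as np

def sample_y(x, a, b, sigma, rng, a_se=0.0):
    """Draw one value of Y = A*X + B + Epsilon.

    a, b   : estimated slope and intercept (A and B in the post's notation)
    sigma  : standard deviation of Epsilon (mean zero, Gaussian by assumption)
    a_se   : optional standard error of A; 0 keeps A fixed
    """
    slope = rng.normal(a, a_se) if a_se > 0 else a
    epsilon = rng.normal(0.0, sigma)
    return slope * x + b + epsilon

# Assumed illustrative values, not estimates from any data set
rng = np.random.default_rng(1)
draws = np.array([
    sample_y(4.0, a=0.5, b=2.0, sigma=1.0, rng=rng, a_se=0.05)
    for _ in range(10_000)
])
print(f"mean = {draws.mean():.2f}, sd = {draws.std():.2f}")
```

In a simulation, a draw like this would be taken at each time step (or each run), which is exactly why the diagram gains extra variables and arrows compared with a single deterministic link.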

So instead of having a simple arrow linking two variables, you will have many variables with a lot of arrows, making your model much more complicated. Not to mention the case of multiple regression or nonlinear regression, which is more complicated still.

It is then necessary to consider whether such calculations are worthwhile. Are you looking for insights, or for supposedly correct results?

Working only with simple deterministic functions can make sense, especially in a first exploratory model.
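A first exploratory, purely deterministic formulation could look like the sketch below. It is only one possible reading, not something proposed in this thread: the regression line A * X + B is treated as the value Y is heading toward, and a stock Y closes the gap over an assumed adjustment time `tau`. The function name, `tau`, `dt` and all numbers are assumptions for illustration, and the adjustment time in particular cannot be estimated from non-longitudinal data.

```python
def simulate_adjustment(x_path, a, b, tau, y0, dt=0.25):
    """Euler-integrate dY/dt = (A*X + B - Y) / tau for a single stock Y.

    x_path : sequence of X values, one per time step of length dt
    tau    : ASSUMED adjustment time (same time unit as dt)
    """
    y = y0
    trajectory = [y]
    for x in x_path:
        indicated = a * x + b          # value Y is adjusting toward
        y += dt * (indicated - y) / tau
        trajectory.append(y)
    return trajectory

# X held constant for 80 steps of dt = 0.25, i.e. 20 time units (assumed values)
x_path = [4.0] * 80
traj = simulate_adjustment(x_path, a=0.5, b=2.0, tau=2.0, y0=0.0)
print(f"final Y = {traj[-1]:.3f}")
```

Rerunning with different values of `tau` shows how strongly the conclusions depend on the assumed speed of adjustment, which is a cheap check before adding any stochastic terms.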

A second point to consider is closely related to the first one.

You are trying to go from a qualitative model to a quantitative model, having already sketched the structure of the model in the qualitative diagram.

I have already tried this method, which I learned in a distance-learning course last year. It is called the standard method. I tried to use it for six months, but got such bad results that I nearly stopped using SD. After a while I decided to start again, using the method explained in the Vensim User’s Guide and in Business Dynamics, page 81, under the heading “Principles for successful use of SD”. It says: get a preliminary model working as soon as possible; don’t try to develop a comprehensive conceptual model prior to the development of a simulation model, etc.

It is clearly stated: conceptual mapping of the whole problem is of little use.

I think that conceptual mapping can be useful to give an overall idea of the problem, and can be used to build a quantitative model if the modeller has years of experience. Otherwise it is dangerous, because you have already built the structure of the model without having compared the model to reality at all. In qualitative models there are no reference modes, no calibration, no sensitivity analysis, no reality check. You are left alone with your intuition, expert opinions and so on, which can be completely wrong.

In my case, I started on my problem with no loops at all, working in a static way, and from that progressively built the first part of the model. That part has only deterministic functions, needs no regression analysis, and has no soft or intangible variables that cannot be measured. I tried to start by avoiding all the main difficulties.
Most of all, the way I see the problem now is completely different from how I saw it in the first qualitative diagram. The model has about 30 loops with only three stocks, and 4 really important loops.
I worked through about 40 successive models, one small step at a time, always checking the results against reality.

Next, in the second stage, I will add the variables that are less deterministic (in my case, mainly the price elasticity of different products, obtained by observing data and using some statistical methods). But I already have a first model that solves the supply-chain problem, which in my case is a problem of investments (I rent trucks).

Other ideas on this subject would be interesting, especially from more experienced modellers.

I hope this helps a bit.

Jean-Jacques Laublé

Ventana software support forum
