QUERY Ensuring Quality of Models

Post by Richard G. Dudley <richard.dudley@attglobal.net> » Fri Nov 09, 2007 7:12 am

In his Next Fifty Years Article in the most recent System Dynamics Review,
Forrester comments: "We need to begin debating how to raise quality and scope in
applications, published papers, and especially in academic programs."

What I am wondering is:

What procedures do people use to ensure that the model formulation, testing,
evaluations cycle is done in a rigorous manner? How could that be improved?

Models developed for clients may end up being tested in the real world. Perhaps
that is the ultimate test. But how do consultants make sure that a model is
ready for presentation? What internal controls are used to test and approve a
model, or parts thereof, before going to the client or to the next modeling
step? Are specific internal or external evaluation processes used, or is
convincing the client a sufficiently rigorous exercise? Is coming up
with a 'bad' model risky for future business?

Presumably quality of published models should be checked by those refereeing the
journal article. Is that right... are the models actually examined? My
impression is that examination of the models themselves is very variable...
reviewing a model takes a lot more time than reviewing a paper.

I am just wondering what procedures consulting firms or institutions actually
use in reviewing their own models.

Richard
Posted by Richard G. Dudley <richard.dudley@attglobal.net>
posting date Fri, 9 Nov 2007 11:27:45 +0700
_______________________________________________

Post by Richard Stevenson <rstevenson@valculus.com> » Sat Nov 10, 2007 7:25 am

Some good points, Richard.

There are, of course, many approaches to technical model validation.
In fact, it's almost a subject in its own right.

My own (possibly maverick) view is that models are only valid if
they actually change behaviours. For consultants, this is far more
about developing good client relationships and interactions than about
'technical' model validation. If, as a consultant, you can't
technically validate your own models, then you are not a consultant at
all - a charlatan, even? And if your client needs an external third
party to validate your model - then you've failed anyway.

Rather, developing client confidence in a model is more a 'relationship'
issue. Good consultants succeed by using models if they understand that
the model is not an end in itself but, rather, a communication tool with
the client - to be developed, debated and ultimately validated through
client interaction and informed 'buy in'. As soon as a model becomes a
'black box' to the client, then as a consultant you've lost the plot.

This is really where I have significant differences with some SD
academics and many 'consultants'. In the main, I can confidently say
that 90% of academics have never conducted anything more than 'superficial'
client work - they feel good if they can get a client to sponsor an
academic project. And too many 'consultants' are actually doing bad
client work - not because they can't build clever technical models but
because they really don't understand what it takes to 'make a difference'.

What is a "difference"? That's the 'a posteriori' perspective.
Ultimately, the ONLY fundamental criterion of SD model validity is "what
did it change, in what direction, and by how much?" This is, of course,
an 'a posteriori' view. And that's the problem with all planning tools -
you can't 'sell' the value of the model in advance. If only we could "sell"
SD on an 'a priori' basis!

I'd be rich, for a start. Our website (www.valculus.com) includes a number
of short case examples of SD model projects that changed things, a lot (in
retrospect!).

regards
Richard

Richard Stevenson
Valculus Ltd
Posted by Richard Stevenson <rstevenson@valculus.com>
posting date Fri, 9 Nov 2007 17:23:41 +0000
_______________________________________________

Post by Fred Nickols <nickols@att.net> » Sat Nov 10, 2007 7:25 am

Richard Dudley, prompted by Forrester's article "The Next Fifty Years,"
asks about ways and means of ensuring the quality of the models used.
That's probably a good idea but I can envision a snag or two. While
head of strategic planning and management services at ETS, I tried a
number of times to interest senior managers and execs in system
dynamics. One of them involved some real SD modeling. I did this on
my own and I referred the SD practitioner (Jack Homer) to some others.
Jack did what I thought was good work for us (including clearing up
some wrongheaded thinking on our part about how to use SD). Here's
the snag: I doubt those for whom the work was done would want their
models exposed to public scrutiny. For one thing, they no doubt view
them as proprietary - as works done for hire. For another, assuming
the models did indeed provide value, they wouldn't want that made freely
available to competitors. For a third, if the models were a "bust" (which,
in this case, they weren't), why put a misstep on public display?

As for how we ourselves evaluated such models, I can speak only for myself.
"Goodness of fit" would seem to be the single most important measure. By
that, I mean two things. First, does the model as conceived seem a good
fit with the variables and relationships making up the system being modeled?
Second, does the behavior of the model produce a good fit with the data
generated by the real system? (When there is no system, as in a situation
in which you are designing a system to behave a certain way, "goodness of
fit" becomes a less tenable measure.)

That said, I'm sure there are some arenas where the models developed could
be publicly reviewed, analyzed and critiqued. Perhaps others have some
ideas on that score.

Regards,


--
Fred Nickols
Posted by ""nickols@att.net"" <nickols@att.net>
posting date Fri, 09 Nov 2007 13:14:33 +0000
_______________________________________________

Post by Richard G. Dudley <richard.dudley@attglobal.net> » Sun Nov 11, 2007 7:35 am

Perhaps my comment was not clearly stated. I was not really thinking about
public discussion of models, but internal (within an institution or firm)
pre-review of models... Sort of a quality control prior to moving to the next
step. The next step may be going ahead with a publication, or taking the
current version of the model to the client for review.

That is, I am curious as to how groups of model builders ensure that the models
they are building are both "good models" and are approved for further action,
follow-up, release, publication, etc.

Richard
Posted by <richard.dudley@attglobal.net>
posting date Sat, 10 Nov 2007 19:44:00 +0700
_______________________________________________

Post by Kim Warren <Kim@strategydynamics.com> » Mon Nov 12, 2007 6:58 am

A small tip, but one I have found immensely useful, which I believe should be
credited to Geoff Coyle: a model should do what the real world does,
and for the same reasons.

The first clause hits the 'goodness of fit' criterion, but the second
has two further implications. First, the model should only include
elements that can be identified and measured [or conceivably could be]
in the real world. Secondly, those real-world variables should also
behave as observed.

As I understand it, the major risk if this additional consideration is
not included is that a model may fit well with historical data, but then
immediately go off-track as soon as the future starts to unfold.
Additional problems include the significant issue of getting management
to buy in to a model in which they don't recognise the elements of which
it is made up.

There is a puzzle in all this, though. Many of the most insightful SD
models are simplified to high levels of abstraction. Yet skilled SD-ers
somehow manage to generate confidence in the structure and findings.
This seems an important skill, as it is often not practical to capture
enough real-world detail to fully conform to the rule above. I may not
be alone in welcoming advice on how to accomplish this.

Kim Warren
Posted by ""Kim Warren"" <Kim@strategydynamics.com>
posting date Mon, 12 Nov 2007 10:17:55 -0000
_______________________________________________

Post by Fabian Szulanski <f_fabian@yahoo.com> » Mon Nov 12, 2007 6:58 am

About model QA:

In the absence of an "ISO SDM", there are two types of filters that
any model of any type could be subject to: Model Verification and Model
Validation.

The latter has as a resource a specific battery of tests tailored for
System Dynamics (as described in classic papers by Forrester & Senge;
and Barlas). Apart from those 'objective' tests, model validation is a
process of confidence buildup during the model building process. That
confidence is highly correlated with the perception of model ownership
by the client (an internal client, in answer to Richard's question). Hence
the importance of designing a group model building process that includes
the model's client.

I cannot foresee the development of a universal ISO for the confidence
buildup process, as each organization has its own DNA (genetic code), so
each client would gain confidence in a different way. This diversity is
also the reason for benchmarking not being useful as a SD Model QA
component.

The former (Verification) could be implemented by internal or external SD
consultants who could verify if the model has methodological and/or
formulation flaws. Balci has developed some papers about model verification.

In closing, in my opinion there could be attempts to standardize the
verification process, and even the implementation sequence of validation
tests, aiming to define an SD model QA process.

However, we certainly can't imagine an ISO component stating something like:

Customer! Thou shalt gain confidence in the model following THIS process,
and in less than 3 months.

Be all well...

Fabian Szulanski
System Dynamics Centre
Instituto Tecnologico de Buenos Aires
Posted by Fabian Fabian <f_fabian@yahoo.com>
posting date Sun, 11 Nov 2007 05:57:19 -0800 (PST)
_______________________________________________

Post by Jack Homer <jhomer@comcast.net> » Tue Nov 13, 2007 8:39 am

Kim Warren cites Geoff Coyle: "A model should do what the real world does,
and for the same reasons." That's still not quite enough. The SD approach to
model evaluation (as in Forrester and Senge 1980) also considers whether the
equations are robust to extreme conditions, the constants and table functions
have plausible values, the boundary is adequate for testing all relevant
questions, the outputs are plausible under all test conditions, and the
model is transparent, has explanatory power, and yields novel insight.
In other words, we require that our models not only look
realistic, but also that they be robust and useful.
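[Editor's note: the extreme-conditions criterion Jack mentions is mechanical enough to illustrate. The sketch below uses an invented one-stock inventory sector; the point is that a robust formulation limits outflows to what is physically available, so the stock survives absurd inputs.]

```python
# Illustrative extreme-conditions test on an invented inventory sector.
# A naive "shipments = demand" formulation would drive inventory negative;
# a robust one caps shipments at what can physically be shipped this step.

def simulate_inventory(demand, inventory0=100.0, dt=0.25, steps=80):
    inventory = inventory0
    history = []
    for _ in range(steps):
        max_feasible = inventory / dt        # can't ship more than exists
        shipments = min(demand, max_feasible)
        production = 10.0                    # constant replenishment
        inventory += dt * (production - shipments)
        history.append(inventory)
    return history

# Extreme condition: demand 100x the replenishment rate.
extreme = simulate_inventory(demand=1000.0)
assert all(level >= -1e-9 for level in extreme), "inventory went negative"
print(min(extreme))
```

The same pattern generalizes: pick an input far outside the historical range, state what a physically sensible model must do, and assert it.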

Kim asks how one can establish confidence in a model that is highly aggregated
and may therefore seem abstract and distant from real-world data. This raises a
good question about one of the sometimes-overlooked requirements of good modeling:
the ability to find (and cite) lots of real-world data, and then to summarize or
capsulize it in the form of a single variable. Often, a metric already exists
that will serve as a good representative proxy for all of the real-world details,
one that persuasively "sums up" the details. Every social science needs
and rests upon such summary measures, and so do we. If a commonly accepted summary
measure does not already exist, the modeler will have to find or create one that
seems to fit the requirements, and make a good case for it.

For example, in a high-level model of health and health care (see current issue
of SDR), we wanted an overall measure of ill health prevalence, not the
prevalence of any disease in particular, but all significant symptomatic disease,
and especially chronic disease, combined. Time series exist on the prevalence of
dozens of individual diseases, but one cannot just add them up because of (a)
different levels of severity and (b) significant overlaps among them (if you simply
add them all up, it appears that over 100% of the population has chronic disease).
Instead, we looked to data on self-reported health status: excellent, very good,
good, fair, poor. These self-report measures have been tracked annually since 1982
and have been found to be good predictors of individual health care utilization.
"Fair+poor" is typically 10% of those surveyed, while "good+fair+poor" is over 30%.
We ended up using "good+fair+poor", because a sum of the prevalences of just a handful
of mostly non-overlapping symptomatic diseases (heart disease, cancer, diabetes, asthma,
bronchitis, arthritis) gives at least 30%, and much exceeds 10%. Thus, we have a
metric that (a) reflects the notion of "at least moderately symptomatic disease" and
(b) rests upon accepted data with a long history. It is not a perfect metric, but it
is the best one available, and health professionals who have read our work have for
the most part accepted our use of the metric. That's what I think you look for in a
good summary measure.

Jack Homer
Posted by ""Jack Homer"" <jhomer@comcast.net>
posting date Mon, 12 Nov 2007 09:37:41 -0500
_______________________________________________

Post by Jean-Jacques Laublé <jean-jacques.lauble@wanadoo.fr> » Tue Nov 13, 2007 8:39 am

Hi Richard

About your question:

What procedures do people use to ensure that the model formulation, testing,
evaluations cycle is done in a rigorous manner? How could that be improved?

Like Fabian, I think that there is no real answer to your question,
for the simple reason that there seem to be different methods of building models.
The process of validation will then be very different, depending on the method.

For instance, if you look at the web site of Ventana Systems, you will see a
very simplified building process, and you will notice that it relies heavily
on reality checks, right from the start.

They verify, all along the process, that the model conforms to a predefined set
of standard behaviours of reality, plus a set of past data.

On the other hand, there does not seem to be a lot of qualitative analysis with
diagramming.

If you study the Vensim user guide, there is little mention of diagramming, and
the modelling process starts immediately with a simple quantitative model with a
single stock.

Of all the models used, only one refers to past reference modes; the
others do not.

Logically, all the models in the modelling guide should be constructed with a set of
reality check equations. None of them has any.

That means that what is prescribed is not done.

There is only one model with past references and no model with reality checks.

This could be interpreted as meaning that, in reality, reality checks or past
references are rarely useful.

But when one looks at the Ventana Systems web site, one sees that data and reality
checks are heavily used.

I can add that, Vensim being one of the most used software packages, I have never
seen any model with reality checks in published articles or papers.

All this seems rather incoherent.

I think that one of the reasons for not using reality checks is that it requires a
lot of work and experience, especially in choosing the right equations, and a very
good knowledge of reality. You can choose hundreds of reality check equations, even
for a relatively simple model.

But going further, some methods recommend building a complete conceptual qualitative
diagram, possibly with the stocks and flows, prior to building a quantitative model
(Hines, Coyle).

Sterman too advocates building a diagram, but thinks that it is better to quantify
and get data to compare with reality, to verify the validity of the diagram.

But the big difference is between the Hines and Sterman way of building a diagram
and Coyle's way.

I have never personally understood the Hines way (I can see him nodding his head across
the ocean!).

I have tried to use it and it never worked for my cases.

To summarize it too simply: I do not believe that the study of reference modes can
give a precise indication of the structure of a model; at best it can give a hint
about a possible hypothesis on the structure.

It works with simple models, or if the problem has a dominant loop and no exogenous
data; otherwise the past behaviour can be the result of multiple loops and external
influences.

When one studies Coyle's way of building a diagram, he just does not use reference
modes at all. Is it for the same reason as the one I described?

More than that, he does not use reference modes to verify his models either!

At least not in the models he studies in his book 'System Dynamics Modelling: A
Practical Approach'.

I have not yet finished studying the book, which is extraordinarily dense.

With time I like using reference modes less and less, because the only thing they
can tell you is that the model is wrong (which is not so bad), but they cannot tell
you where it is wrong.

They can perhaps show you that the model behaves roughly like reality, but only on
condition that you know how many variables you have to survey, how independent they
must be from one another, and how close the model's data must be to the real ones.

The thing I do not like about reference modes is that they act like a black box:
they give you results, but do not explain why.

To summarize: Coyle's way of analyzing things relies heavily on the structure and
on its deep understanding, and not very much on past behaviour.

One of the particularities of Coyle's method is showing how to analyze qualitative
diagrams in a very original way. He recommends building diagrams at different levels
of abstraction, which is the best way I know to really understand a model deeply.

Of course I like Geoff's method, because it is based on a deep understanding of the
problematic situation and not on past data, which can be biased, difficult to get,
and representative of a past situation rather than of the future.

It is the only complete and coherent published method of building a model (there
might be better unpublished ones), which takes the reader from the problem
definition to the finished model, training him all the way with about eight models,
through all the steps of the modelling process.

The only thing lacking is how to build a sound written definition of the problem.
By interviews, using cognitive mapping techniques?

The book is unfortunately not very appealing: set in small type, and based on old
software that is no longer published (COSMIC and COSMOS).

I personally do not mind, because I always do the modelling myself in Vensim before
reading the author's solution.

After this too-long explanation (I did not have the time to make it shorter),
you will agree that there is still a long way to go to a unified method of building
models.

Regards.

Jean-Jacques Laublé. Eurli Allocar

Strasbourg, France
Posted by Jean-Jacques Laublé <jean-jacques.lauble@wanadoo.fr>
posting date Tue, 13 Nov 2007 12:39:03 +0100
_______________________________________________

Post by Tom Fiddaman <tom@ventanasystems.com> » Tue Nov 13, 2007 8:39 am

> Ultimately, the ONLY fundamental criterion of SD model validity is "what
> did it change, in what direction, and by how much?"

I'd call this influence, not validity. A completely spurious model could be
involved in a process that changed things a lot. Influence and validity are
separate, individually necessary but not sufficient, conditions for a successful
project. Validity is about the "in what direction" part - i.e., do the contingent
predictions yielded by the model direct change in the right direction?

> "Goodness of fit" would seem to be the single most important measure. By
> that, I mean two things. First, does the model as conceived seem a good
> fit with the variables and relationships making up the system being modeled?
> Second, does the behavior of the model produce a good fit with the data
> generated by the real system?

This is a useful elaboration, because frequently people only think about the second
criterion - fit to data - which is by itself a weak test. The first test, determining
whether the model is a good match for the system under consideration, is much more
important and difficult. Client involvement with the model is one key component of
such a test, e.g., for establishing the face validity of the model structure and key
relationships, and for proposing and evaluating experiments that must pass a laugh
test. Other components include all the things we normally check - units balance,
nonnegativity of physical stocks, robustness in extreme conditions, agents' use of
available information, etc.

Typically the application of such checks is informal, in part because modelers work
in small teams. The software development world has far more elaborate formal
processes for building in model quality or at least enforcing conformance of product
to spec. I think many are not appropriate for use because modeling is not the same as
programming. But most likely we could benefit from routinization of some aspects of the
validation process.

The only example I know of is Reality Check in Vensim, which lets you build up a library
of behavior tests that can be automatically applied to a model. Unfortunately it hardly
ever gets used, even by us. I'm not sure whether that's because the implementation is
deficient or because modeler incentives favor quantity of equations over quality. But I
would love to see reality checks made a standard part of every conversation about models.
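[Editor's note: outside Vensim, the same idea can be approximated with ordinary test tooling: a library of behavior constraints applied automatically to every run. A minimal sketch, using a made-up one-stock workforce model; the model, the constraint names, and the harness are all invented for illustration.]

```python
# Illustrative "reality check" harness for an invented one-stock model:
# workforce changes through hiring and fractional attrition.

def simulate(hire_rate, attrition_frac=0.1, workforce0=100.0,
             dt=0.25, horizon=20.0):
    """Euler integration of d(workforce)/dt = hire_rate - attrition."""
    workforce, trajectory, t = workforce0, [], 0.0
    while t <= horizon:
        trajectory.append(workforce)
        attrition = attrition_frac * workforce
        workforce += dt * (hire_rate - attrition)
        t += dt
    return trajectory

# A library of behavior constraints, applied automatically to a run.
REALITY_CHECKS = {
    "workforce never negative":
        lambda traj: all(w >= 0 for w in traj),
    "no hiring => workforce cannot grow":
        lambda traj: all(b <= a + 1e-9 for a, b in zip(traj, traj[1:])),
}

def run_checks():
    """Run the test input (zero hiring) and return any failed checks."""
    failures = []
    traj = simulate(hire_rate=0.0)
    for name, check in REALITY_CHECKS.items():
        if not check(traj):
            failures.append(name)
    return failures

print(run_checks())   # an empty list means every check passed
```

Once such a library exists, it costs nothing to rerun after every structural change, which is the routinization Tom describes.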

Tom

****************************************************
Tom Fiddaman
Ventana Systems, Inc.
Posted by Tom Fiddaman <tom@ventanasystems.com>
posting date Mon, 12 Nov 2007 10:25:38 -0700
_______________________________________________

Post by Richard Stevenson <rstevenson@valculus.com> » Tue Nov 13, 2007 8:39 am

Tom Fiddaman wrote:
>I'd call this influence, not validity. A completely spurious model could be
>involved in a process that changed things a lot. Influence and validity are
>separate, individually necessary but not sufficient, conditions for a successful
>project. Validity is about the ""in what direction"" part - i.e. do the contingent
>predictions yielded by the model direct change in the right direction.

Kim Warren wrote:
> There is a puzzle in all this, though. Many of the most insightful SD
> models are simplified to high levels of abstraction. Yet skilled SD-ers
> somehow manage to generate confidence in the structure and findings.
> This seems an important skill, as it is often not practical to capture
> enough real-world detail to fully conform to the rule above. I may not
> be alone in welcoming advice on how to accomplish this.

I accept Tom's point about influence and validity being separate and necessary
conditions. And Kim's point is also neat; skilled SD-ers must have much more about
them than SD itself.

So I guess my own point is that competent consultant practitioners must both:
(a) be capable of technically validating models themselves, using appropriate methods.
If the client can't trust you to do this, what's he paying for?
(b) be capable of working closely with the client team (workshops, facilitation,
personal relationships, etc) to create influence. This is the hardest bit, by
far. It's politics, usually.

And the point about 'a priori' and 'a posteriori' validity and influence is also
important. The SD community has been remarkably shy of reviewing the influence of
its client work 'down the track'. It's common that 'the difference' takes years
(even decades) to prove. That's why it's still so difficult to justify this work 'a
priori'. There is often a fundamental mismatch (most often in the very long life-cycle
industries where we can do most good) between short-term management interests and
long-term investor interests. See the pharmaceutical case example on our web site
(www.valculus.com) for a classic example of mixed management and investor motives.

This reasoning is a factor in my initiative to move away from 'selling' SD as a
strategy planning method in its own right. Rather, we intend to focus on industries
where short/long term discontinuity is an acknowledged issue - and where long-term
value is the critical investor interest. We then 'wrap up' SD in a wider methodology
that includes rigorous financial valuation tools and an approach to understanding
the long-term role of intangible assets.

'SD inside', indeed - like selling a house without selling the boiler. But the point
is - we address the issues that managers and investors increasingly have to communicate
about.

The sales justification for this approach (we now call it 'resource-based valuation'
because that's exactly what it is) is the 'a posteriori' analysis of various client
projects that my company Cognitus conducted for blue-chip clients over the past couple
of decades. In many cases we can now broadly see the 'value added' - and in a few cases
it has been very substantial indeed. We will be adding more case examples to the
website over the next weeks and months, to make the point.

This really makes the critical distinction between validity and influence. Validity is
the easy bit!

Richard Stevenson
Valculus Ltd
Posted by Richard Stevenson <rstevenson@valculus.com>
posting date Tue, 13 Nov 2007 10:42:31 +0000
_______________________________________________

Post by Keith Linard <klin4960@bigpond.net.au> » Thu Nov 15, 2007 7:15 am

The fundamental step in ensuring the quality of models is ensuring that we
are modelling the correct problem. All too often I have seen consultants
(whether internal or external to the organisation and whether from the SDM
stable or other OR) coming up with technically brilliant solutions
to the "wrong" problem.

Echoing previous respondents and the extensive literature on this topic I
would suggest the following are also critical to quality modelling:

1. Empathy with the client: Modelling supports decision making. If I do
not understand the client's needs and the socio-political environment within
which s/he operates, the most technically brilliant quality model is likely
to fail.

2. Engagement with the client: SDM is fundamentally directed at changing
structures which affect behaviour. Changing structures almost inevitably
means challenging "the way we do things around here": culture, power,
influence, procedures, networks, etc. The journey to the solution, the
process of modelling, is as important as, if not more important than, the
technical quality of the model.

3. Building client confidence: The model structure should be modularised
so that, however complex the underlying mathematics, the top level stocks
and flows structure can be 'read' by the subject area expert. Three key
advantages of SDM for me have been first the fact that I can use a (top
level simplified) stock-flow model to feed back to the client my
understanding of the current process; secondly, that I can use a
'simplified' stock-flow picture with causal loops superimposed to
communicate to the client how the dynamic behaviour of the system is
producing the typically unforeseen consequences; and thirdly, I can use a
'simplified' stock-flow model to build the confidence of senior
executives that I have indeed included all key relationships.

In regard to the last point, I have found that line operators, technical
experts and management staff have no difficulty in 'reading' their areas and
challenging my understandings of 'how the system works'.

4. Key modelling techniques for ensuring quality: (in addition to all the
ideas in the literature on model validation & verification)
4.1 Rigorously apply units to all stocks, flows & auxiliaries. With
the introduction of Powersim Studio for student projects, with its (initially
utterly frustrating) pedantic approach to units, the quality of student
projects jumped to a much higher level. This particularly imposes a
salutary reality check on the willy-nilly use of arbitrary 'unitless'
modifying auxiliaries.
4.2 Always run a 'mass balance' sub-model in parallel with the model
building, which continually verifies that 'what goes in equals what comes
out'.
4.3 Start the model at, e.g., time minus 10, with 10 time periods of
historical data, so that you can check actuals against projections. This, I
have found, gives the client a profound sense of comfort with the next 20+
time periods of projection (whether it is warranted or not). Refer, e.g.,
Oliva, R. 1995. 'A Vensim Module to Calculate Summary Statistics for
Historical Fit.' System Dynamics Group, MIT. Memo D-4584. Cambridge, MA.
(For Powersim users I can send an example where this approach is used.)
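[Editor's note: point 4.2 can be automated in any language. Below is a minimal sketch, with an invented two-stock flow structure, that asserts conservation at every step: accumulated inflow minus accumulated outflow must equal the change in total stock.]

```python
# Illustrative mass-balance check for an invented two-stock chain:
# material flows in, moves from stock A to stock B, and leaves from B.
# "What goes in equals what comes out" is verified at every time step.

def simulate_with_mass_balance(inflow=10.0, dt=0.5, steps=40, tol=1e-9):
    stock_a, stock_b = 50.0, 20.0
    cum_in = cum_out = 0.0
    initial_total = stock_a + stock_b
    for _ in range(steps):
        transfer = 0.2 * stock_a          # internal flow A -> B
        outflow = 0.1 * stock_b           # leaves the system from B
        stock_a += dt * (inflow - transfer)
        stock_b += dt * (transfer - outflow)
        cum_in += dt * inflow
        cum_out += dt * outflow
        # conservation: net accumulated flow must equal stock accumulation
        drift = (cum_in - cum_out) - (stock_a + stock_b - initial_total)
        assert abs(drift) < tol, f"mass balance violated: drift={drift}"
    return stock_a, stock_b

print(simulate_with_mass_balance())
```

Running the check inside the integration loop, rather than once at the end, catches the step at which a leak or double-counting first appears.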


The following are just some of the texts & papers which have an excellent
focus on both model conceptualisation & the modelling process.

Andersen, D.F. and Richardson, G.P. 1997, 'Scripts for group model
building.' In System Dynamics Review, Vol. 13, No. 2, Summer 1997.
Checkland, P.B. and Scholes, J. 1999. 'Soft Systems Methodology In Action.'
John Wiley and Sons, Chichester, UK.
Coyle, R.G. 1996. 'System Dynamics Modelling: A Practical Approach.' Chapman
and Hall, London.
Richardson, G.P. and Pugh, A.L. 1981. 'Introduction to System Dynamics
Modeling with DYNAMO.' MIT Press, Cambridge, MA.
Vennix, J.A.M., Andersen, D.F. and Richardson, G.P. 1997. 'Group model
building: art and science.' In: System Dynamics Review, Vol. 13, No. 2
(Summer) 1997: 103-106.
Vennix, J.A.M. 1996. 'Group Model Building: Facilitating Team Learning Using
System Dynamics', John Wiley and Sons, Chichester, UK.
Wolstenholme, E. 1990. 'System Enquiry - A System Dynamics Approach.'
Wiley, Chichester.



Keith Linard
Ankie Consulting Pty Ltd
134 Gisborne Road
Bacchus Marsh
VIC 3340
Posted by ""Keith Linard"" <klin4960@bigpond.net.au>
posting date Thu, 15 Nov 2007 09:54:59 +1100
_______________________________________________

Post by Richard G. Dudley <richard.dudley@attglobal.net> » Fri Nov 16, 2007 8:02 am

I thank everyone for the responses regarding my "ensuring the quality of models"
question.

However, I still didn't get an answer to what I was asking. Perhaps I was not
clear enough, or perhaps there is no clear answer.

I am somewhat familiar with the various possible procedures for model
verification/validation and for checking model quality, but...

What I was really asking was: what are the internal procedures used to ensure
that verification and checking procedures are actually applied? Is there a
committee that gives a final OK to a model? Is there a team procedure that is
followed to OK the "final version" of a model? Is the model reviewed by a
"model auditing group"? Is it left to the modeling team to determine when a
model is good enough? Is it only via negotiations with a client that a model is
determined to be OK?

I am wondering if firms or organizations have procedures that are used to
(attempt to) avoid the release of faulty or unfinished models....

Or maybe my question doesn't make sense?

Richard
Posted by <richard.dudley@attglobal.net>
posting date Fri, 16 Nov 2007 03:21:26 +0700
_______________________________________________


QUERY Ensuring Quality of Models

Post by "Alan Graham" <Alan.Graham@paconsulting.com> » Fri Nov 16, 2007 8:02 am

Posted by "Alan Graham" <Alan.Graham@paconsulting.com>

Keith Linard is so very right:

"The fundamental step in ensuring the quality of models is ensuring that we
are modelling the correct problem."

These days I'm telling people that there are three validations of a
model's fitness for purpose, all of them necessary, and they require
quite different validation processes (i.e. tests to disprove):

1. Validating the purpose

2. Validating the model

3. Validating the results

In practice, validating a purpose is done more effectively if the
modelers go quite a bit beyond the relationship hygiene that Keith
Linard discussed--a specific enumeration of the purpose, validated by
thorough discussion with the stakeholders (e.g. clients) is immensely
valuable later in the modeling process.

A one-sentence purpose of the form "to model X" or "to understand X" is
usually a good sign that the modeling effort will end up being
ineffectual--it would take some extremely focused and effective thinking
and persuasion elsewhere to make up for this vagueness. Such abstract
purposes imply no criteria for demonstrating that a model has filled its
purpose; there's nothing specific enough to be able to tell whether a
modeling effort will be useful for anyone. And that's a big problem
waiting to happen. (And not just for commercial modelers--when can one
know when a thesis is satisfactorily done?)

A tangible and agreed-on model purpose will typically have at least a
couple of these elements:

1. Actions or policies "on the table"--what real actions or policies
(in the SD sense) is the model supposed to be influencing? (Some
high-level issues will be "above the pay grade" of the client, while
others will be too detailed to justify SD modeling.) This ultimately
translates fairly directly into model policy levers. Even if the client
is really at the stage of bewilderment and can't articulate actions,
it's very helpful to state something like "to evaluate actions feasible
for the X organization, such as A, B, C" to start focusing the client's
thinking.

2. What defines a better outcome? Often, a client's organization will
have specified metrics (e.g. IRR or 5-year cumulative profit). It's a
good idea to be computing an NPV (or a social present value metric of
some kind) in addition, if for nothing else, to sharpen the analysis of
tradeoffs.

3. What contingencies might change the analysis' results? E.g., is a
given policy still an improvement if the economy enters a recession? Or
if people are less responsive than the market research indicates? These
should translate directly into exogenous inputs or parameter changes for
policy sensitivity tests.

4. What are client hopes, fears, puzzles and disagreements? Often,
clients will have very specific concerns, such that the modeling effort
can't succeed in creating an improvement in the real system until the
clients get past their concerns. Sometimes these are simply
contingencies or uncertainties about actions, which fit into categories
1 or 3. But other times they involve, e.g., sharp disagreement over
whether a standard industry metric was really appropriate (given that a
favored alternative policy decreased the standard measure while
increasing NPV), or disagreement or uncertainty about the strength of
two opposing impacts of a given policy. This is perhaps close to a
"relationship hygiene factor", but client concerns are often quite
specific, and are much more useful to the analysis if addressing those
concerns is explicitly part of the model purpose. The (SD) classic
dynamic hypothesis may also codify a client concern (feared behavior and
a cause for it), and the description usually contains metrics of
goodness as well.
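Elements 2 and 3 above are straightforward to make concrete. A minimal sketch (all numbers and names purely illustrative, not any client's actual metric) of an NPV comparison under a baseline and a recession contingency:

```python
def npv(cash_flows, discount_rate):
    """Net present value of a stream of period-end cash flows,
    discounted at a constant rate (element 2's tradeoff metric)."""
    return sum(cf / (1 + discount_rate) ** t
               for t, cf in enumerate(cash_flows, start=1))

# Element 3: re-run the same metric under a contingency, e.g. a
# recession that scales revenues down (the 0.8 factor is illustrative).
baseline = [-100, 40, 50, 60]           # policy investment then returns
recession = [baseline[0]] + [cf * 0.8 for cf in baseline[1:]]

base_npv = npv(baseline, 0.10)
stress_npv = npv(recession, 0.10)
# A policy is robust if it remains an improvement in both cases.
```

Re-running the same metric under each contingency from element 3 turns "is this policy still an improvement?" into a direct numerical comparison.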

The model purpose is something to start codifying and validating with
the client/stakeholder before even causal diagramming. And "codifying"
means a written, presented, discussed and distributed document
(usually PowerPoint, but client cultures differ).

The model purpose also needs revisiting at important milestones
thereafter: Often, modeling insights will shift the perception of the
problem, and unless the clients likewise share the changed perception,
there will be problems down the road.

And of course validation of the model and the results also needs to be
done (to the satisfaction of both the modeler and the stakeholders), as
does making sure that whatever organizational convincing and change
management is needed will in fact get done. But those are other
stories.

Cheers,

Alan
Posted by "Alan Graham" <Alan.Graham@paconsulting.com>
posting date Thu, 15 Nov 2007 09:48:01 -0500
_______________________________________________


QUERY Ensuring Quality of Models

Post by "Keith Linard" <klin4960@bigpond.net.au> » Sat Nov 17, 2007 8:24 am

Posted by "Keith Linard" <klin4960@bigpond.net.au>

Offline I received the following comment on my previous posting:
<<I agree completely about the correct problem.
<<But what is a correct problem?

In response I have attached an extract from an Australian Federal Department
of Finance manual that I wrote, as Chief Finance Officer, in 1986:
"Evaluating Government Programs - A Handbook". Whilst originally focussed on
program evaluation generally, I suggest the checklist is equally relevant to
the process of undertaking any SDM project (and is consistent, e.g., with
processes such as Vennix's 'Group Model Building' or Eden's 'Cognitive
Mapping'). It also addresses issues such as whether there should be a
steering committee (comprising, e.g., various stakeholders including QA).

All major management consulting firms have developed internal methodology,
which would cover most if not all of the points I have noted, to ensure
quality control. I believe every SDM consultant should apply such a
framework, adapted to their particular consulting niche.

APPENDIX A: STEPS IN THE PRE-EVALUATION ASSESSMENT * A CHECKLIST *

1. Define purpose of the evaluation:
* background to the evaluation
* audience

2. Define the nature, scope and objectives of the program:
* nature of problem addressed by program
* program authority or mandate
* program objectives
* actual or planned resource use

3. Analyse program logic:
* logical links between inputs, outputs, outcomes and objectives

4. Specify alternative ways of meeting program objectives

5. Identify key evaluation issues:
* program rationale
* impacts and effects
* objectives achievement
* alternatives

6. Identify evaluation constraints:
* time, cost, expertise and credibility

7. Assess appropriate evaluation designs:
* management issues
* alternative evaluation methods
* data collection issues
* data analysis issues

8. Develop strategy for evaluation study:
* terms of reference
* preliminary work plan
* consider input from other agencies
* consider composition of steering committee and evaluation team
* prepare pre-evaluation assessment report

(Steps 9 & 10 taken from Appendix B -'The Evaluation Study')

9. Do the evaluation study

* get agreement on terms of reference
* assign steering committee and study team
* approve work program
* undertake evaluation
* communicate results.

10. Suggested Framework of Evaluation Report

* Executive Summary
* Introduction
* Substance of the Report
* Recommendations
* Resource Issues
* Appendices
_______________________________________________

A-1: PURPOSE OF THE EVALUATION * CHECKLIST FOR STEP 1 *

1. What are the objectives of the evaluation?
2. Who, or what event, initiated the evaluation?
(e.g., routine evaluation cycle, Auditor-General's report, ministerial
request).
3. What is the stated reason for conducting the evaluation?
(e.g., assist development of program proposal, review of agency
priorities).
4. What is the 'hidden agenda', if any?
(e.g., defuse public controversy, answer criticism, provide rationale
for abolishing program).
5. Who are the primary audience for the evaluation report and what
authority do they have over program resourcing or management?
6. Which other key decision makers have a strong interest in the
evaluation and what influence do they have on program decisions?
7. Have the decision makers' needs and expectations been determined?
8. To what phase of the program development and implementation cycle
will the evaluation relate?
(e.g., new policy proposal, review of existing program, review of
completed program).
9. What issues are of particular interest?
(e.g., matters raised in Parliament or by the Auditor-General;
achievement of key program objectives; cost effectiveness).
10. How important is each issue?
(e.g., in terms of political impact, cost, or scope for improved
performance).
_______________________________________________

A-2: NATURE, SCOPE AND OBJECTIVES OF THE PROGRAM * CHECKLIST FOR STEP 2 *

1. What is the mandate or authority for the program and are program
activities consistent with this?
2. What are the stated objectives of the program?
3. What were the catalysts which led to the development of the program?
(e.g., who were the key proponents, what studies/inquiries recommended
this approach?)
4. What key needs, gaps in services, or problems is/was the program
intended to solve?
5. What results are/were expected from the program?
6. What reasons are/were there for believing that the program would be
effective in achieving these results?
7. Is there a clear and unambiguous definition of the target group at
which the program is aimed?
8. Have program implementation or other changes in the social/political
environment affected the relevance of the original program objectives
or introduced new objectives?
(e.g., changes in demographic profile, strong popular support for
program, creation of perceived "rights" to a benefit.)
9. What would be the consequences if the new program were introduced
(or an existing one abolished)?
Who would be affected? Who would complain and who would be glad? Why?
10. What measures or criteria were identified at the program development
and implementation phase as appropriate output and outcome indicators?
11. Are these performance indicators still relevant?
12. In the light of program operation experience, are there other
performance indicators which are more relevant or which assist further
in understanding the success or otherwise of the program?
13. In respect of each performance indicator, were targets (standards,
levels of service) set;
when; by whom; with what justification; and were they achieved?
___________________________________________

A-3: ANALYSE THE PROGRAM LOGIC * CHECKLIST FOR STEP 3 *

1. Specify the ultimate objectives of the program.
2. Subdivide the operation of the program into a manageable number of
major activities or phases
(between 5 and 10 segments is usually appropriate).
3. Specify intermediate objectives relevant to each phase/activity
(there should be at least one intermediate objective for each of the
program's ultimate objectives).
4. Identify the inputs and intended outputs or outcomes of each of
these major phases.
5. Identify significant secondary outputs/outcomes (whether desirable
or undesirable).
6. Specify the perceived logical relationships
(i.e., how a particular phase is supposed to achieve the intermediate
objectives) and the implicit assumptions underlying the relationship
between the inputs, the outputs, the outcomes and the intermediate or
final objectives.
7. Confirm with the program managers and line managers that the model
is a realistic representation
of what happens or, for a new program, is supposed to happen.
8. Identify the evaluation questions which are of interest in respect
to each phase
(these should directly address each assumption in point 6).
9. Specify performance indicators which can be used to monitor or
answer the evaluation questions in point 8.
10. Assess, in conjunction with program managers, what are the critical
assumptions and the corresponding key performance indicators.
___________________________________________________

A-4: IDENTIFY ALTERNATIVES * CHECKLIST FOR STEP 4 *

1. Review history of program and the reasons for the current
evaluation.
2. Review reports, journals etc on approaches to the problem in
question.
3. Use structured problem solving techniques (DELPHI, brainstorming
etc) with groups of professionals, clients etc.
4. Undertake screening of options.
5. Report briefly on discarded options.
6. Include selected options for analysis.
__________________________________________________

A-5: IDENTIFY KEY EVALUATION ISSUES * CHECKLIST FOR STEP 5 *

1. Program Rationale
* Does the Program make sense?
* Are the objectives still relevant?
2. Impacts and Effects
* What has happened as a result of the Program?
3. Objectives Achievement
* Has the Program achieved what was expected?
4. Alternatives
* Are there better ways of achieving the results?
________________________________________

A-6: IDENTIFY EVALUATION CONSTRAINTS * CHECKLIST FOR STEP 6 *

1. Time
2. Cost
3. Expertise
4. Credibility
5. Political and Social Environment
_______________________________________________

A-7: ASSESS APPROPRIATE EVALUATION DESIGNS * CHECKLIST FOR STEP 7 *

1. Specify those activities or changes (due to the program) which must
be measured.
2. Identify sources of data.
3. Decide appropriate means of measuring changes due to program as
distinct from changes due to non-program factors.
4. Decide procedures for obtaining data (e.g., sample survey, automatic
monitoring, simulation, modelling).
5. Decide appropriate analytical approaches for analysing the data.
6. Decide how the results are to be aggregated and presented.
_______________________________________________

A-8: DEVELOP STRATEGY FOR EVALUATION STUDY * CHECKLIST FOR STEP 8 *

1. Prepare terms of reference or brief which includes a clear and
unambiguous statement of the purpose and nature of the evaluation
(key issues to be addressed, the specific questions to be answered,
the audience, the timing and the decision context).
2. Prepare preliminary work plan indicating
* how the purpose of the study is to be achieved;
* when each and all tasks are to be completed; and
* what evaluation products are required.
3. Provide a clear statement of procedures for review of progress and
of any key decision points.
4. Provide a clear statement of time and resource constraints, and of
procedures for amending these.
5. Consider input from other agencies, and the composition of the
steering committee and evaluation team; identify official points of
contact among program officials and, where appropriate, clients.
6. Prepare an outline of procedures for amending the evaluation work
plan should this subsequently be required.
___________________________________________________

A.9: DO THE EVALUATION STUDY * CHECKLIST *

1. Get executive agreement on the Terms of Reference, proposed Work
Plan, Review & Decision Points, Time & Resource expectations, Data and
Personnel Involvement expectations, and Work Program Change Procedures.
2. Assign steering committee and study team; decide on the extent of
involvement of other agencies including central agencies, etc.
3. Prepare detailed work plan.
4. Prepare time, resources and methodology schedule for data
collection, analysis and reporting.
5. Undertake evaluation (e.g., collect data, test reliability, document
and analyse data).
6. Communicate results.
__________________________________________________

A.10 SUGGESTED OUTLINE FOR EVALUATION REPORTS

1. Executive Summary
* a brief statement of evaluation objectives and methods
* a summary of major findings and conclusions
* recommendations and matters needing further consideration
2. Introduction
* terms of reference for the study
* identification of constraints on the study
* statements of key assumptions and values underlying the report
3. The Substance of the Report
(a) Program (Element) Description
* a statement of the mandate and key objectives of the program
* an exposition of the logic of the program
* definition of key concepts
(b) Summary of Analyses Conducted
* justification for key evaluation issues addressed
* description of data collection procedures and measurement devices
* outline of collection results, with indication of reliability
4. Findings and Conclusions
* results of analysis related to program (element) objectives
* overall findings and discrepancies between these and program
objectives
* conclusions organised in terms of major evaluation study issues
5. Recommendations
* recommendations set out to show derivation from findings and
conclusions
* alternative options considered and reasons for rejection
* matters recommended for further study and estimates of the required
resources.
6. Resource Issues
* resources required to implement the recommendations
* offsetting savings
7. Appendices
* detailed documentation of data collection and analysis procedures
* list of references
* list of staff/organisations consulted during the study
* list of steering committee and study team members.
_____________________________________



Keith Linard
Posted by "Keith Linard" <klin4960@bigpond.net.au>
posting date Sat, 17 Nov 2007 07:58:02 +1100
_______________________________________________
