People have brought up many threads in this discussion. I will just focus
on what I think are some key aspects of the issues that Jim raises. This
will not be a politically correct, balanced perspective. (I have full
sympathy for L. L. Lauble's entry.)
To start, there are no first principles in social science. We have already
noted there are none in science either. (John Sterman brought up the
example of Newton versus Einstein versus string theory, and so on.) From Gödel,
we know such a foundation does not even exist for mathematics. There is no
"truth." ALL models are wrong, but all we have, all anybody has, are models
(mental or formal) of everything. We want a model to be useful. What does "useful"
mean? It means (I suggest) that there has to be confidence that the
recommendations implied in the model results would produce an outcome more
satisfactory than the current (dynamic) REAL WORLD condition. Otherwise,
why waste the time? (A hidden assumption behind this assertion is
rationality. Carl Sagan has argued that being rational is our only
choice in his "The Demon-Haunted World." I personally take the long view.
Darwin will ultimately sort out the rational from the
irrational --
http://www.darwinawards.com/ . )
The confidence in the hypotheses or our models comes not from proving what
is "right." The only choice is to winnow out the wrong (as best we can) and
make progress by examining the hypotheses that seem to have the best chance
of not being so wrong that they are not useful. (Get used to the double
negatives. They don't make a positive. Karl Popper is still the key
starting reference for this logic.) Thus, we have nothing useful unless we
have some basis (corroboration) that it *might* be true - or, more easily,
that it passes tests that try to prove it wrong. If you can't come up with
a test to prove it wrong, it is not even a legitimate hypothesis and can
only lead, by definition, to a useless end. This is obvious from a
rationality perspective. John Sterman has, again, discussed this at length.
We seem to be polarizing ourselves in this discussion for no good purpose.
We talk of the sacred stories and anecdotes. We rail against the
over-powerful statistics in all its abstract glory. We want to fall back to
the supposed security of experimental science. We want the freedom to
assume true what we "know" rings true. Do we just want to be a designer
cult? Give me a story. I make a model. It is the new religion. Whatever
you want. Come one, come all. Hope, want, and belief do not make things so.
If everything were black and white, we would not need to discuss these topics.
We are prolific in opinion and wanting in substance. We believe we are the
first to have these thoughts and care none to learn what others before us
have done to explore these topics.
More to the point, we need to connect words with purpose and meaning. By
statistics we mean using data to increase the confidence in our hypotheses.
Using statistics does not necessarily mean using regression and R-squares.
It can mean experimental designs of our model runs with non-parametric
statistical comparisons of what the model produces relative to what the data
indicate the real world, which we are attempting to model, produces (a small
sketch of such a comparison appears after this paragraph). This thread
is not a discussion of two equals -- stories versus statistics, where we
"choose" one or both. All of us are just repositories of misperceptions (me
included). John Sterman and Linda Booth Sweeney have reams of DATA to show
this - albeit to a less extreme point than the one I am venting. The human
mind is very limited. Its senses and perceptions are colored by experiences
that we necessarily misunderstand, at least in part. Data are a record of the
actual events. Like all else, they are far from perfect, but the use of data is
the only connection we have between what we think "is" and what the
objective world "does." Jim Hines wants to make dichotomies that I don't
think exist. There is no perfect approach, but there are approaches that
are more formal and rigorous (that is rational) than others. An approach is
irrational until proven rational. It is NOT our choice. Running
hypothetical models can tell us how equations work and provide valid
experimental platforms to understand dynamics. They don't tell us whether
those results have any relevance to the real world. Even if we make
experiments that could be like those of basic science (which we can't), we
can never know whether we have proven anything. We can only increase
confidence or falsify our current thinking.
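To make the non-parametric comparison mentioned above concrete, here is a
minimal sketch in Python. Everything in it is a hypothetical stand-in: the
observed and simulated series are invented, and the particular tests (Spearman
rank correlation and a two-sample Kolmogorov-Smirnov test) are just one
reasonable pairing, not the only choice.

    # Minimal, hypothetical sketch: non-parametric comparison of model output
    # against observed data. Both series below are invented for illustration.
    import numpy as np
    from scipy import stats

    observed  = np.array([102.0, 110.0, 125.0, 118.0, 130.0, 142.0, 138.0, 150.0])
    simulated = np.array([100.0, 112.0, 121.0, 120.0, 133.0, 140.0, 141.0, 148.0])

    # Rank correlation: do the model and the data rise and fall together?
    rho, rho_p = stats.spearmanr(observed, simulated)

    # Two-sample Kolmogorov-Smirnov: are the value distributions compatible?
    ks_stat, ks_p = stats.ks_2samp(observed, simulated)

    print(f"Spearman rho = {rho:.2f} (p = {rho_p:.3f})")
    print(f"KS statistic = {ks_stat:.2f} (p = {ks_p:.3f})")
    # A damning result here counts as evidence against the model (refutation),
    # which is the only kind of evidence we can ever get.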
Anecdotes and stories are only useful to develop hypotheses to be tested. I
think the data shows that these stories and anecdotes have logic flaws that
render 99.999% of the hypotheses wrong. (I am sincerely being an optimist
on how many 9s I used.) R. Dawes is a good place to start on showing
(proving?) how well we deceive ourselves (Dawes, R., Rational Choice in an
Uncertain World, Harcourt Brace Jovanovich, San Francisco, 1988). Michael
Shermer makes a living showing our inability to convert experience into
rational conclusions (e.g., Michael Shermer, Why People Believe Weird Things,
1997, Freeman and Company, NY). There are also numerous books on social
delusions, mass hysteria, mass panics, and, what relates to our work, group
delusion - as in group modeling. I will give an example shortly to make
this point meaningful.
As Erling notes, the Bayesian approach can connect "beliefs" to hard data.
Stories and anecdotes are primarily (exclusively?) made of beliefs. There
is much written about the Bayesian approach. A starting book might be
"Scientific Reasoning: the Bayesian Approach" by Colin Howson and Peter
Urbach (Open Court, 1993). In general, the "beliefs" come from experts, and
not from producers of anecdotes. The Bayesian twist is often to help avoid
testing bad hypotheses by having the expert explain what does not make
sense. The weighting, however, is still not zero-one, in that you may only
partially believe the experts (they have the picture wrong too) or you
(typically) have multiple experts with mutually exclusive positions (just as
you often have multiple data sources that are mutually inconsistent).
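Since the Bayesian point above is abstract, here is a toy sketch of what
updating an expert "belief" with hard data can look like. The Beta prior, the
counts, and the interpretation are all invented for illustration; they are not
taken from any of the references cited.

    # Toy, hypothetical sketch: Bayesian updating of an expert belief with data.
    # A Beta prior encodes the expert's partial belief that a proposed causal
    # mechanism is at work; binomial data (periods in which the predicted
    # behavior actually showed up) update that belief. All numbers are invented.
    from scipy import stats

    prior_alpha, prior_beta = 4.0, 2.0   # expert leans toward "mechanism present" (~0.67)
    successes, trials = 3, 12            # data: behavior observed in 3 of 12 periods

    post_alpha = prior_alpha + successes
    post_beta  = prior_beta + (trials - successes)
    posterior  = stats.beta(post_alpha, post_beta)

    print(f"prior mean belief:     {prior_alpha / (prior_alpha + prior_beta):.2f}")
    print(f"posterior mean belief: {posterior.mean():.2f}")
    # The data pull the expert's belief down, but neither the expert nor the
    # data gets a zero-one weight.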
Camilo brings up an important new thread - although maybe in a different
vein than intended. Clark Glymour comes to the table trying to discover
causation (Clark Glymour, Computation, Causation, and Discovery, 1999, MIT
Press). He is very much a statistician. What he really argues, as should
we, is that statistics are mostly valuable to us for aiding our
understanding of causality. Causality should be intrinsic to our
hypothesis formation - those wonderful levels and rates embedded within
feedback loops. The statistical use of data should be looking to refute or
support a causal explanation. This perspective is consistent with the
quotes that Camilo notes and, I think, more consistent with Glymour's
intent. Conveniently, this week's The Economist ("Signifying Nothing," page
76, January 31, 2004 edition) notes how economists misuse statistics. The
prime complaint is that statistics are allowed to provide conclusions
without theory (i.e., without a causal hypothesis). I propose that we would
(or should) agree that, devoid of causal meaning, statistical implications
mean nothing. We really do need to focus on the "why." (And maybe we do need
to search for better ways to make SD students and practitioners more readily
embrace that logic.)
We could probably show (as maybe others have) that even when limiting the
degrees of freedom to maintain "statistical quality," you can always make an
a-theoretical equation that produces statistically impeccable, but causally
meaningless, results. I am arguing that we need statistics to keep us
honest, and we need TESTED hypotheses to maintain any sense of causal
meaning. We need the "why" -- that causal meaning -- without which we cease
to be system dynamists.
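As a hedged illustration of that claim, the sketch below regresses one
simulated random walk on another, completely independent one. The series,
seed, and sample size are arbitrary; the point is only that an a-theoretical
equation can produce an impressive R-squared with zero causal content (the
classic "spurious regression" result).

    # Hypothetical sketch: a statistically "impeccable" but causally meaningless fit.
    # Two independent random walks have no causal connection, yet regressing one
    # on the other typically yields a large R-squared.
    import numpy as np

    rng = np.random.default_rng(seed=1)
    n = 200
    x = np.cumsum(rng.normal(size=n))    # random walk, unrelated to y
    y = np.cumsum(rng.normal(size=n))    # independent random walk

    X = np.column_stack([np.ones(n), x])          # OLS of y on x, with intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    r_squared = 1.0 - residuals.var() / y.var()

    print(f"R-squared = {r_squared:.2f}")
    # Often "significant" looking, yet there is no causal hypothesis here
    # at all - which is exactly the complaint.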
Lastly, Jim Hines is right to ask for data on statistics versus stories.
Here is a small collection. As John notes, there are so many, it is hard to
pick a few. I chose those that brought the point home to me. Certainly,
Alan Graham, a long history of PA Consulting folks (formerly Pugh Roberts),
Bob Eberlein, and a myriad of others have more relevant and numerous
examples.
In the late 1970s we had the oil crises. Roger Naill and I were visiting
oil companies to determine how they invested in new energy projects. The
CFOs and comptrollers invariably explained that they used "hurdle" rates (the
then in-vogue management fashion). There would be no investment unless the
project would earn 18% ROI. Other than during the actual crises, the
companies would earn between 7% and 12% (using the same accounting procedure
as used for the "hurdle rate.") What the data indicated was that investment
was driven by growth expectations derived from departments asking for budgets
to maintain their importance and "turf." The budgets they felt they could
request depended on the past ROI. The projects were culled and selected by
an executive committee. The results looked (as expected) like a
Qualitative Choice Theory response (Daniel McFadden, 2000 Nobel Prize in
Economics). When faced with this data and the model results, the oil
companies agreed that this made much more sense and were glad we put a myth
(which they always suspected was false) to rest.
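For readers unfamiliar with the Qualitative Choice Theory response mentioned
above, here is a hypothetical sketch of the shape involved: the probability
that the executive committee funds a project rises smoothly (an S-curve) with
expected return rather than jumping at a hard 18% hurdle. The reference return
and the sensitivity parameter are invented, not values from the original work.

    # Hypothetical binary-logit (Qualitative Choice Theory) response: a smooth
    # S-shaped funding probability rather than a hard hurdle-rate cutoff.
    # The parameters below are illustrative only.
    import math

    def funding_probability(expected_roi, reference_roi=0.10, sensitivity=40.0):
        """Binary logit response around a reference return."""
        return 1.0 / (1.0 + math.exp(-sensitivity * (expected_roi - reference_roi)))

    for roi in (0.06, 0.09, 0.12, 0.18):
        print(f"ROI {roi:.0%}: funding probability {funding_probability(roi):.2f}")
    # Contrast with a pure hurdle rate, which would give probability 0 below
    # 18% and 1 above it.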
In the electric industry of the 1970s there was only growth on the horizon.
"Build them there nukes as fast as you can." Up until recently, the
electric industry remained one of the few purist optimization shops.
Everything is "perfect." The forecasters do a perfect job in estimating
future demands (using naive statistics), and the linear programming models
show the perfect capacity expansion plan. And that is what the utility
builds... except it doesn't. The board of directors (or the senior
management) has a fair bit of trepidation about putting a $billion or so
on the line due to results from obtuse techniques they really don't
understand. They use their "experience and wisdom" to guide the final
decision. As John Sterman showed, this experience is exponential smoothing
of actual growth-rate information (see the figure on page 641 of John's
Business Dynamics). All the planning departments below the board had little impact
on final decisions. They were all shocked. The management realization was
to go to the other extreme and get rid of forecasting and the planning
departments. Few electric utilities now have such departments.
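A minimal sketch of the mechanism John Sterman documented might look like the
following: management "experience" behaving as first-order exponential
smoothing of recent growth rates. The growth-rate series and the smoothing
time are invented for illustration; see the Business Dynamics figure cited
above for the real evidence.

    # Hypothetical sketch: "experience and wisdom" as first-order exponential
    # smoothing of actual growth-rate information. All numbers are invented.
    def exponential_smoothing(series, smoothing_time=4.0, dt=1.0):
        """First-order exponential smoothing of a time series."""
        smoothed = [series[0]]
        for value in series[1:]:
            previous = smoothed[-1]
            smoothed.append(previous + (dt / smoothing_time) * (value - previous))
        return smoothed

    actual_growth = [0.07, 0.07, 0.06, 0.04, 0.02, 0.01, 0.00]   # demand growth collapsing
    perceived_growth = exponential_smoothing(actual_growth)

    for actual, perceived in zip(actual_growth, perceived_growth):
        print(f"actual {actual:.2f}   perceived {perceived:.2f}")
    # The perceived growth rate lags the actual decline, which is how capacity
    # keeps getting ordered after growth has already fallen off.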
As another minor utility "story," there is the problem of having an
integrated model with the physical and financial dynamics as well as the
competitors and consumers. In our case, we initially made the dumb
assumption that revenues equal sales multiplied by price plus other income.
That works great if you look at the financial books, the regulatory books,
or the production books. But each used different prices and different
sales. There is often a long delay between the time a utility produces the
energy and receives the money. There are convolutions as to when it counts
the money in a regulatory, tax, or a financial sense. When integrated in a
model, with lots of prices and delayed sales, the picture of how a utility
works looks much different, and the hitherto unexplainable fluctuations in
the income of a regulated utility make sense from the internal processes
rather than from the conventional blaming of the regulatory regime. Everybody
in the utility had their own local story from their perspective. They
could not understand the feedback loops without a model any more than we
could. They were further blinded by their nearsightedness. The stories
(myths) fit their needs, not reality. Their stories did give us hypotheses
to test and refute. We then had more confidence in the hypotheses that
survived.
In a high-tech setting we actually did the classic group modeling. (I
usually use a strawman-model approach to take advantage of the ease with
which people can refute as opposed to make hypotheses.) Their stories fit
well together. If there were discrepancies, the "higher" up departments
"corrected" the story and all agreed with a "oh ya" thats how it must
work." All VPs and the CEO were 100% approving of this representation for
the company. They knew! Among a host of refuted myths, one was that
production was based on orders. The (simple) statistics found little
correlation, even though the eye wanted to see one. After questioning the
production line staff and field sales-staff, we discovered that the company
liked to roll out new products at the end of the year. The customers knew
they could wait until the end of the year for deep discounts. Marketing
knew it needed hype for the roll out to make customers eager to get the new
units and that it needed "salesman incentives" to keep the sales staff
pushing during the mid-year fall-off. Many of these sales motivations would
give the sales staff an incentive to just get the product onto the customers'
sites (to be returned later). Thus "sales" and the sales forecast had
little to do with what the production line really needed to produce. The
production line simply followed the inventory. They threw out the orders
and the forecasts. They kept the company afloat. Everybody in the "model
development group" - all managers - suffered group delusion.
As a side note to the high-tech story, the company collected lots of data.
Data overload city. Staff could only tolerate the quarterly summaries,
because they could not deal with the daily (hourly) fluctuations in the
data. Humans have an overpowering urge to see patterns (even ones that are
not there). The quarterly data balanced the books. There was great comfort
in the quarterly data, but the critical time constants of the system were
on the order of 2 to 6 weeks in many cases. The sales cycle combined with
these time constants meant that internal statistical analyses never made
even causal sense, but they were used anyway. We could show via the model
how these wrongly interpreted data could be produced and what variables had
to be sampled more frequently to allow meaningful decisions. The actual
point here is that bad data is still data. Understanding why it is bad or
how the dynamics shown could be produced, directs our search for the most
valid (most confidence) representation. Unless it is incompetently measured
data, I argue all data has information that can be used constructively - for
refutation of hypotheses.
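To illustrate the sampling point in that last paragraph, here is a
hypothetical sketch: a cycle with a roughly five-week period dominates the
weekly series but nearly disappears when the data are rolled up into 13-week
quarterly averages. The amplitudes, noise level, and period are invented, not
the company's actual numbers.

    # Hypothetical sketch: quarterly summaries hiding 2-to-6-week dynamics.
    # A ~5-week cycle swamps the weekly data but is almost invisible quarterly.
    import numpy as np

    rng = np.random.default_rng(seed=7)
    weeks = np.arange(104)                                  # two years of weekly data
    weekly = 100 + 20 * np.sin(2 * np.pi * weeks / 5.0) + rng.normal(0, 5, size=weeks.size)

    quarterly = weekly.reshape(-1, 13).mean(axis=1)         # 13-week quarterly averages

    print(f"weekly    swing: {weekly.max() - weekly.min():6.1f}")
    print(f"quarterly swing: {quarterly.max() - quarterly.min():6.1f}")
    # The week-to-week swings are large; the quarterly averages look placid, so
    # statistics done only on the quarterly data cannot recover the short time
    # constants that actually drive the system.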
Lastly (before I fill a book), the US Department of Energy collects energy
data. It is actually relatively easy to collect data at the regional level by
looking at in-flows and out-flows over a year. Inventory noise tends to
balance out. It gets tough when there are small states in a region. You
can't afford to sample more when the impact is minimal to the region or to
the more critical big states. The typical approach is to accept the small-state
data and subtract their sum from the regional value to obtain the
number for the big state. It is big, and the "small" state errors won't have
much of an effect. When we modeled the region (New England), the small
state data "held together" but Massachusetts showed historical dynamics the
model could not produce. The dynamics were significant and would have big
implications for policy. There were few stories that even tried to explain
the data. We could make hypotheses for each data point. There was no
generalizable causal theory. We could not believe that history was dominated
by a multitude of one-time random events. We went back and smoothed the
data to estimate parameters. We then ran all the states and produced a new
history consistent with the regional total. All the small states made sense
within the error bounds of the collected statistics while Massachusetts now
made sense. A follow-up (small) survey indicated that the data the model
produced were more in line with what the actual state data would indicate.
Thus, the model, to our and the client's satisfaction, showed the data to be
wrong but "correctable." Was this valid? The model and the data supported
the adjustments - and the model results and data could be understood. There
was a high degree of confidence in the model and the model was useful. We
were most assuredly "wrong." By using causal hypotheses and data, we could
bound how wrong we were and how confident we could be.
In summary, stories and anecdotes are not data. We all make up stories to
fit our needs. They are (ill-informed) mental models and ALL models are
wrong. Stories are a valid source of hypotheses that we can test - but have
no reason to believe a priori. Any meaningful data is better than no data.
Without data, there is no ability to have any confidence in the hypothesis.
We must run the model and use statistics (data). The process produces bad
results, but they are better than any others.
George
George Backus
Policy Assessment Corporation
14602 West 62nd Place
Arvada, CO 80004
Bus: 303-467-3566
Fax: 303-467-3576
Email: George_Backus@ENERGY2020.com