Calibration with "soft" data

Post by **fabioR** » Thu Feb 22, 2018 1:33 pm

Good afternoon everybody,

I am writing to ask for help with calibration in Vensim DSS using "soft" sources of data. To make the question clear, I have uploaded the model file that I will use to explain it. In the file, you can see a simple Bass model that represents the evolution of "Customers" as a function of the "contact rate" and the "adoption fraction". In turn, I modelled the adoption fraction as a function of the variable "Income", which grows with time.

I also uploaded the Excel file "Data", in which I assume I have a precise historical series of values for the variable "Customers", and some "soft" data values for the variable "Income" given by experts during an interview. These "soft" data take the form of two ranges of possible values, one at time 0 and one at the final time 400.

I would like to calibrate the parameters "contact rate", "specific adoption fraction", "Initial Income", and "fractional increase in income".

It is clear to me how to create the optimisation file and the payoff definition by using the historical series of data referring to the variable "Customers"; my question is: is it possible to set the payoff also for the "Income" variable by exploiting somehow the range of soft data values given by the experts?

Thank you very much!

I look forward to hearing from you,

Fabio

Post by **Administrator** » Thu Feb 22, 2018 1:38 pm

fabioR wrote: It is clear to me how to create the optimisation file and the payoff definition by using the historical series of data referring to the variable "Customers"; my question is: is it possible to set the payoff also for the "Income" variable by exploiting somehow the range of soft data values given by the experts?

I cannot think of an easy way to incorporate a range. All you can really do is calibrate to the midpoint of the range (assuming the midpoint is the correct value to use).

Tom might have some ideas though (I'm sure he'll be along soon to help).

Post by **tomfid** » Thu Feb 22, 2018 3:58 pm

I can think of two options:

1. Create some dummy data in your income column (I think you'll need to remove the NA values, as it won't read text). For example, if the range is 50 to 70, you could place a 60 at a couple of points (or maybe just at the end).

Then the question is how to weight customers vs. income. This is easier to think about if you use the Gaussian distribution option rather than the Normal default. (For Normal, the weight parameter is the inverse of the standard deviation, whereas for Gaussian you specify the std dev directly.) If the customer data is precise, you'd specify a small SD (say, 1). Since income is imprecise, you'd use a large value (say, 10). Another thing to think about is whether the numerous customer data points are really independent; if not, you might want to lower the weight (raise the SD) to compensate for that.

The drawback of this approach is that none of the distribution options really captures the idea that "50 to 70 is OK, but 71 is not." On the other hand, people are rarely precise in their statements about probability. So, a range of 50 to 70 probably means that 60 is the best guess, with a standard deviation of perhaps 5, leaving a residual 5% probability outside the stated range.
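The arithmetic behind reading a range as a Gaussian can be checked with a short Python sketch. It assumes the stated range spans roughly two standard deviations either side of the midpoint; `range_to_gaussian` and `sds_to_edge` are hypothetical names used only for illustration and are not part of Vensim.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def range_to_gaussian(low, high, sds_to_edge=2.0):
    """Hypothetical helper: read an expert range as mean +/- sds_to_edge * sd."""
    mean = 0.5 * (low + high)
    sd = (high - low) / (2.0 * sds_to_edge)
    return mean, sd

# Expert range of 50 to 70 for income (the example used in the post).
mean, sd = range_to_gaussian(50.0, 70.0)

# Probability mass that falls outside the stated range under this reading.
prob_outside = 1.0 - (normal_cdf(70.0, mean, sd) - normal_cdf(50.0, mean, sd))

print(mean, sd, round(prob_outside, 3))
```

With these assumptions the midpoint is 60, the standard deviation is 5, and a little under 5% of the probability lies outside 50 to 70, which matches the rough figures above.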

2. If you want more flexibility, you could avoid the calibration infrastructure, and instead create a penalty function expressing your prior on income. Something like:

income log likelihood ratio - make this a lookup representing the probability density for income; it's easier to use a log likelihood because they're additive. This will be 0 (log(1)) at the peak (60, or maybe anywhere from 50 to 70) and negative elsewhere. If you think anything outside 50 to 70 is really unlikely, it could be hugely negative, like -1e6.

Then:
income likelihood = income log likelihood ratio( income )*weight

weight is just a scaling parameter, and you might make it time-varying, so that it only applies near the end (for example).

Then you'd have two entries in your payoff. Customers would be a Calibration element (the usual way), and the income likelihood would appear as a Policy element with weight 1.
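To make the shape of such a penalty concrete, here is a minimal Python sketch of the idea, not Vensim code. The lookup points, the cutoff time of 380 and the function names are all assumptions chosen for illustration. The lookup is piecewise linear and holds its end values outside the tabulated range.

```python
import bisect

# Hypothetical lookup points: (income, log likelihood ratio).
# Flat at 0 between 50 and 70, hugely negative outside that band.
LOOKUP = [(40.0, -1e6), (50.0, 0.0), (70.0, 0.0), (80.0, -1e6)]

def lookup(points, x):
    """Piecewise-linear interpolation, clamped at the first and last points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def weight_at(time, start=380.0, scale=1.0):
    """Hypothetical time-varying weight: active only near the end of the run."""
    return scale if time >= start else 0.0

def income_likelihood(income, weight=1.0):
    """Mirrors: income likelihood = income log likelihood ratio(income) * weight."""
    return lookup(LOOKUP, income) * weight

print(income_likelihood(60.0))                    # inside the band: no penalty
print(income_likelihood(75.0, weight_at(400.0)))  # outside the band, late in the run
```

A smoother lookup (for example a parabola in log space, which corresponds to a Gaussian prior) would give the optimizer a gradient to follow inside the band, whereas the flat-topped version above only penalizes values outside it.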

Post by **fabioR** » Thu Feb 22, 2018 5:17 pm

tomfid wrote: (I think you'll need to remove the NA values, as it won't read text)

Thank you Tom. You have been very helpful! I tried to implement both of them. Since for "Income" I have only the initial and final data values, I had to increase the weight of the "Income" payoff a lot; otherwise its effect on the overall payoff function is minimal compared to all the payoff contributions of the variable "Customers", for which I have the full historical series.

Just one thing: why do you suggest removing NA? I usually put NA where I have missing data. For example, imagine that I have some missing data in the historical series of "Customers". If I put NA, Vensim should "see" the missing data as "holes", and it shouldn't calculate the payoff values on those "holes", right?

Thanks a lot,

Fabio

Post by **tomfid** » Thu Feb 22, 2018 7:58 pm

It's been a while since I tried it, but I think Vensim just ignores a text NA. Not sure what happens if you use the Excel =NA() instead. I usually just leave the missing points blank, but if NA is working, fine.

I'm not surprised that you had to increase the weight on income; it really boils down to how many data points you think your informal constraint is worth. Since you have a lot of not-really-independent customer data points, income needs a high weight to compete.
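The "how many data points is it worth" reasoning can be sketched in Python. This assumes each data point contributes -(weight * error)^2 to the payoff, which is a simplification and may not match the exact formula Vensim applies for a given distribution option. The 400 customer points, the unit errors and the helper names are hypothetical.

```python
import math

def payoff_contribution(errors, weight):
    """Assumed payoff form: minus the sum of squared, weighted errors."""
    return -sum((weight * e) ** 2 for e in errors)

def weight_for_equivalent_points(n_actual, n_equivalent, base_weight=1.0):
    """Weight that makes n_actual points count like n_equivalent points."""
    return base_weight * math.sqrt(n_equivalent / n_actual)

# Hypothetical case: 400 customer points and 2 income points,
# all with the same unit-sized error for easy comparison.
customer_errors = [1.0] * 400
income_errors = [1.0] * 2

# Weight needed for the 2 income points to matter as much as 400 points.
w = weight_for_equivalent_points(2, 400)

print(payoff_contribution(customer_errors, 1.0))
print(payoff_contribution(income_errors, 1.0))
print(w, payoff_contribution(income_errors, w))
```

Under this assumption the weight grows with the square root of the ratio of point counts, so two income points need a weight of about 14 to balance 400 customer points. If the customer points are not really independent, the effective count is smaller and a lower income weight would do.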

Glad it's working!

Ventana software support forum
