Combination Method - fitting two parameters to data

Dear All,

I am wondering about the principle of the combination method. The combination formula nCr gives the number of possible combinations of r objects from a set of n objects: nCr = n!/(r!(n-r)!).

But in my case, I have 2 unknowns that I want to obtain from several equations. For example, I have 287 equations in total. I think I cannot use the combination formula directly, because r is 2 (I have 2 unknowns) but n is not 287. Actually, I am also looking for the best n itself.
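
Purely to illustrate the formula, here is a minimal sketch that takes n = 287 and r = 2 as example values (the post itself says n is not fixed at 287):

```mathematica
n = 287; r = 2;

(* Built-in binomial coefficient: number of r-element subsets of n items *)
Binomial[n, r]

(* The same count written out with factorials, as in the formula above *)
n!/(r! (n - r)!)
```

Both expressions evaluate to 41041, the number of distinct pairs that can be drawn from 287 equations.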

In the attachment, a and b are the 2 unknowns I want to obtain, and there are 6 equations, so I can get a and b by using those 6 equations. But I have 287 equations, so I want to know the best combination: how many equations I need to use to get the best a and b, although the expected values of a and b are also unknown.

Is it possible to do this? If yes, what function should I use to solve it in Mathematica? Thanks for your attention.

Best Regards, Intan

POSTED BY: INTAN SUPRABA
8 Replies

> how many equations which I need to use to get the best a and b although the expected values of a and b are also unknown

How will one know they obtained the best a and b if even the expected values of a and b are not known?

POSTED BY: Nasser M. Abbasi

You really need to provide a (smallish) example that fully captures the problem at hand. From the current nb I cannot tell how your 'n' arises in the problem.

POSTED BY: Daniel Lichtblau

Dear Daniel,

Thanks for your comment. I really appreciate it. I will try to explain the overall problem clearly as follows:

  1. The basic equation: a b Sech[b R] = constant, where R is total rainfall (mm) and a and b are the unknowns I want to obtain.
  2. I have 250 total rainfall events, so I have 250 values of R (please refer to the attachment "Raw Data.xlsx").
  3. In this case, I will have 250 equations with 2 unknowns (a and b).
  4. I randomly picked several total rainfall values (R) and made a few combinations based on different thresholds:
     - comb1 = R in the range of 50-150 mm
     - comb2 = R in the range of 50-175 mm
     - comb3 = R in the range of 50-200 mm
     - comb4 = R in the range of 100-150 mm
     - comb5 = R in the range of 100-175 mm
     - comb6 = R in the range of 100-200 mm

Then, for each combination, I obtained different values of a and b (please refer to the attachment "Combination.nb"). The number of R values used in each combination is not yet the actual number of R values within each threshold. For example, for comb1 I used only 6 values of R when I actually have 105. For this trial I picked only 6 values of R because I have not finished making the equations for all 250 values of R.

  5. The summary of those 6 different combinations is as follows:

[Image: Summary of obtained a and b from different thresholds]

  6. So I want to know what kind of combination method can be used to cover the full range of the data set, if possible under 2 conditions:
     1) Depending on a threshold, for example 0-5 mm, 0-10 mm, 0-20 mm, 5-10 mm, 5-15 mm, ..., 0-550 mm. The thresholds themselves are not fixed, so any threshold may be used to make a combination.
     2) Not depending on a threshold, because even with a single equation I can get the 2 unknowns:

    In:  NMinimize[{Max[(a b Sech[128.3 b] - 0.3169)^2], a > 0}, {a, b}]
    Out: {1.2326*10^-32, {a -> 64.9373, b -> 0.0122389}}
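
A note on the single-equation case: one equation in two unknowns is satisfied by a whole curve of (a, b) pairs, so a near-zero residual identifies only one of them. A minimal sketch of fitting several equations at once is given below. The data are hypothetical, generated from assumed values `aTrue` and `bTrue`, and the bounds on a and b are likewise assumptions for this example.

```mathematica
(* Hypothetical {R, constant} pairs, generated from assumed parameters
   aTrue and bTrue purely for illustration; not the thread's data *)
aTrue = 60.; bTrue = 0.012;
rValues = {60., 85., 110., 128.3, 150., 175.};
data = {#, aTrue bTrue Sech[bTrue #]} & /@ rValues;

(* Sum of squared residuals over all equations at once *)
sse[a_, b_] := Total[(a b Sech[b data[[All, 1]]] - data[[All, 2]])^2];

(* The bounds on a and b are assumptions chosen for this example *)
NMinimize[{sse[a, b], 0 < a < 200, 0 < b < 0.1}, {a, b}]
```

With more than one R value in the objective, the minimizer has to reconcile all equations simultaneously instead of matching a single one exactly.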
    

Please let me know if the above explanation is still unclear, and I will try to explain the problem further. Thanks in advance for your help!

Best Regards, Intan

POSTED BY: INTAN SUPRABA

Dear Nasser,

Thanks for your comment. a and b are unknowns, and my final goal is to find the best-fit values of a and b. I don't know what the expected best estimates of a and b should be, but I can tell whether the obtained a and b are good by doing validation. In my study, a and b are runoff parameters used to estimate (simulate) runoff.

I know the real value of runoff from the observation data (observed runoff). So if I have a pair of a and b, I can simulate the runoff and then overplot the simulated runoff on the observed runoff. The pair giving the smallest discrepancy (error) between the simulated and observed runoff is the best.
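
One possible way to put a number on that discrepancy is a root-mean-square error. The sketch below is only an illustration: the two series are made-up values, and `rmse` is a helper name introduced here, not a built-in function.

```mathematica
(* Hypothetical observed and simulated runoff series of equal length *)
observed  = {1.2, 3.4, 2.8, 0.9, 0.4};
simulated = {1.0, 3.1, 3.0, 1.1, 0.5};

(* Root-mean-square error between two equally long lists *)
rmse[obs_List, sim_List] := Sqrt[Mean[(sim - obs)^2]];

rmse[observed, simulated]
```

A smaller value means the simulated series lies closer to the observations, so candidate (a, b) pairs could be ranked by this one number.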

So my plan is to obtain all possible estimates of a and b by using a combination method or any other kind of method. After obtaining so many pairs of a and b, I do not intend to simulate runoff and validate it one by one for each pair.

Thus, as the next step, I need to think of another method to get a single value of a and b. For example, if I have 100 pairs of a and b, I can plot the 100 values of a and of b separately. Maybe I can fit a Gaussian distribution and take the mean values of a and b. But I am not sure; I don't know what the plot will look like.

I am still thinking about what to do after obtaining many pairs of a and b in order to get the best single values of a and b. I am looking forward to hearing further comments from you. Thanks for your attention.

Best Regards, Intan

POSTED BY: INTAN SUPRABA

(1) The given factors in the hyperbolic secants, e.g. 128.3 and 136.6, do not correspond to actual values in the xlsx file. I don't know what their significance is in the expressions under consideration if they are not 'R' values from the data.

(2) There are different constant values being subtracted, e.g. 0.3169, 0.2162. So it's not clear how or where a "constant" value appears in the expressions.

All this is really very murky. From the broad description I would guess you are trying to do some form of resampling statistics. Can't really give any advice though, given how little I understand of your setup.

POSTED BY: Daniel Lichtblau

Dear Daniel,

Thanks for your comments and I would like to clarify as follows:

(1). Yes, there are some slight differences between the raw data and the R values given in the hyperbolic secant. The real values of R are as shown in the Excel sheet. But sometimes I need to make adjustments, because the R value I use in the basic equation is based on hourly data. The purpose of showing the Excel data is only to show the total number of R values considered in the analysis. As for the exact magnitude of R that I will use in each equation, I need to check the values one by one and adjust them when necessary. Sorry for the confusion caused by the discrepancies between the magnitude of R in the Sech equation and in the raw data.

(2). About the constant value in each equation: I obtained those values by running a simulation for each rainfall event (R). So the constant varies from equation to equation because R is also different.

So my problem is this: after obtaining 250 equations, each containing the 2 unknowns a and b, what kind of combination method can I use to form many combinations of equations that produce a and b? I can use 1 equation to get a and b. I can also use 2 equations to get a and b. And I can also use all 250 equations to get a and b.

I am looking for an appropriate combination method to do so. I have heard about the Monte-Carlo method, but I am not sure how to do it in Mathematica. Hence, please let me know if you know which Mathematica function or command implements Monte-Carlo algorithms. Thanks for your attention.

Best Regards, Intan

POSTED BY: INTAN SUPRABA

A Monte-Carlo simulation could proceed as follows.

(1) Select a random subset of your equations. You might try different criteria e.g. a few at a time or perhaps 30% at a time. I do not know enough about this area to have a good suggestion here.

(2) Do a fitting to get the best-fit parameter values for {a,b}.

(3) Attempt to combine your results or at least check that they tend to give consistent results.
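
Steps (1) to (3) could be sketched roughly as follows. This is a sketch under stated assumptions, not a recommended setup: the data are synthetic (generated from assumed `aTrue` and `bTrue` with a little noise), `fitSubset` is a helper name made up here, and the 30% subset size, 100 repetitions and starting values are arbitrary choices.

```mathematica
SeedRandom[42];

(* Hypothetical stand-in for the real equations: {R, constant} pairs
   generated from assumed parameters plus a little uniform noise *)
aTrue = 60.; bTrue = 0.012;
rAll = RandomReal[{50, 200}, 250];
data = Table[{r, aTrue bTrue Sech[bTrue r] + RandomReal[{-0.005, 0.005}]},
   {r, rAll}];

(* Steps (1)-(2): fit {a, b} on one random subset of the given size;
   the starting values {a, 50} and {b, 0.01} are assumptions *)
fitSubset[size_Integer] := {a, b} /.
   FindFit[RandomSample[data, size], a b Sech[b r], {{a, 50}, {b, 0.01}}, r];

(* Repeat for many subsets: here 100 draws of 30% of the equations *)
size = Round[0.3 Length[data]];
estimates = Table[fitSubset[size], {100}];

(* Step (3): check consistency via mean and spread of the {a, b} estimates *)
{Mean[estimates], StandardDeviation[estimates]}
```

A small spread relative to the mean would suggest the subsets give consistent parameter values; `Histogram[estimates[[All, 1]]]` shows the distribution of the a estimates.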

Having said this, it's probably overkill. It is not at all clear why one would not just do a best fit to the full set of data. One reason to use subsets, in general, is to avoid overfitting when there is a plethora of parameters. But you only have two. In such situations I think it is fairly common to either use the full set, or at worst use a large subset and then "test" the result by checking how well it "predicts" the remaining cases.
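
The "fit on a large subset, test on the rest" idea might look like the sketch below. The data are again synthetic, and the 80/20 split, the starting values and the use of a root-mean-square error are assumptions made for the example.

```mathematica
SeedRandom[7];

(* Hypothetical {R, constant} pairs from assumed parameters plus noise *)
aTrue = 60.; bTrue = 0.012;
rAll = RandomReal[{50, 200}, 250];
data = Table[{r, aTrue bTrue Sech[bTrue r] + RandomReal[{-0.005, 0.005}]},
   {r, rAll}];

(* Shuffle, then use 80% of the cases for fitting and hold out the rest *)
{train, test} = TakeDrop[RandomSample[data], Round[0.8 Length[data]]];

(* Fit the two parameters on the training cases only *)
fit = FindFit[train, a b Sech[b r], {{a, 50}, {b, 0.01}}, r];

(* Predict the held-out constants and measure the prediction error *)
predicted = a b Sech[b test[[All, 1]]] /. fit;
testError = Sqrt[Mean[(predicted - test[[All, 2]])^2]];

{fit, testError}
```

If the error on the held-out cases is comparable to the error on the training cases, the fitted pair generalizes across the data set.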

The upshot is I think there are basic questions of the science and statistics involved that you will need to consider before you can really pose this as a coding question.

POSTED BY: Daniel Lichtblau

Dear Daniel,

Thanks for your comments. There is a physical reason for using only a small range of the data set, which is why I am trying not to use the full range, if possible. However, thanks a lot for your comments. I am considering your ideas.

Best Regards, Intan

POSTED BY: INTAN SUPRABA