WOLFRAM COMMUNITY

8 Replies

GROUPS: Wolfram Language, Optimization, Statistics and Probability

Combination Method - fitting two parameters to data

INTAN SUPRABA, Hokkaido University

Posted 11 years ago

Dear All,

I am wondering about the principle behind the combination method. The combination formula nCr = n!/(r!(n-r)!) gives the number of possible combinations of r objects chosen from a set of n objects. In my case, I have 2 unknowns that I want to obtain from many equations; in total I have 287 equations. I think I cannot use the combination formula directly: r is 2 because I have 2 unknowns, but n is not simply 287, because I am also looking for the best n itself.

In the attachment, a and b are the 2 unknowns I want to obtain, and there are 6 equations, so I can get a and b from those 6 equations. With 287 equations, I want to know the best combination, i.e. how many equations I need to use to get the best a and b, even though the expected values of a and b are also unknown. Is that possible? If so, which Mathematica functions should I use to solve it?

Thanks for your attention.

Best Regards,
Intan

Attachments:
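
As a quick check on the counting in the question: with n = 287 equations and r = 2, the nCr formula gives the number of distinct pairs of equations, and letting the subset size r range over every value from 1 to n gives 2^287 - 1 subsets in total, far too many to enumerate. The Python snippet below (used only for illustration; the thread itself is about Mathematica) simply evaluates those counts with the standard-library `math.comb`.

```python
from math import comb

n_equations = 287

# Number of ways to choose 2 equations (r = 2, one per unknown) from 287:
# 287! / (2! * 285!) = 287 * 286 / 2
pairs = comb(n_equations, 2)
print(pairs)  # 41041

# If the subset size r is itself free (r = 1 .. n), the total number of
# non-empty subsets is sum over r of C(n, r) = 2**n - 1.
all_subsets = sum(comb(n_equations, r) for r in range(1, n_equations + 1))
print(all_subsets == 2**n_equations - 1)  # True
print(len(str(all_subsets)))  # 87 decimal digits, i.e. roughly 2.5e86 subsets
```

Because exhaustive enumeration over all subset sizes is out of reach, the practical alternatives are to fit all equations at once or to sample subsets at random.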

POSTED BY: INTAN SUPRABA

Daniel Lichtblau, Wolfram Research

Posted 11 years ago

A Monte-Carlo simulation could proceed as follows.

(1) Select a random subset of your equations. You might try different criteria, e.g. a few at a time, or perhaps 30% at a time. I do not know enough about this area to have a good suggestion here.
(2) Fit the subset to get best-fit parameter values for {a,b}.
(3) Attempt to combine your results, or at least check that they tend to be consistent.

Having said this, it's probably overkill. It is not at all clear why one would not just do a best fit to the full set of data. One reason to use subsets, in general, is to avoid overfitting when there is a plethora of parameters. But you only have two. In such situations I think it is fairly common either to use the full set or, at worst, to use a large subset and then "test" the result by checking how well it "predicts" the remaining cases.

The upshot is that I think there are basic questions of the science and statistics involved that you will need to consider before you can really pose this as a coding question.
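
The three steps above, plus the holdout check mentioned at the end, can be sketched as follows. This is a minimal Python illustration under loud assumptions: each "equation" is treated as one data point (x, y) of a hypothetical model y = a + b*x that is linear in a and b, it is fitted by ordinary least squares, and the data are synthetic. The real model in the thread involves hyperbolic functions, so `fit_line` would have to be replaced by a nonlinear fit. In Mathematica, `RandomSample` draws a random subset and `FindFit` or `NonlinearModelFit` performs the fit.

```python
import random
import statistics

def fit_line(points):
    """Least-squares fit of the HYPOTHETICAL model y = a + b*x; returns (a, b)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def monte_carlo_fits(points, fraction=0.3, trials=1000, seed=0):
    """Steps (1)-(2): draw a random subset, fit it; repeat and return all (a, b) fits."""
    rng = random.Random(seed)
    k = max(2, round(fraction * len(points)))
    return [fit_line(rng.sample(points, k)) for _ in range(trials)]

def holdout_rmse(points, train_fraction=0.8, seed=0):
    """Fit on a large random subset, then report RMSE on the held-out remainder."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    train, test = shuffled[:cut], shuffled[cut:]
    a, b = fit_line(train)
    mse = statistics.fmean((y - (a + b * x)) ** 2 for x, y in test)
    return (a, b), mse ** 0.5

# Synthetic, made-up data: 287 "equations" generated from a = 1.5, b = 0.8 plus noise
data_rng = random.Random(42)
data = [(x, 1.5 + 0.8 * x + data_rng.gauss(0, 0.1)) for x in range(287)]

fits = monte_carlo_fits(data)
a_vals = [a for a, _ in fits]
b_vals = [b for _, b in fits]

# Step (3): check consistency; a small spread means the subsets agree
print("a:", statistics.fmean(a_vals), "+/-", statistics.stdev(a_vals))
print("b:", statistics.fmean(b_vals), "+/-", statistics.stdev(b_vals))
print("full-data fit:", fit_line(data))
print("holdout fit and RMSE:", holdout_rmse(data))
```

Comparing the spread of the subset fits with the single full-data fit is one way to carry out step (3); `holdout_rmse` is the "fit a large subset, test on the rest" alternative.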

POSTED BY: Daniel Lichtblau

INTAN SUPRABA, Hokkaido University

Posted 11 years ago

Dear Daniel,

Thanks for your comments. There is a physical reason for using only a small range of the data set, which is why I am trying not to use the full data range, if possible. Thanks a lot for your comments; I am considering your ideas.

Best Regards,
Intan

POSTED BY: INTAN SUPRABA

Daniel Lichtblau, Wolfram Research

Posted 11 years ago

(1) The factors given in the hyperbolic secants, e.g. 128.3 and 136.6, do not correspond to actual values in the xlsx file. I don't know what their significance is in the expressions under consideration if they are not 'R' values from the data.
(2) Different constant values are being subtracted, e.g. 0.3169 and 0.2162, so it's not clear how or where a "constant" value appears in the expressions.

All this is really very murky. From the broad description I would guess you are trying to do some form of resampling statistics. I can't really give any advice, though, given how little I understand of your setup.
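
If the goal is indeed resampling statistics, one standard option (swapped in here as an illustration, not something proposed in the thread) is the nonparametric bootstrap: resample the equations with replacement, refit each resample, and read off percentile intervals for a and b. The Python sketch below assumes the same hypothetical linear model y = a + b*x and uses synthetic data; the names `fit_line`, `bootstrap_fits`, and `percentile_interval` are illustrative. In Mathematica, `RandomChoice[list, n]` samples with replacement.

```python
import random
import statistics

def fit_line(points):
    """Least-squares fit of the HYPOTHETICAL model y = a + b*x; returns (a, b)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def bootstrap_fits(points, n_boot=2000, seed=1):
    """Refit (a, b) on n_boot resamples of the same size drawn WITH replacement."""
    rng = random.Random(seed)
    n = len(points)
    return [fit_line(rng.choices(points, k=n)) for _ in range(n_boot)]

def percentile_interval(values, level=0.95):
    """Simple percentile interval read from the sorted bootstrap values."""
    s = sorted(values)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi

# Synthetic, made-up data generated from a = 1.5, b = 0.8 plus noise
data_rng = random.Random(42)
data = [(x, 1.5 + 0.8 * x + data_rng.gauss(0, 0.1)) for x in range(287)]

boot = bootstrap_fits(data)
a_full, b_full = fit_line(data)
print("a:", a_full, percentile_interval([a for a, _ in boot]))
print("b:", b_full, percentile_interval([b for _, b in boot]))
```

The width of each interval indicates how much a and b move when the set of equations changes, which is the kind of variability a subset-based method is probing.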

POSTED BY: Daniel Lichtblau

INTAN SUPRABA, Hokkaido University

Posted 11 years ago

Dear Daniel,

Thanks for your comments. I would like to clarify as follows:

(1) Yes, there are some slight differences between the raw data and the R values given in the hyperbolic secants. The actual values of R are as shown in the Excel sheet, but sometimes I need to make adjustments because the R value I use in the basic equation is based on hourly data. The purpose of showing the Excel data was only to show the total number of R values considered in the analysis. For the exact magnitude of R that I will use in each equation, I need to check the values one by one and adjust them when necessary. Sorry for the confusion caused by the discrepancies between the magnitude of R in the tanh equation and in the raw data.

(2) I obtained the constant value for each equation by running a simulation for each rainfall event (R), so the constant differs between equations because R differs.

So my problem is this: after obtaining 250 equations, each containing the 2 unknowns a and b, what kind of combination method can I use to generate many combinations that produce a and b? I can use 1 equation to get a and b. I can also use 2 equations, or all 250 equations. I am looking for an appropriate combination method for this. I have heard about the Monte-Carlo method, but I am not sure how to do it in Mathematica. Please let me know if you know which Mathematica functions or commands implement Monte-Carlo algorithms.

Thanks for your attention.

Best Regards,
Intan
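
Using two equations at a time is the literal "combination" version: every pair of equations forms a 2 x 2 system that can be solved exactly, and the C(n, 2) solutions can then be summarized. The Python sketch below assumes, purely for illustration, that each equation is linear in the unknowns, c1*a + c2*b = d, and solves each pair by Cramer's rule; the equations in the thread involve hyperbolic functions, so a real version would need a nonlinear solver per pair (in Mathematica, for example, `Subsets[eqs, {2}]` enumerates the pairs and `FindRoot` can solve each one). The coefficients are made up, and the median is used because nearly parallel pairs produce wild outliers.

```python
import itertools
import random
import statistics

# HYPOTHETICAL: equation i is assumed linear in the unknowns,
#   c1 * a + c2 * b = d, stored as the tuple (c1, c2, d).

def solve_pair(eq1, eq2, tol=1e-12):
    """Solve two linear equations in (a, b) by Cramer's rule; None if singular."""
    (p1, q1, r1), (p2, q2, r2) = eq1, eq2
    det = p1 * q2 - p2 * q1
    if abs(det) < tol:
        return None
    a = (r1 * q2 - r2 * q1) / det
    b = (p1 * r2 - p2 * r1) / det
    return a, b

def all_pair_solutions(equations):
    """Solve every unordered pair of equations; drop singular pairs."""
    sols = (solve_pair(e1, e2) for e1, e2 in itertools.combinations(equations, 2))
    return [s for s in sols if s is not None]

# Made-up equations generated from a = 2.0, b = -0.5 plus a little noise
rng = random.Random(7)
equations = []
for _ in range(250):
    c1, c2 = rng.uniform(0.5, 2.0), rng.uniform(-1.0, 1.0)
    equations.append((c1, c2, 2.0 * c1 - 0.5 * c2 + rng.gauss(0, 0.01)))

solutions = all_pair_solutions(equations)  # up to comb(250, 2) = 31125 pairs
a_med = statistics.median(a for a, _ in solutions)
b_med = statistics.median(b for _, b in solutions)
print(len(solutions), a_med, b_med)
```

Plotting the cloud of (a, b) solutions before summarizing it shows whether a single central value is a sensible summary.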

POSTED BY: INTAN SUPRABA

INTAN SUPRABA, Hokkaido University

Posted 11 years ago

Dear Nasser,

Thanks for your comment. a and b are unknowns, and my final goal is to find the best-fit values of a and b. I don't know what the expected best estimates of a and b should be, but I can tell whether a given a and b are good by validation. In my study, a and b are runoff parameters used to estimate/simulate runoff. I know the actual runoff from observation data (observed runoff). So if I have a pair of a and b, I can simulate the runoff and overlay the simulated runoff on the observed runoff. The pair with the smallest discrepancy (error) between simulated and observed runoff is the best.

So my plan is to obtain all possible estimates of a and b using a combination method or any other kind of method. After obtaining many pairs of a and b, I do not plan to simulate the runoff and validate it one by one for each pair. So the next step is to think of another method to reduce them to a single value of a and b. For example, if I have 100 pairs of a and b, I can plot the 100 values of a and of b separately. Maybe I can fit a Gaussian distribution and then take the mean values of a and b, but I am not sure; I don't know what the plots will look like. I am still thinking about what to do after obtaining many pairs of a and b in order to get the best single values.

I look forward to hearing further comments from you. Thanks for your attention.

Best Regards,
Intan
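
Scoring candidate pairs against observations, as described above, is cheap to automate when the runoff model is fast. The Python sketch below is illustrative only: `simulate_runoff` is a hypothetical placeholder (q = a * tanh(b * R)), not the model from the thread, and the rainfall, observed runoff, and 100 candidate pairs are synthetic. It ranks every pair by root-mean-square error (RMSE) and also scores the mean pair, which is the single-value summary proposed in the post.

```python
import math
import random
import statistics

def simulate_runoff(a, b, rainfall):
    """PLACEHOLDER runoff model, assumed for illustration: q = a * tanh(b * R)."""
    return [a * math.tanh(b * r) for r in rainfall]

def rmse(sim, obs):
    """Root-mean-square error between simulated and observed series."""
    return math.sqrt(statistics.fmean((s - o) ** 2 for s, o in zip(sim, obs)))

def rank_candidates(candidates, rainfall, observed):
    """Score every (a, b) pair against the observations, lowest RMSE first."""
    scored = [(rmse(simulate_runoff(a, b, rainfall), observed), (a, b))
              for a, b in candidates]
    scored.sort()
    return scored

# Synthetic, made-up inputs: "true" a = 30, b = 0.02 plus observation noise
rng = random.Random(3)
rainfall = [rng.uniform(5, 150) for _ in range(200)]
observed = [q + rng.gauss(0, 0.5) for q in simulate_runoff(30.0, 0.02, rainfall)]
candidates = [(rng.gauss(30.0, 2.0), rng.gauss(0.02, 0.003)) for _ in range(100)]

ranked = rank_candidates(candidates, rainfall, observed)
best_err, (best_a, best_b) = ranked[0]

# Score the mean pair, i.e. the "single value" summary of all candidates
mean_a = statistics.fmean(a for a, _ in candidates)
mean_b = statistics.fmean(b for _, b in candidates)
mean_err = rmse(simulate_runoff(mean_a, mean_b, rainfall), observed)
print("best pair:", (best_a, best_b), best_err)
print("mean pair:", (mean_a, mean_b), mean_err)
```

Comparing `mean_err` with the best RMSE shows whether collapsing the pairs to their mean loses much accuracy relative to the best individual pair.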

POSTED BY: INTAN SUPRABA

INTAN SUPRABA, Hokkaido University

Posted 11 years ago

Attachments:

POSTED BY: INTAN SUPRABA

Daniel Lichtblau, Wolfram Research

Posted 11 years ago

You really need to provide a (smallish) example that fully captures the problem at hand. From the current notebook I cannot tell how your 'n' arises in the problem.

POSTED BY: Daniel Lichtblau

Nasser M. Abbasi, student

Posted 11 years ago

You wrote: "how many equations which I need to use to get the best a and b although the expected values of a and b are also unknown"

How will one know they have obtained the `best a and b` if even the expected values of `a and b` are not known?

POSTED BY: Nasser M. Abbasi