
Working with lists - comparing data in lists of unequal length.

Posted 8 years ago

I have collected some experimental data at 544 frequencies, and have a computer model for that data. I have created a list of the differences between the experimentally measured data and the model. They are not really errors, as neither the model nor the experimental data is perfect, but to make the discussion easier I will call them errors.

The first item in each pair is the frequency in MHz, and the second is the phase error in degrees. So at a frequency of 200 MHz the phase error is 0.092593 degrees, whereas at 7000 MHz the phase error is 1.23048 degrees.

phaseErrors={{200.`, 0.0925929735343855`}, {212.5`, 0.06536204885470609`}, {225.`,
   0.0476567575293923`}, {237.5`, 0.07512708968635051`}, {250.`, 
  0.06988303494777459`}, {262.5`, 0.08247458243017602`}, {275.`, 
  0.06723172074437223`}, {287.5`, 0.07860443799551908`}, {300.`, 
  0.06802272178311031`}, {312.5`, 0.07794655920100102`}, {325.`, 
  0.13718593
    <snip out rest of list> 
    {6875., 1.19548}, {6887.5, 1.16039}, {6900., 1.13316}, {6912.5, 
      1.12165}, {6925., 1.12583}, {6937.5, 1.16913}, {6950., 
      1.18908}, {6962.5, 1.20127}, {6975., 1.21354}, {6987.5, 
      1.22588}, {7000., 1.23048}}

I want to see if the phase errors exceed some thresholds. There are only 7 thresholds, each covering a span of 1000 MHz, which is much larger than the 12.5 MHz spacing between data points.

I want to see if all data between 0 and 1000 MHz has an error of less than 0.6 degrees, all data >1000 MHz but <= 2000 MHz has an error of less than 0.78 degrees, all data >2000 and <=3000 MHz has an error of less than 1.24 degrees .... etc. It is likely the permissible errors will always increase with frequency, but I don't want to make that assumption.

permissableErrors = {{1000.0, 0.6}, {2000.0, 0.78}, {3000.0, 
1.24}, {4000.0, 1.69}, {5000.0, 2.31}, {6000.0, 3.05}, {7000.0, 
3.64}};

Can anyone suggest a few decent ways to check if the errors in the first list exceed the permissible errors in the second list? I can think of several very messy ways to do it, with lots of If statements, which would be practical when there are only 7 ranges of permissible errors, but would be impractical if that list had 50 entries.

I'm guessing I need to create a big list of 544 elements with the permissible error at each frequency, then subtract the actual errors from it, and see if the minimum value in the resulting list is less than 0, which would indicate that one of the errors was greater than the permissible error. In other words, something like

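biglist={{200,0.6}, {212.5,0.6}, {225,0.6}, ..., {6987.5,3.64}, {7000.0,3.64}};
(* compare only the error columns; the frequency columns would just cancel to 0 *)
minimum=Min[biglist[[All,2]]-phaseErrors[[All,2]]];
If[minimum < 0, fail=1, fail=0];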

But maybe there are better ways, and in any case I don't know a neat way of creating "biglist".

Dave

POSTED BY: David Kirkby
17 Replies
Posted 8 years ago

Thank you.

It's late here and my wife has just called me, so I can't look over your code much. But I don't want to interpolate like this, as I want to see if the errors are within specific limits. An error of 0.77 degrees at 1000.01 MHz would be acceptable, whereas your interpolation would find a limit very close to 0.6 degrees at this frequency, as it's 0.6 degrees at 1000 MHz and 0.78 degrees at 2000 MHz.

I'm basically looking for a pass/fail solution that can be used in a sort of batch mode. I will measure 20 devices, save the files as 1.s1p, 2.s1p, 3.s1p etc., then just run the code to see which devices (if any) fail to meet the specification.

Dave

POSTED BY: David Kirkby

Hi David,

I could not imagine that tolerances come in step functions - sorry! OK, maybe you like this solution better: I still do an interpolation of your permissableErrors (where I added {0,0} as the first element), but with the option InterpolationOrder -> 0. So the code looks almost identical to my earlier version:

ClearAll["Global`*"]

permissableErrors = {{0, 0}, {1000.0, 0.6}, {2000.0, 0.78}, {3000.0, 
    1.24}, {4000.0, 1.69}, {5000.0, 2.31}, {6000.0, 3.05}, {7000.0, 
    3.64}};
permEfunc = Interpolation[permissableErrors, InterpolationOrder -> 0];

SetDirectory[NotebookDirectory[]];
phaseErrors = Import["measured-errors.csv"];

Show[ListLinePlot[phaseErrors, Filling -> Axis],
 ListPlot[permissableErrors, PlotStyle -> Red],
 Plot[permEfunc[x], {x, 0, 7000}, PlotStyle -> {Red, Dotted}], 
 PlotRange -> {0, permissableErrors[[-1, -1]]}, ImageSize -> Large, 
 GridLines -> Automatic]

which now gives:

[Image: plot output]

With the help of this function you can check the whole list of phaseErrors like so:

And @@ (permEfunc[#1] > #2 & @@@ phaseErrors)
(* Out:   True *)
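
As a rough sketch of the same pass/fail idea that also reports which points fail, the band limit can be looked up directly from the list of upper band edges. The names `limits`, `sample`, `limitAt`, `failures` and `pass`, and the four sample points, are made up for illustration. Treating frequencies above the last band edge as failures is my assumption, not part of the original specification:

```wolfram
(* Hypothetical shortened data; the real lists have 7 bands and 544 points *)
limits = {{1000., 0.6}, {2000., 0.78}, {3000., 1.24}};
sample = {{200., 0.09}, {1000., 0.59}, {1000.01, 0.77}, {2500., 1.30}};

(* limitAt[f]: limit of the first band whose upper edge is >= f.
   Assumption: a frequency above the last edge gets the limit -Infinity,
   so it is always reported as a failure rather than silently passing. *)
limitAt[f_] := SelectFirst[limits, f <= First[#] &, {Missing[], -Infinity}][[2]]

(* A point fails when its error is not strictly below the band limit *)
failures = Select[sample, #[[2]] >= limitAt[#[[1]]] &]
(* {{2500., 1.3}} *)

pass = failures === {}
(* False *)
```

An empty `failures` list means the device passes; otherwise the list shows the offending frequencies and errors.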

Regards -- Henrik

POSTED BY: Henrik Schachner

Hi Henrik, How do I fill the missing data for the 11th and 12th months of 1972 and 1973 for Japan and also the 12th month of 1972 for Korea?

In[1]:= data0 = <|"Japan" -> <|1971 -> {17.3`, 91.4`, 36.5`, 127.3`, 
       6.8`, 15.6`, 0.6`, 5.4`, 11.8`, 1.6`, 60.1`, 64.`}, 
     1972 -> {19.8`, 87.3`, 29.7`, 36.2`, 36.2`, 26.5`, 8.2`, 9.3`, 
       37.7`, 28.5`}, 
     1973 -> {25.8`, 23.4`, 38.2`, 86.3`, 28.4`, 34.1`, 10.2`, 18.2`, 
       65.`, 78.3`}|>, 
   "Korea" -> <|1971 -> {7.`, 82.3`, 171.5`, 223.9`, 9.3`, 0.4`, 5.`, 
       0.9`, 0.`, 19.4`, 81.6`, 49.5`}, 
     1972 -> {61.3`, 34.9`, 69.5`, 146.1`, 92.7`, 51.1`, 0.`, 0.`, 
       16.7`, 28.3`, 40.8`}|>|>;

In[2]:= data0 // MapAt[Length, #, {All, All}] &

Out[2]= <|"Japan" -> <|1971 -> 12, 1972 -> 10, 1973 -> 10|>, 
 "Korea" -> <|1971 -> 12, 1972 -> 11|>|>
POSTED BY: M.A. Ghorbani
Posted 4 years ago

Hi Mohammad,

data0 // Map[(PadRight[#, 12, Missing["NotAvailable"]] &), #, {2}] &
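
To see what this does, here is the same expression applied to a tiny made-up association (`small` and `paddedSmall` are hypothetical names, and the target length is 3 instead of 12 to keep the output short). Note that the padding always goes at the end of each list, so this assumes the missing months are the last ones of the year:

```wolfram
(* Hypothetical data: one station, two years, padded to length 3 *)
small = <|"A" -> <|1971 -> {1., 2., 3.}, 1972 -> {4.}|>|>;

(* Level {2} reaches the per-year lists inside the nested association *)
paddedSmall = small // Map[(PadRight[#, 3, Missing["NotAvailable"]] &), #, {2}] &
(* <|"A" -> <|1971 -> {1., 2., 3.},
     1972 -> {4., Missing["NotAvailable"], Missing["NotAvailable"]}|>|> *)
```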

I will post an answer to your previous question sometime later today.

POSTED BY: Rohit Namjoshi

Hi David,

I want to see [my emphasis] if all data between ...

Then how about interpolating your permissableErrors and simply making a plot?

ClearAll["Global`*"]

permissableErrors = {{1000.0, 0.6}, {2000.0, 0.78}, {3000.0, 
    1.24}, {4000.0, 1.69}, {5000.0, 2.31}, {6000.0, 3.05}, {7000.0, 
    3.64}};
permEfunc = Interpolation[permissableErrors];

SetDirectory[NotebookDirectory[]];
phaseErrors = Import["measured-errors.csv"];

Show[ListLinePlot[phaseErrors, Filling -> Axis],
 ListPlot[permissableErrors, PlotStyle -> Red],
 Plot[permEfunc[x], {x, 1000, 7000}, PlotStyle -> {Red, Dotted}], 
 PlotRange -> {0, permissableErrors[[-1, -1]]}, ImageSize -> Large, GridLines -> Automatic]

giving:

[Image: plot output]

Regards -- Henrik

POSTED BY: Henrik Schachner

Hi Henrik,

I got unequal lengths, {32, 24, 48}, for {Japan, Korea, India} in data0. Is there a shorter way to determine the minimum length? Also, is there a simple, short way to plot the records truncated to the minimum length (24)?

data0 = <|"Japan" -> <|1971 -> {17.3`, 91.4`, 36.5`, 127.3`, 6.8`, 
       15.6`, 0.6`, 5.4`, 11.8`, 1.6`, 60.1`, 64.`}, 
     1972 -> {19.8`, 87.3`, 29.7`, 36.2`, 36.2`, 26.5`, 8.2`, 9.3`, 
       37.7`, 28.5`}, 
     1973 -> {25.8`, 23.4`, 38.2`, 86.3`, 28.4`, 34.1`, 10.2`, 18.2`, 
       65.`, 78.3`}|>, 
   "Korea" -> <|1971 -> {7.`, 82.3`, 171.5`, 223.9`, 9.3`, 0.4`, 5.`, 
       0.9`, 0.`, 19.4`, 81.6`, 49.5`}, 
     1972 -> {61.3`, 34.9`, 69.5`, 146.1`, 92.7`, 51.1`, 0.`, 0.`, 
       16.7`, 28.3`, 40.8`, 0.3`}|>, 
   "India" -> <|1971 -> {48.7`, 97.8`, 115.8`, 56.7`, 51.1`, 34.6`, 
       90.9`, 53.1`, 84.9`, 208.2`, 117.`, 233.1`}, 
     1972 -> {58.6`, 38.4`, 36.4`, 79.8`, 39.9`, 210.3`, 34.9`, 
       115.6`, 119.1`, 175.6`, 106.5`, 48.8`}, 
     1973 -> {69.5`, 79.7`, 116.7`, 68.1`, 93.1`, 99.7`, 32.2`, 30.7`,
        12.5`, 167.6`, 359.8`, 185.8`}, 
     1974 -> {52.5`, 64.6`, 52.7`, 53.8`, 55.`, 19.5`, 39.`, 97.1`, 
       46.4`, 97.9`, 170.6`, 217.4`}|>|>;

w1 = Normal /@ (Normal@data0["Japan"])

{1971 -> {17.3, 91.4, 36.5, 127.3, 6.8, 15.6, 0.6, 5.4, 11.8, 1.6, 
   60.1, 64.}, 
 1972 -> {19.8, 87.3, 29.7, 36.2, 36.2, 26.5, 8.2, 9.3, 37.7, 28.5}, 
 1973 -> {25.8, 23.4, 38.2, 86.3, 28.4, 34.1, 10.2, 18.2, 65., 78.3}}

Japan = Flatten[Part[#, 2] & /@ w1]

{17.3, 91.4, 36.5, 127.3, 6.8, 15.6, 0.6, 5.4, 11.8, 1.6, 60.1, 64., \
19.8, 87.3, 29.7, 36.2, 36.2, 26.5, 8.2, 9.3, 37.7, 28.5, 25.8, 23.4, \
38.2, 86.3, 28.4, 34.1, 10.2, 18.2, 65., 78.3}

w2 = Normal /@ (Normal@data0["Korea"])

{1971 -> {7., 82.3, 171.5, 223.9, 9.3, 0.4, 5., 0.9, 0., 19.4, 81.6, 
   49.5}, 1972 -> {61.3, 34.9, 69.5, 146.1, 92.7, 51.1, 0., 0., 16.7, 
   28.3, 40.8, 0.3}}

Korea = Flatten[Part[#, 2] & /@ w2]

{7., 82.3, 171.5, 223.9, 9.3, 0.4, 5., 0.9, 0., 19.4, 81.6, 49.5, \
61.3, 34.9, 69.5, 146.1, 92.7, 51.1, 0., 0., 16.7, 28.3, 40.8, 0.3}

w3 = Normal /@ (Normal@data0["India"])

{1971 -> {48.7, 97.8, 115.8, 56.7, 51.1, 34.6, 90.9, 53.1, 84.9, 
   208.2, 117., 233.1}, 
 1972 -> {58.6, 38.4, 36.4, 79.8, 39.9, 210.3, 34.9, 115.6, 119.1, 
   175.6, 106.5, 48.8}, 
 1973 -> {69.5, 79.7, 116.7, 68.1, 93.1, 99.7, 32.2, 30.7, 12.5, 
   167.6, 359.8, 185.8}, 
 1974 -> {52.5, 64.6, 52.7, 53.8, 55., 19.5, 39., 97.1, 46.4, 97.9, 
   170.6, 217.4}}

India = Flatten[Part[#, 2] & /@ w3]

{48.7, 97.8, 115.8, 56.7, 51.1, 34.6, 90.9, 53.1, 84.9, 208.2, 117., \
233.1, 58.6, 38.4, 36.4, 79.8, 39.9, 210.3, 34.9, 115.6, 119.1, \
175.6, 106.5, 48.8, 69.5, 79.7, 116.7, 68.1, 93.1, 99.7, 32.2, 30.7, \
12.5, 167.6, 359.8, 185.8, 52.5, 64.6, 52.7, 53.8, 55., 19.5, 39., \
97.1, 46.4, 97.9, 170.6, 217.4}

lengths = Length /@ {Japan, Korea, India}

{32, 24, 48}

ListLinePlot /@ {Japan, Korea, India}
POSTED BY: M.A. Ghorbani
Posted 4 years ago

Hi Mohammad,

Shorter way to determine the minimum length

lengths = data0 // Map[Values /* Flatten /* Join /* Length]
(* <|"Japan" -> 32, "Korea" -> 24, "India" -> 48|> *)

minimumLength = lengths // Values // Min
(* 24 *)

Not sure what you mean by

I would like to plot the records based on the minimum length ( 24 )

The years, and the number of points per year, for which data is available differ by country.

data0 // MapAt[Length, #, {All, All}] &
(*
 <|"Japan" -> <|1971 -> 12, 1972 -> 10, 1973 -> 10|>, 
 "Korea" -> <|1971 -> 12, 1972 -> 12|>, 
 "India" -> <|1971 -> 12, 1972 -> 12, 1973 -> 12, 1974 -> 12|>|>
*)

What subset of the values are you trying to plot?

POSTED BY: Rohit Namjoshi

Thank you so much, Rohit.

I would like to 1) extract the records for each country separately; and then 2) Plot the records based on minimum length in a simple way like this:

ListLinePlot /@ {Take[Japan, 24], Take[India, 24], Korea}
POSTED BY: M.A. Ghorbani
Posted 4 years ago

Hi Mohammad,

Maybe something like

data0 // Map[Values /* Flatten /* Join /* (Take[#, 24] &)] // 
  KeyValueMap[
   ListLinePlot[#2, PlotLabel -> #1, ImageSize -> Medium] &] // Row

[Image: plot output]

The problem with this approach is that you really cannot compare the three plots. For Japan the plot shows 12 values for 1971, 10 for 1972 and 2 for 1973. For Korea 12 for 1971 and 12 for 1972.

It would be much better if you had the corresponding month values. Then you can construct a TimeSeries for year/month data and use a DateListPlot to compare corresponding year/month values for the three countries.
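
A minimal sketch of that idea, assuming each year's list starts in January and runs through consecutive months. The names `japan`, `pairs` and `ts` and the shortened values are for illustration only, and each value is dated to the first day of its month as an arbitrary convention:

```wolfram
(* Hypothetical shortened data: values for Jan, Feb, ... of each year *)
japan = <|1971 -> {17.3, 91.4, 36.5}, 1972 -> {19.8, 87.3}|>;

(* Build {{year, month, 1}, value} pairs; First[#2] is the month index *)
pairs = Catenate @ KeyValueMap[
    Function[{year, values},
     MapIndexed[{{year, First[#2], 1}, #1} &, values]],
    japan];

(* TimeSeries accepts date lists as time stamps *)
ts = TimeSeries[pairs];

DateListPlot[ts]
```

With one such series per country, passing a list of them to DateListPlot puts corresponding months on the same date axis.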

POSTED BY: Rohit Namjoshi

Rohit,

You are absolutely right. In this case we have some missing data. Is there any way to fill these gaps so that all countries have lists of equal length? Regards.

POSTED BY: M.A. Ghorbani
Posted 4 years ago

Hi Mohammad,

Yes, if you know exactly which months are missing they can be replaced with Missing. If you do have the corresponding months, constructing a TimeSeries would be the best option. It can also interpolate missing values.

POSTED BY: Rohit Namjoshi
Posted 4 years ago

Working with lists - comparing data in lists of unequal length.

POSTED BY: Alex Teymouri

Hi Rohit,

Fortunately, I know exactly which months are missing in the data. Please see the enclosed file, which contains my main data. Each year has between 9 and 12 monthly records.

A length of 9, 10, 11 or 12 means that we have records for {Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep}, {Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct}, {Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov} or all months, respectively.

After filling in the missing data, we should have 12 elements per year for all stations.

POSTED BY: M.A. Ghorbani
Posted 4 years ago

Hi Mohammad,

It is odd that the missing values are always at the end of the year. Is there an explanation for that? I have attached an updated version of the notebook with some different ways to visualize the data. e.g.

[Images: two plots from the attached notebook]

POSTED BY: Rohit Namjoshi

Hi Rohit,

This is superb! Thank you so much. In timeSeriesData, we should have the complete data. In adanaBolgeData, the plot shows some missing data. How do we fill in these missing values?

Thank you again for your help, kindness and time.

POSTED BY: M.A. Ghorbani
Posted 4 years ago

Hi Mohammad,

The original data is missing values for several years/months. In the notebook I filled them with Missing["NotAvailable"]. What do you want to fill the missing data with?

You could try TimeSeriesModelFit for the available values in a year and use it to extrapolate, however it is unlikely to produce good results on just 9 - 11 data points. You could try replacing the missing values for a month with the average for that month across all years. But, if you are looking at trends over the years then this is going to bias the analysis. Maybe the missing values should be taken into account when computing statistical significance. The best approach will depend on what you intend to use this data for.
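
As a rough sketch of the monthly-average option, assuming every year has already been padded to the same length with Missing["NotAvailable"] (the names `padded`, `monthlyMeans` and `filled` and the 3-"month" toy data are made up for illustration):

```wolfram
(* Hypothetical station data, padded to 3 "months" per year for brevity *)
padded = <|1971 -> {1., 2., 3.},
   1972 -> {3., 4., Missing["NotAvailable"]},
   1973 -> {5., Missing["NotAvailable"], Missing["NotAvailable"]}|>;

(* Mean of each month over the years that actually have a value.
   Assumption: every month has at least one value; otherwise Mean fails. *)
monthlyMeans = Mean[DeleteMissing[#]] & /@ Transpose[Values[padded]];
(* {3., 3., 3.} *)

(* Replace each Missing entry with the mean for its month *)
filled = Map[
   MapThread[If[MissingQ[#1], #2, #1] &, {#, monthlyMeans}] &,
   padded]
(* <|1971 -> {1., 2., 3.}, 1972 -> {3., 4., 3.}, 1973 -> {5., 3., 3.}|> *)
```

The caveat above still applies: filled values pull each year toward the long-term monthly average, which can hide trends.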

POSTED BY: Updating Name

Thank you so much for your thoughtful answer, and sorry for the delay in responding.

How do I replace the missing values for a month with the average for that month across all years? It is sufficient for me.

Thank you very much.

POSTED BY: M.A. Ghorbani