Tuesday, November 30, 2010

A Prize for Modeling a Business with Constraints: Selling from Novosibirsk

Sometimes, the model that is given to you has to be tweaked radically in order to explain the situation at hand. With this thought in mind, Bernard Beauzamy (the owner of SCM SA) has set up a 500 euros prize for whoever can find a way for the following business model to work (see below). The solutions should be sent before Friday December 3rd, 2010, 5 pm (Paris local time, that's GMT+1) to scm.sa@orange.fr.
Selling from Novosibirsk
A factory in Novosibirsk produces gloves. Each pair costs 40 rubles to produce, including everything : raw material, salary of workers, machines, transportation, and so on.
They produce 10,000 pairs and they want to sell them in a country where customs duties are 4/5 of the selling price. They cannot sell at a price higher than 200 rubles each pair, because of local competition and local buying power (at a higher price, nobody would buy). How do they manage to make a profit, and how much do they gain ?
One cannot cheat with the customs and corruption is forbidden. The price declared for the sale must be the true price. The exchange rate between currencies is assumed to be fixed and is not to be taken into account : everything is stated in the seller's currency (here the ruble). The solution should work repeatedly, any number of times, without violating any law, between any countries with normal customs.
Prize offered : 500 Euros, for the best answer received before Friday, December 3rd, 2010, 5 pm (Paris local time). Send answers to scm.sa@orange.fr. Answers may be written in English, French, Russian.

[Check the solution here]

Saturday, November 27, 2010

Optimal according to what ?

Redefining optimal is a blog entry by some of the folks at the Department of Systems Biology at Harvard Medical School. It is very nicely written and includes some nice comments. The entry specifically points to a paper by Fernández Slezak D, Suárez C, Cecchi GA, Marshall G, & Stolovitzky G (2010) entitled When the optimal is not the best: parameter estimation in complex biological models (PloS one, 5 (10) PMID: 21049094). The abstract and conclusions read:


BACKGROUND: The vast computational resources that became available during the past decade enabled the development and simulation of increasingly complex mathematical models of cancer growth. These models typically involve many free parameters whose determination is a substantial obstacle to model development. Direct measurement of biochemical parameters in vivo is often difficult and sometimes impracticable, while fitting them under data-poor conditions may result in biologically implausible values.

RESULTS: We discuss different methodological approaches to estimate parameters in complex biological models. We make use of the high computational power of the Blue Gene technology to perform an extensive study of the parameter space in a model of avascular tumor growth. We explicitly show that the landscape of the cost function used to optimize the model to the data has a very rugged surface in parameter space. This cost function has many local minima with unrealistic solutions, including the global minimum corresponding to the best fit.

CONCLUSIONS: The case studied in this paper shows one example in which model parameters that optimally fit the data are not necessarily the best ones from a biological point of view. To avoid force-fitting a model to a dataset, we propose that the best model parameters should be found by choosing, among suboptimal parameters, those that match criteria other than the ones used to fit the model. We also conclude that the model, data and optimization approach form a new complex system and point to the need of a theory that addresses this problem more generally.

Evidently, the post would have some relevance to compressive sensing if the model were to be linear, which it is not in this case.

Friday, November 19, 2010

The Modeler's Known Unknowns and Unknown Knowns

In Satyajit Das's Blog entitled "Fear & Loathing in Financial Products" one can read the following entry entitled WMD or what are derivatives 
....During the Iraqi conflict, Donald Rumsfeld, the US Defense Secretary, inadvertently stated a framework for understanding the modern world (12 February 2002 Department of Defense News Briefing). The framework perfectly fits the derivatives business. There were “known knowns” – these were things that you knew you knew. There were “known unknowns” – these were things that you knew you did not know. Then, there were “unknown knowns” – things that you did not know you knew. Finally, there were “unknown unknowns” – things that you did not know you did not know...
Then Satyajit goes on to clarify this term a little further:
....In most businesses, the nature of the product is a known known. We do not spend a lot of time debating the use of or our need for a pair of shoes. We also understand our choices – lace up or slip-on, black or brown. I speak, of course, of men’s shoes here. Women’s shoes, well, they are closer to derivatives. Derivatives are more complex. You may not know that you need the product until you saw it – an unknown known. You probably haven’t got the faintest idea of what a double knockout currency option with rebate is or does – a known unknown. What should you pay for this particular item? Definitely, unknown unknown. Derivatives are similar to a Manolo Blahnik or Jimmy Choo pair of women’s shoes....
There is also a word for the last one: Unk-Unk 
n. especially in engineering, something, such as a problem, that has not been and could not have been imagined or anticipated; an unknown unknown.
but I am not sure it fits with the previous definition. Looking up wikipedia, we have:
In epistemology and decision theory, the term unknown unknown refers to circumstances or outcomes that were not conceived of by an observer at a given point in time. The meaning of the term becomes more clear when it is contrasted with the known unknown, which refers to circumstances or outcomes that are known to be possible, but it is unknown whether or not they will be realized. The term is used in project planning and decision analysis to explain that any model of the future can only be informed by information that is currently available to the observer and, as such, faces substantial limitations and unknown risk.

How are these notions applicable to Robust Mathematical Modeling ?  John Cook reminded me recently of the Titanic effect presented initially in Jerry Weinberg's  Secrets of Consulting: A Guide to Giving and Getting Advice Successfully 
The thought that disaster is impossible often leads to an unthinkable disaster.
When modeling a complex  reality, we always strive to make the problem simple and then expect to build a more complex and realistic idealization of that problem. But in the end, even the most complex model is still an idealization of sorts. So every time I get to read about a disaster, grounded in some engineering mistake, I always wonder which part was the known unknown, the unknown known and the unknown unknown.

One of these moment that left me thinking happened back in February 2003 during last flight of the Space Shuttle Columbia. As you know, the spacecraft disintegrated over Texas. It did so because a piece of foam had hit one of its wings fifteen days earlier at launch. That explanation relied on a footage of the launch showing a piece of foam slowly falling from the main booster onto the edge of the wing of the Orbiter. Fifteen days later, when the Orbiter came back to land in Florida, it had a large hole that enabled air at a speed of Mach 17 to enter the inside of the wing, damaging it and thereby destroying the spacecraft. The remains of our experiment showed the temperature had reached well over 600C for a long period of time.

I was in the room next to the MCC during STS-107 as we were flying an instrument on top of the orbiter. We watched the take-off, we listened to all the conversations in the com loop between MCC and the astronauts during the fifteen days it flew (we asked some of the astronauts to manipulate our instrument at one point). At no time was there a hint of a possible mishap. We learned afterwards that even engineers, in the room next door and who had doubts, had requested  imagery from spy sats (but management canceled that request). However what was the most revealing after the tragedy was that I specifically recall that nobody around me could conceive the foam could have been making that much damage. None of us thought the speed differential between the foam and the spacecraft at launch could be that large. According to a simple computation based on the video of the launch, a half pound piece of foam hit the leading edge of the Orbiter's wing at an estimated speed of  924 fps or 1060 km/hr. It's always the square of the velocity that kills you.

More photos of the test performed in San Antonio on July 7, 2003 to recreate what happened can be found here.

As one can see from the small movie above, all previous tests had been performed with small pieces of foam. Further, attention had always been focused on the tiles underneath the Orbiter - never on the leading edge of the wings-. The video of the test performed in San Antonio, several months later, had everyone gasping in horror when the piece of foam opened a large hole in the leading edge of the wing. The most aggravating part of the story is that the Columbia flew for fifteen days with very little care about this issue while the Shuttle program had already seen take-offs with several near dangerous hits in the past.

On an unrelated issue, Ed Tufte also pointed out very clearly how the communication style using Powerpoint was a point of failure in the decision making process of deciding whether the foam strike  was a problem or not.. 

Of note, the communication to managment did not clearly delineate that there was absolutely no experience with such a large chunk of foam (experiments had been performed with 3 inch cube "bullets" vs an actual impactor with a volume of more than 1920 inch cube).

What were the known unknowns, the unknown knowns and the unknown unknowns in this instance ? First let me reframe all the categories of knowns/unknowns:for a modeler of reality or the engineers: With the help of wikipedia, let me summarize them as follows: To a modeler
  • the known known refers to circumstances or outcomes that are known to be possible, it is known that they will be realized with some probability (based on adequate experiments/data)
  • the known unknown refers to circumstances or outcomes that are known to be possible, but it is unknown whether or not they will be realized (no data or very low probability/extreme events).
  • the unknown unknown refers to circumstances or outcomes that were not conceived of by a modeler at a given point in time.
  • the unknown known refers to circumstances or outcome a modeler intentionally refuse to acknowledge that he/she knows
Looking back at the Columbia mishap, how can we categorize the different mistakes made:
  • the foam hitting the RCC was a unknown known to the modelers. People had done their homework and knew that:
    • falling pieces were hitting the orbiter at every launch
    • some launches had bad tile destruction because of falling pieces
    • The engineers went through a series of tests that they eventually put in a database. Most past foam hits fell in the known knowns, as pieces of foam were clearly fitting dimensions used in the database. They knew foam could fall on the RCC instead of the tile yet did not do tests or felt the tests were necessary. At issue is really the fact that there was an assumption that the RCC was tougher than tiles. It actually is more brittle but then a lot of things are brittle when hit with a large chunk of something at 1000 km/hr.
  •  The speed of the foam was also a known known to the modelers. It could be computed right after the launch and was within the range listed in the database mentioned above.
  • A known unknown to the modeler was the impact effect of a very large piece of foam on the leading edge of the wing. This is reflected in the size fragments used in the database. There was simply no data.
  • An unknown unknown to the manager and the rest of the world was the eventual demise of the whole spacecraft fifteen days later due to this impact. To the modeler, I believe the demise of the spacecraft was rather a unknown known.
Unknown unknowns are clearly outside of most modeling for a multitude of reasons. Robust mathematical modeling ought to provide some warning about known unknowns and most importantly provide a framework for not allowing unknown knowns to go unnoticed by either the engineers or their management.

Wednesday, November 10, 2010

RMM Example #2: Spacecraft Thermal Management

When designing Spacecrafts, one of the major issue aside for designing its primary instruments is to devise its thermal management (i,e, managing the way power produced by the spacecraft can be removed so that it does not overheat). The thermal management of Spacecrafts requires solving different sets of issues with regards to modeling. Because spacecrafts generally live in low earth or geostationary orbit, the only way to remove power generated on the spacecraft is through radiation out  of its radiators. This radiator point is the lowest temperature the spacecraft will experience. If the spacecraft is well conditioned all other parts of the spacecraft will have higher temperature no matter what. The main issue of thermal modeling for spacecraft design is really making sure that all the other points of the spacecraft will be within the temperature bounds they are designed for: i.e. The thermal rating for a DC/DC converter is widely different than that of a simple CMOS or the lens of a camera. Hence computing the radiator temperature is of paramount importance and can be done very quickly with a one node analysis. Yes, you read this right, at the beginning, there is no need for Finite Element computations in spacecraft analysis except maybe for very specific components and very specific conditions. The most important computation is figuring out this one spacecraft-node analysis. In terms of modeling, it doesn't get any simpler and it is robust. The issues that tend to crop up are when one gets into the detailed power consumption and thermal energy flow within the spacecraft as more detailed constraints. are added To summarize the issues, let me try to follow the list of issues that is making up the definition of problems needing Robust Mathematical Modeling as a guideline:

1. The laws describing the phenomena are not completely known ;

In fact, in this case the laws are known but there are large uncertainties at many different levels:
  • each element of the spacecraft has a thermal conductance, but since one is dealing with heterogeneous elements like a CMOS or a slab of aluminum, the designer is constrained into a lumped analysis involving a delicate weighting.
  • the thermal contact resistances / conductances of the electronics are generally unknowns in terms of performance especially in vacuum. Most information on the electronics is given when convection is available (for ground use). Even when environment is known, electronics components are very hard to evaluate. See this very interesting thread on LinkedIn.
  • the thermal contact conductance of two pieces of metals connected to each other through nuts and bolts is by no means a trivial subject. The contact conductance certainly depends on  how much torque was put on the washer/nuts/bolts and the level of vacuum.
  • the space environment produces different heating and cooling conditions that are inherently different based on the positioning of the spacecraft, its orbit, etc...
  • in order to regulate temperature efficiently, cloth and paints are covering the spacecraft for the duration of its life. There are uncertainties with regards to how these decay over time and most computations include Beginning Of Life (BOL) and End Of Life (EOL) estimates.
  • An element of confusion is a mathematical one too. Since most of the thermal power is managed through conduction, radiation transport (a nonlinear term in T^4) is generally modeled as a linear node. When temperature gets too high, the conductance node varies with temperature to follow the nonlinear T^4 term.

2. The data are missing or corrupted ;

Spacecraft are generally designed with a clear emphasis on reducing its weight at the subsystem or bus level. A GEO satellite maker would rather put one more transponder bringing revenue on its spacecraft than add additional instrumentation to provide data to the ground. Experimental data is rare in spacecraft design because real conditions are rarely fully instrumented. Tests are performed at every iteration of the spacecraft design though, but they are not total reproduction of the actual thermal environment sustained by the future spacecraft. For instance, Sun lamps only produce some subset of the wavelengths given by the Sun, so it difficult to find the thermo-optical properties of some paints or the efficiency of some solar cells. While vacuum tests get rid of the convection issue, it can do little to evaluate the performance of systems that rely on convection inside said systems such as loop-heat pipes.

3. The objectives are multiple and contradictory.

Four words: Hot Case, Cold Case. The worst thermal environments are generally sought in order to provide acceptable bounds during the lifetime of the spacecraft. It is no small secret to say that these two objectives are contradictory. One of the two cases, the cold case generally, also generate additional mass to remedy to it. Adding mass for a subsystem is not considered optimal as every kilogram in Low Earth Orbit cost about $10,000. The number is obviously higher for Geostationary Orbit. Another surprising element is that sometimes, the colder case is not an obvious one so the solver really has to go through many different types of iterations to define what that case it.
The objective to have the lightest spacecraft possible also flies in the face of thermal "equilibiuim". the less thermal mass a spacecraft has, the less capable it will be able to handle environmental swings. Cubesats, for instance, fall in this extreme category of spacecraft for which thermal fluctuations can be very large (and bring about a possible thermal "event:" such electronics board cracking, etc....)

As the design progresses the modeling is iterated upon testing (from components to whole spacecrafts) but as Isidoro Martinez points out, this really is just the beginning of a long processes fitting models with the result of the few experimentl data gathered through the lengthy Thermal Balance and Themal Vacuum Tests (TB/TV tests: From here:

Spacecraft thermal testing

Measurement is the ultimate validation of real behaviour of a physical system. But tests are expensive, not only on the financial budget but on time demanded and other precious resources as qualified personnel. As a trade-off, mathematical models are developed to provide multi-parametric behaviour, with the hope that, if a few predictions are checked against physical tests, the model is validated to be reliable to predict the many other situations not actually tested.

Final testing of a large spacecraft for acceptance is at the present limit of technology, since very large vacuum chambers, with a powerful collimated solar-like beam, and walls kept at cryogenic temperatures, must be provided (and the spacecraft able to be deployed, and rotated in all directions, while measuring). 

Typical temperature discrepancy between the most advanced numerical simulation and the most expensive experimental tests may be some 2 K for most delicate components in integrated spacecraft (much lower when components are separately tested).

Picture of the large vacuum chamber at NASA JSC where the Apollo LEM was tested.

An item of considerable interest to the thermal designer is to reduce substantially the time it takes to fit the few experimental results to the models. This part is by no means trivial given the nonlinearities of radiation heat transfer. The time required to fit models to the experiments is critical as tests are expensive and lengthy. From here the timeline for the TB/TV test of the Rosetta spacecraft that took thirty days:

Additional information can be found in:

Wednesday, November 3, 2010

RMM Example #1: Ground Motion Models for Probabilistic Seismic Hazard Analysis

Following up on my bleg, Nicolas Kuehn kindly responded to my query with the following e-mail:

Dear Igor,

I am writing to you because I saw your blogpost with a call for help/bleg. I am not sure if this is what you had in mind, but I might have something to contribute.

I am a seismologist working on ground motion models for probabilistic seismic hazard analysis (PSHA). In PSHA, we try to estimate the exceedance rate of a destructive ground motion level for a particular site (e.g. a nuclear power plant). It basically comes down to the following:
  1. What is the probability that an earthquake with magnitude M occurs in a distance R from the site in a particular time period.
  2. Given M and R, what is the probability that a certain ground motion level will be exceeded.
  3. Integrate over all magnitude and distance combinations.

I am working on part 2. Here, we want to estimate the conditional probability of the ground motion parameter Y given magnitude and distance. Y can be something like peak ground acceleration. Usually, a lognormal distribution is assumed, and we get something like this:

log Y = f(M,R)+epsilon

The parameters of f are estimated from large strong motion datasets, which consist of recordings of ground motion from large earthquakes at seismic stations.
Now comes the part where you are probably interested in. The estimation of f is not easy. There is physical knowledge about the relationships between Y and M and R, but there are still unexplained effects. For example, physical models make the assumption that an earthquake is a point source, but large earthquakes can have fault dimensions of up 10s or 100s of kilometers. This has to be taken into account. There are also other variables (site effects, rupture directivity effects and others) which influence ground motion. Not for all of them exist physical knowledge about the exact relation.
Another problem is missing data. Site effects are usually quantified by the shear wave profile under the station, but this is not always available. Similar, there is missing data for other variables as well.

There is also the problem that the data is not independent. One normally has many recordings from one earthquake at different stations, which are not independent. Similarly, you can have on station recording several earthquakes.

As I said, I am not sure if this is what you had in mind as problems, but if you are interested, I can provide you with more details....
For illustration purpose here a graph showing the seismicity of Germany ( a fact I was largely unaware of)

 back to Nicolas' problem which is not centered on just Germany,  I then responded with the following:

Thanks for the e-mail. I like your problem. ... I am curious on how: - the lognormality of Y was assumed
  • is there a predetermined shape for f
  • how was this shape found how ?  
  • how do you account for missing data ?

Any other detail is also interesting as well.
Nicolas then responded with:

Hi Igor,

I'm glad that you like the problem. I am also interested to hear about problems from other fields, and how these fields cope. ... Below, I have provided some more details.

- Lognormality: There are two reasons why this is assumed. One is data. I have attached a normal probability plot of ground-motion data from a strong motion dataset (about 3000 datapoints). The plot is taken from some lecture, and you can see that the points follow a straight line, but there are some deviations in the tails.
The second one is more theoretical: The Fourier spectrum of seismic ground motion can be written as F(f) = E(f)xP(f)xS(f), where f is frequency and E(f) is the source component, P(f) is the path and S(f) is the site part (so F(f) is a convolution of these parts). This is a multiplication of random variables, which leads to a lognormal distribution.
There is one problem with the assumption of lognormality which is widely recognized, but no satisfying solution has been proposed: It assigns a nonzero probability for very high, physically impossible ground motions (the tails). This becomes especially important for very low exceedance rates, where these tails become important. Critical facilities such as nuclear power plants or waste deposits must be designed to withstand ground motions with very low exceedance rates.

* shape for f: This is also based on the Fourier spectrum. The source part is proportional to e^M, the path part is proportional to 1/R e^-R, so the most simple model becomes:
f(M,R) = a +bM-cLog(R)-dR
This is based on a model that treats an earthquake as a point. For extended faults, where each part of the fault can emit seismic waves, there is interference and so on. Two observations can be made:
1. a magnitude saturation, where the larger the magnitude, the less the difference in ground motion. This is modeled usually either by a m^2 term in f or by a piecewise linear function for the magnitude.
2. There is an interaction term between M and R (the larger the magnitude, the less the decrease of Y with distance). This is modeled either as (c+c1M)Log(R) or by cLog(R+de^(gM)).

Site effects are usually modeled as discrete variables (the near surface underground is classified as ROCK, STIFF SOIL, SOFT SOIL), each with an individual coefficient. There are different ways how people classify the site conditions, though. In newer studies, one finds also the use of Vs30, which is the average shear wave velocity in the upper 30m, as a predictor variable.

Then, there is the style-of-faulting, which measures how the fault ruptures (horizontally or vertically). It is a discrete, three valued variable.

This leads to this form, which forms the basis of most ground-motion models:
f = a_1+a_2M+(a_3+a_4M) Log (R)+a_5R+a_6 SS+a_7 SA+a_8 FN+a_9FR,
where SS is 1 if the site class is STIFF SOIL and 0 otherwise, SA is one of the site class is SOFT SOIL, and FN is 1 if the style-of-faulting is normal, FR is 1 if the style-of-faulting is reverse.

Newer models take into account more variables and effects (fault dip, nonlinear site amplification, whether the earthquake ruptures the surface or not).

* missing data: This is treated differently
Sometimes, it can be possible to estimate one variable from a related one. E.g., there exist different magnitude measures (moment magnitude, local magnitude, body wave magnitude), on there exist conversion rules between them. There exist also different distance measures, which can be converted.
If a station has no shear wave velocity profile, one can look at the geology. This all introduces uncertainty, though.

What is also sometimes done is first determining the coefficients of magnitude and distance (for which information is usually complete), and later determine the remaining coefficients using the data that is available.

I have tried to determine the coefficients of a model using Bayesian inference via OpenBUGS, where I treated the missing data as parameters for which a posterior distribution was determined.


I am leaving the problematic as is for the time being. In many respects, the type of issues mentioned in Nicolas' problem are very reminiscent of a whole slew of problems found in many different science and engineering fields and subfields. One more thing, Nicolas also has a presentation and a poster that talks about some part of his implementation in A Bayesian Ground Motion Model for Estimating the Covariance Structure of Ground Motion Intensity Parameters and A hierarchical Global Ground Motion Model to Take Into Account Regional Differences.
Here is a video of the presentation:

This is a very nice example where methodologies for Robust Mathematical Modeling are indeed needed, we can see here that
  • The data are missing or corrupted ;
  • The laws describing the phenomena are not completely known ;

Thanks Nicolas !