Helicopulas

To make a helicopula you need a helicopter, a passing interest in Copula functions, and a bunch of algorithms fighting to predict the helicopter.

Participants in the SciML helicopter challenge are asked to predict the position of a helicopter in a two dimensional space. Inspired by this example, I published a bivariate helicopter data stream at Microprediction.Org (where algorithms fight to predict any data you choose to send). Consider it an experiment in separation of concerns in the difficult task of multi-dimensional distributional prediction.

Perhaps the helicopter prediction task might not be best achieved by a single algorithm or model. Perhaps different algorithms authored by different people using different languages might conspire to create a better forecast. Time will tell.

Outline:

Copulas and helicopters
Python walkthrough of the helicopula, including how you can submit your own helicopula distributional prediction to microprediction.org

On Copulas and helicopters

First the terminology. This is an example of a Copula (on the left)

A Copula, in this case a 2-copula, is a bivariate cumulative distribution function for a variable taking values on the unit square. In other words, any time you have a two dimensional variable whose one dimensional margins (i.e. the variable you get when you look at one coordinate only) are uniform, you have a two dimensional Copula. This particular example is Gaussian Copula whose corresponding density is on the right.

Now to helicopters. I think we know what a helicopter is. But here is a laboratory helicopter arranged so that there are only two degrees of freedom (up and down, and rotating around the vertical axis). The two angles are called pitch and yaw, denoted theta and psi respectively.

Here are measurements of its yaw angle psi changing through time.

You can find a similar time series for its pitch angle theta at Microprediction.Org under the stream name helicopter_theta. Microprediction.Org is a place where anyone can publish a live data stream and it will be attacked by prediction algorithms.

Helicopula

So then, what is a helicopula? Obviously it must be the Copula function that is implicitly defined when you ask dozens of machine learning algorithms to try to predict forward values of pitch and yaw.

To explain... we first need to know just a little about z-streams at Microprediction.Org. You can read all about the mechanics in this article if you wish, but briefly, a z-stream is plotted below where you see a transformed measure of yaw (psi) that is called a z-score. The definition of the z-score is not the usual one, and nor is it any formulaic transformation of psi. Instead, we use the fact that algorithms fighting at Microprediction.Org collectively generate a distribution of possible future values of psi. The collective cumulative distribution function is applied to the arriving data point psi, yielding a number between 0 and 1. Then, an inverse normal distribution function is applied resulting in what we call a z-score.

Yes, this terminology overloads the usual notion of a z-score but as we all know, traditional z-scores aren't the greatest. On the other hand (and assuming competition to predict psi is fierce) our definition of a z-score will yield a variable that is normally distributed - which is true of regular z-scores only if the data is normally distributed to begin with. Aside: one very good reason to publish live data at Microprediction.Org - which you can do any time - is to automatically receive "better" z-scores.

Next, the helicopula. We unwind that last inverse normal transformation to get a uniformly distributed transformation of psi instead of a normally distributed one (in other words, implied percentiles of each data point). We do the same for the pitch of the helicopter. This leaves us with one number between 0 and 1 for psi (yaw) and one number between 0 and 1 for theta (pitch). The joint bivariate distribution of these approximately uniformly distributed random variables ... well it could not possibly be called anything but a helicopula, could it?

Why care about helicopulas?

Let's break this down

Why care about helicopter dynamics?
Why care about helicopulas?

Naturally we are all capable of abstraction and generalization, so a genuine interest in helicopters isn't really a requirement.

For the first part of the question I refer you to the SciML helicopter challenge and observations from Chris Rackauckas:

We do not have data from every single detail about the helicopter. We know the electrical signals that are being sent to the rotories and we know the measurements of yaw and pitch angles, but there are many hidden variables that are not able to be measured.
While it is governed by physical first principles, these first principles do not describe the whole system.
Since our goal is to understand the helicopter system, simply training a neural network or performing reinforcement learning does not solve the problem: we wish to understand the actual physics instead of simply making predictions.

The helicopter provides an excellent example of an incompletely understood physical system (perhaps due to the cabling you see around the helicopter, as speculated in this video)

It seems to be particularly hard to predict whether the helicopter will rotate or not as a function of voltage applied. The yaw angle psi is unresponsive to voltage for a long time then eventually responds. Certainly there are some terms missing in the differential equations that might describe a frictionless setup. One attempt to fill these in leads to a good estimate of how much the helicopter tilts, but a poor estimate of which way it is facing (plot from the contest repo by Chris Rackauckas).

The problem is fascinating as it sits as the intersection of differential equation modeling and Machine Learning, which is the focus of the SciML group and the reason for development of some of powerful packages in the Julia language. I would encourage anyone interested to take a run at this challenge ahead of JuliaCon 2020 (July 29th to 31st). See links in the accompanying notebook.

Why care about copulas, and predictions of implied copulas?

Now to the second part of the answer. Are Copulas useful? Or more precisely, is the decomposition of a helicopter's future bivariate density into a combination of marginal distributions and a Copula useful? Some elementary observations:

The helicopula is somewhat detached from some vagaries of the marginal distributions of pitch and yaw. It is left unchanged by monotonic transformations of pitch and yaw. Possibly, therefore, one might understand some aspects of the modeling, or model misspecification, that is similarly invariant. For instance a differential equation model might consistently underestimate mass in the upper rightmost quadrant (say) of the Copula density. Furthermore, it may be possible ahead of time to determine which changes to the model will change the Copula in a material way and which will not.
Because Microprediction.Org provides separate contests to predict margins and the Copula, it may be possible to combine people's expertise in new ways. Statistical properties of the helicopula may suggest a model change that is obvious to someone and less obvious to the creator of the original model.

I will restrict attention to bivariate prediction in this post - though trivariate examples also exist at Microprediction.Org. Another bivariate example is given by the combination of Seattle wind speed and direction.

Python walk-through of the helicopula

Let's write some code to look at the helicopula.

Raw data for Julia contest

First though, here is a peek at the original helicopter time series, including both the pitch and yaw angles but also the drivers (how much force is applied by each of two motors)

You can reproduce this with

pd.read_csv(data_url).plot()

where the long data url is provided in the notebook (and of course pd is pandas).

Predictions of psi and theta

We will instead look at a shorter but growing time series at Microprediction.Org (direct link to stream here). Most of the data on this site is live, but this series is fake live data insofar as one data point from the original dataset is added every seven minutes. As you can see there are a few of them already predicting the series even though I just started publishing it. Perhaps by the time you read this there will be more.

Aside, Of course it is possible for someone to cheat (which would give me an excuse to write an article about obfuscation) but for now let's assume the algorithms are only using the limited history.

You can retrieve the data with

pip install microprediction

and then:

and similarly for the stream named helicopter_theta.json

Standardized univariate streams (z1~)

Similarly,

retrieves and plots the standardized z-scores (based on predictions quarantined for 70 seconds or more).

Bivariate streams (z2~)

Finally the helicopula stream. Notice that it is prefixed by z2~

However as noted this stream actually comprises scalar values. But each encodes a pair of points via a space filling curve. The MicroReader comes with a translation method from_zcurve and we can recover two dimensional points as follows:

Behold the helicopula

The algorithms have cottoned on to the fact that pitch and yaw are frequently unchanged. But I won't say too much since this is a very new series and by the time you look at the plot it may look quite different.

Transforming the helicopula

Let's convert the copula samples into bivariate samples with normal margins instead

thus giving rise to a view of the dependence viewed though a normal lens:

At time of writing there is roughly a negative 40% correlation between normalized pitch and yaw - though this would be a good time to remind the reader that not all bivariate random variables with gaussian margins are bivariate normal! Never mind that. Let's fit this spray of points with a Gaussian Copula as follows:

And then we can simulate from this copula:

and see if it is a good match to the original. Not bad as you can see.

Some people think use of the Normal Copula led to the destruction of the financial system in 2007 but don't let that stop you from using it.

Predicting the helicopula

Next we create a submission in the helicopula prediction contest. The contest expects univariate predictions, but our submission will compute these by first generating a distributional estimate in two dimensions then projecting to one. We are required to submit exactly 225 numbers. So...

Now convert these into single numbers suitable for submission. The projection is

because we start with normally distributed numbers, not uniform as to_zcurve expects. Then

is a vector of floats ready to go.

Submitting a helicopula distributional forecast

To submit a prediction of the helicopula we need a secret identity and a writer. First, the identity:

Don't lose the key. Instantiate the writer to discover your spirit animal

And then submit.

The horizon argument is set at 70 seconds here. Your choices are

corresponding to roughly 1 min, 5 min, 15 min and 1 hr ahead. Since data for the helicopter arrives once every 7 minutes (approximately) both the shorter horizons choices correspond to forecasts of the next data point ... assuming you make them just after the last has arrived. The longest horizon corresponds to looking ahead roughly 8 data points.

Determining if your prediction is helpful or not

The best thing now is probably to run over to Microprediction.Org and see where you are on the leaderboard for the stream with name z2~helicopter_psi~helicopter_theta~70.json

Better, click on dashboard and punch in your write key. You may see something like this:

However you can also retrieve status programmatically as follows:

Check balance

or:

and so forth (see the microprediction package on PyPI). However balance and performance won't change until a data point arrives to judge your submission, and your submission won't be eligible until it leaves quarantine (70 seconds from now) and the next data point has arrived. Thus it might be 8 minutes.

Good luck.

Endnote: A second helicopula

The reader might have noticed that an implied copula is defined by a choice of prediction horizon. There are in fact two live helicopula streams at Microprediction.Org named z2~helicopter_psi~helicopter~theta~70 and z2~helicopter_psi~helicopter~theta~3555 differing only in the quarantine time. Points in the second series express normalization of data with respect to a 1 hour ahead prediction, whereas the first, which we have paid attention to here, is based on predictions quarantined for only a minute or so.

Next time ... a Julia walkthrough

Rest assured that part of the reason for including the helicopter data is to nudge myself to write a Julia microprediction client as soon as possible, and hopefully well before JuliaCon. If you are a Julia fan and would like to help with that, please shout!

About me

Hi I'm the author of Microprediction: Building an Open AI Network published by MIT Press. I create open-source Python packages such as timemachines, precise and humpday for benchmarking, and I maintain a live prediction exchange at www.microprediction.org which you can participate in (see docs). I also develop portfolio techniques for Intech Investments unifying hierarchical and optimization perspectives.

Helicopulas

Peter Cotton

Helping to build a decentralized prediction network

Outline:

On Copulas and helicopters

Helicopula

Why care about helicopulas?

Why care about copulas, and predictions of implied copulas?

Python walk-through of the helicopula

Raw data for Julia contest

Predictions of psi and theta

Standardized univariate streams (z1~)

Bivariate streams (z2~)

Recommended by LinkedIn

Transforming the helicopula

Predicting the helicopula

Submitting a helicopula distributional forecast

Determining if your prediction is helpful or not

Endnote: A second helicopula

Next time ... a Julia walkthrough

Related

About me

More articles by this author

Insights from the community

Others also viewed

9 Key Trends to Watch in 2023: 3D, AEC and Geospatial Innovations

Replicator Updates from AUVSI: Jan 12, 2024

Navigating the Future of Drone-Based Detection: Post-Processing Versus Real-Time Insight Solutions

Weekly Robotics 196

Geopolitics and Artificial Intelligence

Ocean AI #34

What's in sight? The ImageVision.ai's Monthly Newsletter

Hidden Champions in Austria’s Geospatial Ecosystem: Opportunities and Challenges

Understanding Object Detection: An Overview of the Top Algorithms

How to obtain better results when capturing elements on geospatial images?

Explore topics

Outline:

On Copulas and helicopters

Helicopula

Why care about helicopulas?

Why care about copulas, and predictions of implied copulas?

Python walk-through of the helicopula

Raw data for Julia contest

Predictions of psi and theta

Standardized univariate streams (z1~)

Bivariate streams (z2~)

Recommended by LinkedIn

Transforming the helicopula

Predicting the helicopula

Submitting a helicopula distributional forecast

Determining if your prediction is helpful or not

Endnote: A second helicopula

Next time ... a Julia walkthrough

Related

About me

Shutting Down California — The Billion Dollar Prediction Problem

May 1, 2023

Nine Yards is Enough - Why NFL Receivers and Running Backs Should Stop Shy of the First Down

Oct 26, 2020

Comparing Python Global Optimization Packages

Oct 19, 2020

The Instant, Morbid Reaction to the "Worst Debate in History"

Oct 1, 2020

Be the World's Most Asymptotically Productive Data Scientist (Deploying Models Edition)

Aug 10, 2020

Live, Online Distribution Estimation Using t-Digests

Jul 29, 2020

Benchmarking AutoML Vendors and Open Source Time Series Packages

Jul 22, 2020

On Masks and Seat Belts. COVID Cases Mount Higher where Ralph Nader was Unpopular, and Conversely.

Jul 16, 2020

Ever Lost an Algorithm? A Suggestion for Addressing the Reusability Crisis

Jul 14, 2020

Where will a Badminton Player Move to Next, and How Should we Adjudicate Predictions of the Same?

Jul 13, 2020

Insights from the community

Others also viewed

9 Key Trends to Watch in 2023: 3D, AEC and Geospatial Innovations

Replicator Updates from AUVSI: Jan 12, 2024

Navigating the Future of Drone-Based Detection: Post-Processing Versus Real-Time Insight Solutions

Weekly Robotics 196

Geopolitics and Artificial Intelligence

Ocean AI #34

What's in sight? The ImageVision.ai's Monthly Newsletter

Hidden Champions in Austria’s Geospatial Ecosystem: Opportunities and Challenges

Understanding Object Detection: An Overview of the Top Algorithms

How to obtain better results when capturing elements on geospatial images?

Explore topics