Data Dictionary

The official Scientific ML Data Dictionary.

How to use OnDemand

Scientific ML performs the mathematical and scientific analysis for any historical time-series with a constant cadence. The only requirement is that the data be finite and discrete, which is always the case for recorded data. Scientific ML produces a set of scientific measurements that are intuitively familiar (energy, power, resistance, temperature, etc.) and that can be seen to drive value dynamics. Often the value dynamics (e.g., price) are seen to be far from equilibrium.

To invoke Scientific ML, two files are needed.

  • Datafile. Customer time-series formatted as a CSV file.
  • Configuration file. Provides additional information on how to process the customer’s time-series.  The configuration file upload triggers Precision Alpha processing.

The two files should be uploaded separately, the time-series first and the configuration file second, to a Precision Alpha-owned AWS S3 sub-bucket that is specific to the customer and created during customer on-boarding. Precision Alpha leverages AWS security infrastructure. Security policy ensures that no other customer will have access to your processing area.
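The upload order described above can be sketched as follows. This is a minimal illustration, not Precision Alpha's client code: the function name `upload_in_order`, the `bucket` and `prefix` arguments, and the key layout are all hypothetical placeholders for the values issued at on-boarding. The `s3_client` argument is assumed to be any object exposing `upload_file(filename, bucket, key)`, which is the call shape of a boto3 S3 client.

```python
import os


def upload_in_order(s3_client, bucket, prefix, datafile_path, config_path):
    """Upload the datafile first, then the configuration file.

    `bucket` and `prefix` are hypothetical placeholders for the
    customer-specific sub-bucket; the key layout is an assumption.
    """
    keys = []
    # Order matters: the configuration upload triggers processing,
    # so the time-series must already be in place when it arrives.
    for path in (datafile_path, config_path):
        key = f"{prefix.rstrip('/')}/{os.path.basename(path)}"
        s3_client.upload_file(path, bucket, key)
        keys.append(key)
    return keys
```

With boto3 installed, `upload_in_order(boto3.client("s3"), bucket, prefix, "series.csv", "run.cfg")` would perform the two uploads in sequence; the file names here are illustrative only.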

The amount of historical data to learn from is an important consideration. More is not necessarily better. A datafile with at least 125 timestamps keeps fractional fluctuations at the level of the third decimal place. More timestamps in the datafile, however, tend to reduce the responsiveness to changes in the market. We recommend that the timestamps be sampled to produce a time-series of between 125 and 1250 timestamps. The timestamps must have a customer-specified cadence (second, hour, day, week, and so on). The format of the timestamps is also under customer control; however, any customer timestamp convention that uses characters forbidden in filenames will throw an error.
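The length and cadence guidance above can be checked before upload. The sketch below is one possible pre-flight check, not part of the product: the helper names are hypothetical, and it treats "constant cadence" as equal spacing in clock time, which is an assumption (a trading-day calendar that skips weekends would need a different test).

```python
from datetime import datetime, timedelta

MIN_POINTS = 125    # lower bound recommended in the text
MAX_POINTS = 1250   # upper bound recommended in the text


def has_constant_cadence(timestamps):
    """Return True if consecutive timestamps are equally spaced in clock time."""
    steps = {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}
    return len(steps) <= 1


def subsample(rows, max_points=MAX_POINTS):
    """Keep every k-th row, always retaining the latest, so at most max_points remain."""
    stride = max(1, -(-len(rows) // max_points))  # ceiling division, never 0
    # Stride backwards from the end so the most recent observation survives.
    return rows[::-1][::stride][::-1]


def length_ok(rows):
    """Return True if the series length lies in the recommended range."""
    return MIN_POINTS <= len(rows) <= MAX_POINTS


# Hypothetical example: 3000 daily timestamps thinned to every third day.
start = datetime(2024, 1, 1)
daily = [start + timedelta(days=i) for i in range(3000)]
kept = subsample(daily)
```

Subsampling by a fixed stride preserves a constant cadence (here, one timestamp every three days) while bringing the series inside the recommended range.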

Additional time-series with the same timestamps can be added as further input columns of the time-series file. Precision Alpha will process all of them and return them in the output file. Up to 25 time-series are permitted in each run.
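A datafile holding several time-series on shared timestamps could be assembled as in the sketch below. The layout (timestamp in the first column, one value column per time-series, comma delimited, no header line) is an assumption based on the input description elsewhere in this document, and `write_datafile` is a hypothetical helper name.

```python
import csv
import io

MAX_SERIES = 25  # limit per run stated in the text


def write_datafile(timestamps, series, out):
    """Write rows of `timestamp,value1,value2,...` with no header line.

    `series` is a list of value lists, one per time-series, each aligned
    with `timestamps`. The column layout is an assumption.
    """
    if not 1 <= len(series) <= MAX_SERIES:
        raise ValueError(f"expected 1 to {MAX_SERIES} time-series, got {len(series)}")
    if any(len(values) != len(timestamps) for values in series):
        raise ValueError("every time-series must share the same timestamps")
    writer = csv.writer(out, lineterminator="\n")
    for stamp, *values in zip(timestamps, *series):
        writer.writerow([stamp, *values])


# Hypothetical example with two series on two shared timestamps.
buffer = io.StringIO()
write_datafile(["2024-01-01", "2024-01-02"], [[1.5, 2.5], [10, 20]], buffer)
```

Rejecting mismatched lengths up front avoids submitting a file in which one column silently ends early.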

Precision Alpha OnDemand

Precision Alpha OnDemand is a “multimeter” that produces exact market measurements from time-series.  Any time-series can be processed (e.g., prices, trading volumes, volatility, sales volumes, inventory, and so on).  OnDemand is mathematically valid in all markets, both equilibrium and non-equilibrium.

We illustrate the use of OnDemand with an application: Precision Alpha Exchange (PAEx). Less than five hours after market close, Exchange scientifically processes every stock on the NYSE and NASD based on six months of closing prices to produce a data file of scientific measurements: next-day probabilities that a stock will go up (or down), market energy, market power, market resistance, market noise, market temperature, and market free energy (Helmholtz). The free energy is the energy available to do “value movement work” (‘value’ depends on the specific time-series).

The data schema and I/O format are presented in the table below.  The OnDemand web service input data used for Exchange consists of the first two columns (time-stamp and value) in the table below without the headers.  The data is put into a simple comma delimited form for direct processing.   The OnDemand web service generates output in the format and order in the table below without the headers.

A sample configuration file is reproduced below. It identifies the output sub-bucket specific to the customer account. The column headers are identifiers for each time-series. The column headers do not need to be meaningful (e.g., ColA, ColB, etc.); however, the number of column headers must match the number of time-series for correct processing. In the configuration file below, there are two time-series to process: NetValue and NumTx. Finally, the reservoir temperature for the environment, T_R, can be specified. The default value for T_R is the temperature of statistical equilibrium in our units (e/4).
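Separately from the configuration file format, the default reservoir temperature quoted above can be evaluated numerically. The short sketch below only computes e/4 and shows a fallback pattern; the names `DEFAULT_T_R` and `reservoir_temperature` are hypothetical and are not configuration keys.

```python
import math

# Default reservoir temperature quoted in the text: e/4.
DEFAULT_T_R = math.e / 4


def reservoir_temperature(configured=None):
    """Return the customer-supplied T_R, or the default e/4 if none is given."""
    return DEFAULT_T_R if configured is None else float(configured)


print(round(DEFAULT_T_R, 4))  # 0.6796
```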

Precision Alpha Measure consists of two products: Exchange and OnDemand. Exchange processes six-month time-series of closing prices for an entire exchange (e.g., NYSE, NASD), with the data paid for and managed by Precision Alpha. Exchange assumes a trading horizon of about six weeks. Traders, however, are frequently interested in other asset classes, horizons and time-series values (other than price).

OnDemand processes any time-series acquired and managed by our customers to enable innovative and flexible market machine learning on customer-based time-series. OnDemand uses the same scientific processing engine and the same I/O as Exchange, but it processes any customer-provided time-series that satisfies modest minimum data quality requirements.

Data Definitions

The data dictionary definitions for Scientific ML output are found below.  All values are float with four decimal places of accuracy.

Each entry below lists the Data Name, its Definition, and More Information.

Data Name: Pplus
Definition: Computed, machine-learned probability that the value of the measurement will go up in the next time step. The probability is computed exactly (up to double-precision numerical error).
More Information: The probability is a function of the system energy and Lagrange multiplier.

Data Name: Pminus
Definition: Computed, machine-learned probability that the value of the measurement will not go up in the next time step. The probability is computed exactly (up to double-precision numerical error).
More Information: The probability is a function of the system energy and Lagrange multiplier.

Data Name: Energy or Emotions
Definition: Computed, machine-learned system Energy measured from the equilibrium energy as a zero offset. Positive: Bull, Negative: Bear.

When E=0, the data is in statistical mechanical equilibrium and is objective. When human will is involved, a deviation from E=0 is logically "not objective", that is, subjective or aesthetic, and, as we see, assigns a measured value to pleasure. Note that "emotions" is plural: it is a rollup of all emotion.
More Information: Scientists often design experiments with constant energy or temperature; these systems are said to be in equilibrium. Equilibrium implies that the probability that an asset price will go up is equal to the probability that it won't go up (an "unbiased coin").

Markets, however, are driven by emotions through the actions of market participants; they are not in equilibrium (almost all of the time) and do not have constant energy. The energy/emotions value is a measure of how far the data is from statistical equilibrium.

Data Name: Power
Definition: Power is the rate of energy flow per time step. Market power combines Emotion and Resistance. At equilibrium, Power is equal to Emotion squared divided by R (that is, V^2/R).
More Information: Energy is used to do work, or, "to move the needle". Power is the energy flow per time step and is generally not constant. This measurement calculates the power to perform work.

Data Name: Resistance
Definition: Market resistance to changing price.
More Information: Wherever there is energy available "to move the needle", there is also resistance to moving the needle. The more resistance, the harder it is "to move the needle". This measurement calculates the resistive force. The resistance is not constant in systems that are not in equilibrium.

Data Name: Noise
Definition: Computed, machine-learned market (Nyquist) noise that dissipates system Energy so that it cannot be used for price movement.
More Information: Power can also be wasted (dissipation through strain or viscosity) so that it is unavailable to do the work that we want, namely, "to move the needle". As the noise increases, the amount of wasted power increases.

Data Name: Temperature
Definition: The entropic temperature of the system.

In non-equilibrium dynamics, the free energy and temperature are coupled to produce a heat engine. By observing the behavior of free energy and temperature, price entry and exit points can be identified. See Free Energy for more detail.
More Information: The temperature is the reciprocal of the derivative of the entropy with respect to energy (1/T = dS/dE); this is the general definition of entropic temperature.

We recommend plotting free energy and temperature together as a double-sided plot.

Data Name: Free Energy
Definition: Helmholtz free energy.

Can be used with temperature to identify favorable environments for price movement, and entry and exit points. See More Information.
More Information: Free Energy tells us when the total energy / emotion is available to do useful work (F = E - TS, Helmholtz). Minimum Free Energy is a more convenient form of maximum entropy in non-equilibrium problems. When free energy decreases, it does work in the direction of the dominant emotion (bull or bear). In non-equilibrium, the free energy and temperature can play off each other to generate a heat engine that drives price movements.

Local maxima in free energy indicate when energy is available for price movement work (an entry point). As the free energy decreases from the local maximum, price movement in the direction of the dominant emotion can be observed over an extended time. After the price movement, a local minimum will develop, equivalent to maximum entropy. The stable minimum signifies an exit point for the dominant emotion trade.

Data Name: ThermPplus
Definition: Computed, machine-learned probability that the value of the measurement will go up in the next time step when in a thermal bath of temperature T_R.

Where thermal probabilities dominate over Pplus and Pminus (above), thermal probabilities drive price movement.

The thermal probabilities depend on the temperature difference between the system temperature (above) and the reservoir temperature (T_R). Decision Machine uses a default value for T_R at statistical equilibrium (T_R = e/4), but T_R can (and should) be set by the customer in the configuration file. Risk analysis and forecasting will both require that the customer measure T_R specifically for their data.
More Information: A dissipative system is a thermodynamically open system which is operating out of, and often far from, statistical or thermal equilibrium in an environment with which it exchanges energy and information.

The flow of energy into and out of a dissipative system generates thermal probabilities that are calculated from first principles of modern thermodynamics.

The output generated by Decision Machine (DM) for a dissipative system permits a new measurement: the temperature of the reservoir (T_R). In a dissipative system, when the Temperature and Free Energy (above) are constant, T_R can be directly measured. This is easier to see when plotted.

Data Name: ThermPminus
Definition: Computed, machine-learned probability that the value of the measurement will not go up in the next time step when in a thermal bath of temperature T_R.

Where thermal probabilities dominate over Pplus and Pminus (above), thermal probabilities drive price movement.

The thermal probabilities depend on the temperature difference between the system temperature (above) and the reservoir temperature (T_R). Decision Machine uses a default value for T_R at statistical equilibrium (T_R = e/4), but T_R can (and should) be set by the customer in the configuration file. Risk analysis and forecasting will both require that the customer measure T_R specifically for their data.
More Information: A dissipative system is a thermodynamically open system which is operating out of, and often far from, statistical or thermal equilibrium in an environment with which it exchanges energy and information.

The flow of energy into and out of a dissipative system generates thermal probabilities that are calculated from first principles of modern thermodynamics.

The output generated by Decision Machine (DM) for a dissipative system permits a new measurement: the temperature of the reservoir (T_R). In a dissipative system, when the Temperature and Free Energy (above) are constant, T_R can be directly measured. This is easier to see when plotted.
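Two of the definitions above lend themselves to simple checks on an output file. The sketch below is an illustration under stated assumptions, not the product's own logic: `local_extrema` flags strict interior turning points in a Free Energy series as candidate entry (maximum) and exit (minimum) points, ignoring plateaus and making no judgement about whether a minimum is "stable"; `probabilities_consistent` relies only on the fact that "goes up" and "does not go up" are complementary events, with a tolerance chosen for values rounded to four decimal places. Both function names are hypothetical.

```python
def local_extrema(values):
    """Return (maxima, minima): indices of strict interior turning points."""
    maxima, minima = [], []
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        if cur > prev and cur > nxt:
            maxima.append(i)   # candidate entry point
        elif cur < prev and cur < nxt:
            minima.append(i)   # candidate exit point
    return maxima, minima


def probabilities_consistent(p_plus, p_minus, tol=2e-4):
    """Check that Pplus and Pminus sum to 1 within rounding tolerance."""
    return abs(p_plus + p_minus - 1.0) <= tol


# Hypothetical Free Energy values, for illustration only.
free_energy = [0.1, 0.4, 0.9, 0.7, 0.3, 0.2, 0.5]
entries, exits = local_extrema(free_energy)
```

In this made-up series the single local maximum sits at index 2 and the single local minimum at index 5; real output would typically need smoothing or a persistence rule before such points are acted on.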