Timing Solution: Back Testing Report
Price Bar Patterns (Japanese Candlesticks) model for Dow Jones Industrial Index
Executive Summary
Core Idea
Numeric Experiment Description
Artifacts Study
Parameters to Vary
Executive Summary
The Back Testing shows that a forecasting model based on the proportions of price bar elements (Open, High, Low, Close) makes it possible to predict future price movement. We call this model the Japanese Candlestick model. This result contradicts the Random Walk Theory, which states "... past movement or direction of the price of a stock cannot be used to predict the future movement."
This result is statistically significant, with a probability of 98%.
The "forecasting horizon" of this model is 7 price bars ahead. In other words, we can get a projection line for 7 trading days ahead.
The projection line based on this model has a correlation of 0.11 with the real price movement, or 11% on a 0%-100% scale. Note that a correlation coefficient is not the share of price movement explained: that share is the square of the correlation, about 1.2% here.
We have tried to exclude all possible artifacts (seasonality, for example). Information leaks are excluded as well (usually, "information leaks" occur when the model's performance is checked on the same data set that was used to create the model).
The statistical report is here. You can also see the forecasting curves produced by this model. They are part of the full report, which contains about 600 pictures and takes 10 MB. The forecasting curves are picked at random. The full report is available to everybody on request (contact e-mail: tarassov@rogers.com).
All calculations are made for the Dow Jones Industrial Index (the data set covers the period from 1970 to 2005).
This is the summary table with the information regarding the best models:
| Model | Training Interval | Prediction Horizon |
| --- | --- | --- |
| Neural Net Model | 5000 price bars (20 years) | 7 price bars |
| Neural Net Model, Linear Model | 1000 price bars (4 years) | 5 price bars |
Core Idea
Mathematically, this approach is very close to the Fuzzy Neural Network technique. The main strength of this method is that it reveals the connection between the last several price bars and future price movements. These connections are revealed during the so-called training procedure. The program compares the structure of these price bars (in this example, we use 15 trading days, the regression order) with future price movements and creates a model. During the training procedure, the program adjusts the parameters of the Neural Net (its neurons) to get the best fit between the model's projection and the real price.
In other words, if there is any relation between the price chart and future price movement, this Neural Net makes it possible to find it. You can think of this Neural Net as a very ambitious expert who analyzes a lot of price charts, as well as all their possible combinations, trying to see the future. This is exactly what any professional does in any field: first he/she spends time and energy learning the subject, then he/she applies the knowledge and gains experience; only after that does he/she become a professional, a person who can make an executive decision regarding the ongoing process (and its future!) based on knowledge and expertise.
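The pairing of recent price bars with a later price movement, as described above, can be sketched as follows. This is only an illustration: the function name `make_training_pairs`, its parameters, and the exact alignment between a window and its target are assumptions, not the Timing Solution implementation.

```python
from typing import List, Sequence, Tuple


def make_training_pairs(
    bars: Sequence[float],
    target: Sequence[float],
    order: int = 15,    # regression order: how many past bars the model sees
    horizon: int = 7,   # how many bars ahead the target lies
) -> List[Tuple[List[float], float]]:
    """Pair each window of `order` consecutive bars with the target value
    located `horizon` bars after the last bar of that window."""
    pairs = []
    last_start = len(bars) - order - horizon
    for start in range(last_start + 1):
        window = list(bars[start:start + order])
        future = target[start + order - 1 + horizon]
        pairs.append((window, future))
    return pairs


# Toy data: 30 bars, where the "target" is simply the bar index.
toy_bars = list(range(30))
toy_target = list(range(30))
toy_pairs = make_training_pairs(toy_bars, toy_target)
```

With 30 bars, a regression order of 15 and a horizon of 7, only the windows that still have a target 7 bars ahead are kept. A training procedure would then fit the model on these pairs.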
How does this core idea work in the Timing Solution program? To define any price bar, we use a special language, the Universal Language of Events (ULE).
To describe the price bar, we need to consider these 3 parameters:

1) the difference between High and Low (True range);
2) the difference between Close and Open (so called "real body");
3) the difference between Open and Low (so called "lower shadow").
All these values are normalized to exclude the trend effect. To describe these proportions, we use Fuzzy Logic math. In other words, the program deals with events like this: "five days before, the real body was big while the lower shadow was very high". We work not with numbers, but with so-called membership functions (e.g., "real body was big").
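The three bar parameters and a membership function of the kind mentioned above can be sketched as follows. The normalization used by the program is not specified here, so dividing by the Open price is an assumption, as are the names `Bar`, `bar_features` and `triangular` and the breakpoints chosen for "big".

```python
from dataclasses import dataclass


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float


def bar_features(bar: Bar) -> dict:
    """The three differences listed above, divided by the Open price.
    Dividing by Open is an assumed normalization, not the documented one."""
    scale = bar.open
    return {
        "range": (bar.high - bar.low) / scale,
        "real_body": (bar.close - bar.open) / scale,
        "lower_shadow": (bar.open - bar.low) / scale,
    }


def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Triangular membership function: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


features = bar_features(Bar(open=100.0, high=104.0, low=98.0, close=103.0))
# Degree to which "the real body is big" holds; breakpoints are hypothetical.
body_is_big = triangular(features["real_body"], 0.01, 0.03, 0.05)
```

The point of the membership function is that the statement "the real body is big" gets a degree between 0 and 1 instead of a hard yes or no.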
For example, let us look at the figure called "Shaven bottom":

Here is how it sounds in the language of Fuzzy Logic: "the real body is negative and medium while the lower shadow is zero". This is the event to be considered. This event and other similar events can be described in the Timing Solution program.
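As a rough illustration, the "Shaven bottom" statement can be evaluated as a fuzzy AND of two membership degrees. Everything specific in this sketch is an assumption: the breakpoints, the use of the minimum as the AND operator, and measuring the lower shadow from the bottom of the real body (the usual candlestick convention).

```python
def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Triangular membership function: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def shaven_bottom_degree(real_body: float, lower_shadow: float) -> float:
    """Degree (0..1) to which a normalized bar matches
    'real body is negative and medium while the lower shadow is zero'."""
    # Hypothetical breakpoints, expressed as fractions of the price.
    negative_medium = triangular(real_body, -0.03, -0.015, 0.0)
    shadow_is_zero = triangular(lower_shadow, -0.002, 0.0, 0.002)
    # Fuzzy AND taken as the minimum of the two degrees (a common choice).
    return min(negative_medium, shadow_is_zero)


perfect_match = shaven_bottom_degree(real_body=-0.015, lower_shadow=0.0)
partial_match = shaven_bottom_degree(real_body=-0.015, lower_shadow=0.001)
no_match = shaven_bottom_degree(real_body=0.01, lower_shadow=0.0)
```

A bar with a small but nonzero lower shadow still matches the event partially, which is what distinguishes this description from a hard rule.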
Now the time comes for the Object Oriented Neural Net (which is the "know-how" of the Timing Solution group).
Like any regular Neural Net, this system looks for any possible connections between the price configuration in the past and the future price movement. Apart from that, we provide a very powerful mechanism of sub-optimization. In general, it means that during the training procedure the program itself decides what should be taken as a "wide body" or a "medium lower shadow". This procedure helps to avoid a very unpleasant effect of many Neural Nets: overtraining.
Numeric Experiment Description
These are the steps of the Back Testing procedure:
1) Data: this example is made for Dow Jones Index daily data for the years 1970 - 2005.
2) Choosing training/testing intervals: in this data set, we choose a point that divides the price data into two parts: a training interval used for training the Neural Net and, immediately following it, a testing interval used to estimate the model's performance. To exclude information leaks, these two intervals share no points. The border between the training and testing intervals is called the Learning Border Cursor (LBC).
3) Training the Neural Net: we use the Back Propagation procedure to train the Neural Net (at the same time, we create a simple linear model based on the same events; its purpose is explained below). As a target function, we use a detrended price oscillator to exclude the trend and see the short swing movements. The oscillator is calculated as (Close-MA(Close,Period=5))/MA(Close,Period=5), where MA is the exponential moving average of the Close index with a period of 5 price bars. The Neural Net has 32 hidden units and is trained for 25,000 steps.
4) Varying Training/Testing Intervals: to train the Neural Net, we use different training intervals: 1000 price bars (4 years), 2500 bars (10 years) and 5000 bars (20 years). To estimate performance, we calculate the correlation coefficient between the price oscillator and the projection line generated by the Neural Net. For testing, we use only the price bars that were not used in Neural Net training, thus avoiding any kind of future leaks. The forecasting horizon is limited to 7 bars ahead, because the autoregression order is 15.
5) Randomizing Neural Net: we reset the Neural Net to an initial state by setting the weights to small random values. Thus we prepare the Neural Net for the next piece of the data set.
6) Repeating steps 2, 3, 4 and 5 for another LBC: we choose another piece of the data set. It is possible to do this in two ways: shifting the LBC forward by a few price bars or setting it at random. We use the second algorithm because it makes it possible to exclude some artifacts (see Artifacts Study).
7) Sample Size = 200: we repeat this procedure 200 times (200 LBC x 3 training intervals x 3 testing intervals).
8) Null Hypothesis: we state that if this system does not produce a real forecast, the average correlation between the projection line and the oscillator should be equal to zero (see Artifacts Study for more details). The numeric experiment rejects this hypothesis.
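The data-handling side of the procedure above (the oscillator, the split at a randomly chosen LBC, and the correlation score) can be sketched as follows. The Neural Net itself is left out. The smoothing factor 2/(period+1), seeding the average with the first Close, and all function names are assumptions made for illustration.

```python
import random
from statistics import mean


def ema(values, period=5):
    """Exponential moving average; alpha = 2/(period+1) is an assumed convention."""
    alpha = 2.0 / (period + 1)
    out = [values[0]]  # seed with the first value (assumption)
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def oscillator(close, period=5):
    """(Close - MA(Close, period)) / MA(Close, period), as defined in the text."""
    return [(c - m) / m for c, m in zip(close, ema(close, period))]


def random_lbc(n_bars, train_len, horizon, rng):
    """Pick an LBC so that both intervals fit inside the data set."""
    return rng.randrange(train_len, n_bars - horizon + 1)


def split_at_lbc(series, lbc, train_len, horizon):
    """Training interval ends at the LBC; testing interval starts at it."""
    return series[lbc - train_len:lbc], series[lbc:lbc + horizon]


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)
close = [100.0 + 0.1 * i + rng.uniform(-1.0, 1.0) for i in range(1200)]
osc = oscillator(close)
lbc = random_lbc(len(osc), train_len=1000, horizon=7, rng=rng)
train, test = split_at_lbc(osc, lbc, train_len=1000, horizon=7)
```

In a full run, a model fitted on `train` would produce a 7-bar projection, and `pearson(projection, test)` would give one of the 200 correlation values collected in the experiment.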
Statistical Results: the numeric experiment shows that the best projection line is produced by the Neural Net that uses 5000 price bars (20 years of price history) as a training interval. This model provides a forecast for 7 price bars ahead. The average correlation is 0.1125. A positive correlation occurred 124 times and a negative one 76 times. Under the null hypothesis (the control group), we would expect 100 positive and 100 negative correlations. According to the Pearson chi-square criterion, with a probability of 98% this result is not accidental.
Detailed Analysis: for a shorter forecast (5 price bars ahead), it is enough to use only 1000 price bars (4 years of price history) to train the Neural Net. In this case, the average correlation is 0.1019 (the correlation is positive 121 times and negative 79 times). You can also use the simple linear model for this configuration (1000 price bar training interval, 5 price bar forecast). The linear model provides an average correlation of 0.060 (positive 113 times, negative 87 times). The Neural Net appears to provide the better forecast, which suggests that nonlinear effects are important here. In practice, it means that for future price movement a combination of price bars matters more than any single price bar: combinations like the "morning star formation" are more powerful predictors than any single bar.
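The chi-square criterion applied to the positive/negative counts above can be reproduced with a short calculation. This sketch assumes the test compares the observed counts with an even 50/50 split (one degree of freedom), as the text describes; the function name is hypothetical.

```python
import math


def chi_square_sign_test(positive: int, negative: int):
    """Pearson chi-square statistic against an expected 50/50 split,
    with the p-value for one degree of freedom."""
    expected = (positive + negative) / 2
    chi2 = ((positive - expected) ** 2 + (negative - expected) ** 2) / expected
    # For one degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts reported above: Neural Net (7 bars), Neural Net (5 bars), linear model.
nn_7 = chi_square_sign_test(124, 76)
nn_5 = chi_square_sign_test(121, 79)
linear_5 = chi_square_sign_test(113, 87)
```

For the 124/76 split the statistic is 11.52, and the resulting p-value is below 0.02, in line with the 98% level quoted for the best model. The same function can be applied to the other two configurations to compare how strongly each departs from an even split.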
Artifacts Study
There are three artifacts to be considered:
1) Seasonal Cycles: this effect is excluded because we analyze short-term price movements (7-10 days). As a target function, we use a very short-term oscillator with a period of 5 days. To exclude any questions regarding seasonality, we ran an additional Back Testing for the Spectrum model. In other words, we calculated the spectrum using %x price bars before the LBC, extracted the strongest cycles and created a projection line based on these cycles. The average correlation is 0.051. This is all that cycle-based models can provide for a forecast 7 days ahead. Compare it to the Japanese Candlestick model, which provides an average correlation of 0.112. It means that, with a probability of at least 95%, the Japanese Candlestick model reflects more than any fixed cycle model (t-statistic, t=1.8). The statistical report for the spectrum-based model is here.
2) Normalization of the LBC: one more source of artifacts lies in the algorithm for choosing the Learning Border Cursor (LBC). Sometimes the future movement of the stock market is obvious without any sophisticated math. For example, if the price goes up 4% today (due to some fundamental factor), the stock market will be less active over the next few days. See the efficiency test calculated for DJI (1970-2004 data set) for the event "the price goes up at least 4%":

In other words, the LBC itself contains some information regarding the future price movement. To exclude this effect, we provide a special algorithm for choosing the LBC (it is called "normalization of the Learning Border Cursor"). However, the numerical experiments show that we can neglect this effect, as both algorithms (with and without normalization of the LBC) provide very close results.
3) Why we apply Pearson's correlation as a measure of fit between the price and the forecast: generally speaking, market movement is not normally distributed. But in this particular task we analyze short-term price movement. For the Neural Net, we use the short-term oscillator with a period of 5 price bars. It is represented by a red curve:
This oscillator: a) reflects the short-term price movements; b) avoids the unpleasant effects related to the non-normal distribution.
Here is the histogram for this oscillator:
Parameters to Vary
At present, we recommend varying these parameters:
The auto regression order:

In practice, it defines how long any particular price bar keeps its impact.
Another parameter to vary is the number of hidden neurons:

In practice, a larger number of hidden neurons makes it possible to reveal more complex figures.
February 24, 2005
Toronto, Canada
© Timing Solution