Artificial neural nets

Artificial neural nets are an obvious analogy to Indra’s Net, they have nodes, they operate in parallel, and how they operate is hard to understand.

This code is for a C++ implementation of an Adaptive Resonance Theory 1 (ART1) model. It’s based on Carpenter, et al [1] and uses hints from Freeman [2].

Please link to the source code:

Source code .cpp

Source code .h

This page contains an overview of ART1, implementation notes, and directions for using the Windows application. The goal is to have the code be reusable and allow the application to be played with.

ART1 was chosen as a C++ exercise because it’s a non-trivial Artificial Neural Net (ANN) that has not been explored as much as feed forward back-propagation. Some interpretation of chapter 6 of [1] was necessary. Because it’s C++ software and not massively parallel, the sequentialization of the original is an interesting problem.

A separate section at the end explains the input to the art1.exe program and how to use it.

Overview of ART1

ART1 is an Artificial Neural Net (ANN) unsupervised classifier. It operates without intervention (any outside training) to find critical feature patterns; it classifies its input. It is designed as a massively parallel model, but is sequentialized in this implementation.

ART1 recognizes novelty by creating new feature patterns and is self scaling – it can recognize patterns as similar if there are proportionally a few bits difference. Given a set of binary input patterns, the result is a list of critical feature patterns.

The Long Term Memory (LTM) holds the results of classification. As is common in ANN, the memory is stored as weights, floating point arrays that read out usable values and/or are dotted with input vectors.

The top down LTM, holds the actual critical feature patterns. The bottom up LTM holds weights of how input patterns map onto learned features.

In our sequentialized model, the F1 and F2 layers manipulate arrays to represent fields of massively parallel nodes. This is the Short Term Memory (STM).

The STM in F1 contains the current input trace, I, and a comparison register, X. The latest guess as to a match on the input is set by F2 in the register V. I and V are compared in X and the orienting subsystem decides whether the match is close enough. If the match isn’t close enough, the orienting subsystem calls F2 to reset the last guess and try another. A try from F2 that is close enough to avoid a reset is called a resonance.

The STM in F2 contains the current mapping, using LTM, from F1 to the guesses in F2. This is the T register. The F2 layer has the its nodes compete to find the most active one. This is implemented by just finding the element of T with the largest magnitude.

In a typical run, each new trace is first processed by F1 without a V guess, from F2. This leads to a first reset by the orienting subsystem. It creates a first mapping to F2 in the register T, using the current LTM. T is used to try find a guess, which is sent to V, and F1 takes over in the loop again. The loop repeats until there is no reset, a resonance. After a resonance, LTM is updated, and the next new trace is given to F1.

Assumptions:

1) A simple state machine is used, which runs F1 and F2 at times which make sense.

2) Nodes interact in a sequential manner and registers are fully available at specific times.

3) Fast learning is used.

In order to sample the nodes of X with the highest activation, a contrast enhance algorithm is used from [3]. The algorithm of [1] clearly works; there is a common sense, visible division between the most active nodes of X and the others, but X still has to be sampled into S as {0, 1}.

First a histogram of X node values is constructed; the largest element of this gives the value of the highest activated nodes. The contrast enhance is then done using the histogram, to make it easier to pick out the activated nodes, see F1Layer.cpp.

When testing, it was noticed that the values in the bottom up LTM created higher values in the high slots of T. This skewed the competition away from used nodes. To fix this, a factor (BotupFact > 1) is used in assigning the winning LTM columns. See LTMBottomUp.cpp

As with [1] the winning F2 node is chosen by competition, in our case by choosing the largest value in T.

Running the Application

The input to art1.exe has a simple Lisp-like syntax.

<input> :=
‘(‘
<header section>
<trace section>
‘)’

<header section> :=

‘header (‘ <header values> ‘)’
<header values> := {0} | <header values> <symbol> <value>


<symbol> := // the following symbols have the same meaning as in [1].
‘N’ | ‘M’ // integer dimensions of data; N is the width of

// the trace slots and M is number of category

// slots.

// the following are the floating point parameters.

// only those marked are used at the present.
| ‘A1’ | ‘B1’ | ‘C1’ | ‘D1’ | // used.
‘Rho’ | ‘L’ // used.
| ‘Epsilon’ | ‘A2’ | ‘B2’ | ‘C2’ // not used yet.

// BotupFact as discussed above.
| ‘BotupFact’

// Debug listing level

// 0 => no listing

// 1 => some listing.

// 2 => maximum listing.

| ‘Debug’
<value> := <integer> | < float>

< trace section> := {0} | < trace section> ‘trace’
‘(‘ <trace list> ‘)’

<trace list> := // this is a list of N trace values.
{0} | <trace list> (‘0’ | ‘1’)

The sample file, trace_english.in, will make the above pseudo-BNF notation clear. Each trace of this file is to be viewed as a 5 x 5 representation of an English letter.

When the program terminates, the LTM is dumped. The critical feature patterns appear in the lower entries of the top down LTM.

References

[1] Grossberg, S. Ed. Neural Networks and Natural Intelligence,

Chapter 6 – A Massively Parallel Architecture for a Self-organizing Neural PatternRecognition Machine. MIT, 1988.

[2] Freeman, J.A., Simulating Neural Networks with Mathematica.

Addison-Wesley, 1994.

[3] Parker, J.R., Algorithms for Image Processing and Computer Vision.

Chapter 3, section 3.1. John Wiley and Sons, 1997.