Minicolumns: The Thorpe Task


Task Protocol

Two distinct stimuli (Catacomb currently makes distinct tones available).
action \ state   stimulus 1       stimulus 2
go               reward (juice)   aversive (saline)
no-go            no reward        no reward
The association of a particular stimulus with the reward is reversed frequently.

The minicolumns required for this task are the following:

Modifications of the Task

Some modifications are made so that simulations can be performed in a manageable computation time.

Three randomized presentations of each stimulus are given during training. A set of four test stimuli follows. Subsequently, the meaning of the stimuli is reversed for four more test stimuli.

During the training presentations, randomized go/no-go actions are simulated. During testing, go/no-go actions are controlled by the neuronal simulation. The randomized go/no-go actions may also occur when no visual stimuli are presented. During initial construction of the task simulation, go/no-go actions always occur 10 ms after a new visual stimulus appears.
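The trial structure and the reward contingency table above can be summarized in a few lines of code. The following is a minimal Python sketch, not the Catacomb implementation: the names `outcome` and `build_schedule` are hypothetical, the randomized training actions are left out, and the composition of each block of four test stimuli (two of each stimulus, shuffled) is an assumption, since the text does not specify it.

```python
import random
from typing import List, Tuple


def outcome(action: str, stimulus: int, rewarded_stimulus: int) -> str:
    """Reward contingency: go on the rewarded stimulus yields juice,
    go on the other stimulus yields saline, no-go yields no reward."""
    if action == "no-go":
        return "no reward"
    if stimulus == rewarded_stimulus:
        return "reward (juice)"
    return "aversive (saline)"


def build_schedule(seed: int = 0) -> List[Tuple[str, int, int]]:
    """Return (phase, stimulus, rewarded_stimulus) for one simulated run.

    Training: three presentations of each stimulus in random order.
    Test: four stimuli with stimulus 1 rewarded, then four more with the
    contingency reversed. ASSUMPTION: each test block contains two of each
    stimulus in random order (the source does not state its composition).
    """
    rng = random.Random(seed)
    training = [1, 1, 1, 2, 2, 2]
    test = [1, 1, 2, 2]
    reversal = [1, 1, 2, 2]
    for block in (training, test, reversal):
        rng.shuffle(block)  # shuffles each list in place
    schedule = [("train", s, 1) for s in training]
    schedule += [("test", s, 1) for s in test]
    schedule += [("reversal", s, 2) for s in reversal]
    return schedule


for phase, stim, rewarded in build_schedule(seed=1):
    print(phase, stim, outcome("go", stim, rewarded))
```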

Model Construction

Simulating the environment of the Thorpe Task

The implementation of the minicolumns, as well as some of the other processing and control structures, is included from minicolumn-layers-aWM.20030904.ccm. A new simulation of the environment, the experimental protocol, and the feedback loop between the environment and the neuronal simulation is added, as shown in Figure 1 (from minicolumns-thorpe-task.20031212.ccm).


Figure 1: The environment simulation for the Thorpe task.

The spike trains produced for the stimulus 1, stimulus 2 and reward states, as well as for the go/no-go actions, appear correct. When they are fed into the state-action pair processing circuitry, state-action spikes are produced, with one issue that must be resolved. In previous experiments, it was always assumed that the same state would not occur consecutively with the same action. Here, that does occur during training. In those instances, a state-action pair is not correctly produced; only a repetition of the state spike is produced.

After removing the spatial navigation environment simulation, some items of the neuronal circuitry that control minicolumn input and output during the testing phase of the experiment require adjustments:

  1. The event sequencer marked "enable-output-and-retrieval" is reprogrammed to enable control by minicolumn output of go/no-go actions as well as minicolumn retrieval elicited by desire for reward at the onset of the testing phase in the Thorpe task (currently t=3125 ms).
  2. The connection table "goal", which routes the spike input that elicits desire for reward during retrieval to the minicolumns, is edited to target minicolumn six, since reward is the third state and states are represented by minicolumns four and higher.
  3. The output of the JoinCloneRelay that combines "STM clear" signals (which are also action clear signals) with action control output produced by the minicolumns is sent to the vrat4dircontroller, a device captured in Catacomb that maintains spike trains for action control in accordance with single control spikes. Thus, the output of prefrontal minicolumns controls go/no-go actions during the testing phase of the Thorpe task.

Integrating environment output during the task with the State-Action pair generating circuitry

Once the simulation of the environment in the Thorpe task is constructed, five different output spike trains are available for the activation of prefrontal minicolumns in the neuronal simulation: stimulus 1 perceived, stimulus 2 perceived, reward received, go action taking place, and no-go action taking place. These spike trains are fed into the neuronal circuitry that was designed to generate state-action spike pairs in the spatial navigation task of minicolumn-layers-aWM.20030904.ccm. Figure 2 shows the spike trains during the training phase of the task, while Figure 3 shows the corresponding spike pairs that are sent to the prefrontal cortex.


Figure 2: Spike trains produced during training in the Thorpe task simulation. Spikes with index 0 and 1 represent perception of stimulus 1 and 2 respectively. Spikes with index 2 represent reward that is received. The reward is delayed so that preceding state-action spikes achieve encoding in prefrontal minicolumns before reward is encoded. The reward is given in response to the combination of stimulus 1 and a go-action (spikes with index 3). Spikes with index 4 represent no-go action.


Figure 3: Action-state spike pairs generated in specialized neuronal circuitry for spike trains representing states, perceived stimuli and reward received, and representing actions, go/no-go. Not every spike train causes the generation of a new state-action pair. And there is no reaction to some spike trains, such as the reward spikes. These are issues that must be resolved.

There are four probable causes for the state-action pair discrepancies: (1) Only the go action is forwarded to the spike pair generating circuitry. (2) The duration of each condition is too short for the time required for encoding in minicolumns, which is the time to which spike pair generation is tuned. (3) The reward spike train has a different spike frequency, which may affect its perceived salience at the input to the state-action spike pair generating circuitry. (4) The protocol of changes in the state and action spike trains differs somewhat from the protocol used in the spatial navigation task: there are breaks between the spike trains corresponding to stimulus presentations, and on some occasions state and action change simultaneously.

Cause (1) is dealt with easily by forwarding both go action and no-go action, picking the correct connection router of output from vrat4dircontroller as the source for action input. Figure 1 includes this update and shows the correct output of go/no-go spike trains.

To prevent an effect as in (2), the stimulus protocol is changed so that every condition has a minimum of three theta cycles in which to be established as spike input to the prefrontal minicolumns. Also, the event sequencer that controls clearing of the STM buffers in the minicolumns is set to provide STM clear signals between consecutive sets of data during training and testing. The state-action pair generating circuitry may also need to be modified to deal with the condition where reward is perceived while a stimulus is still present. In previous experiments, only one state (a place or the goal) was available at a time. The sequence we wish to produce is: stimulus STATE - go/no-go ACTION - reward STATE. There is no need for another action spike following the reward state spike.

  1. stimulus STATE : 3 cycles = 375 ms (+ stay on during go/no-go and reward).
  2. go/no-go ACTION: 3 cycles = 375 ms, offset 375 ms from stimulus STATE (+ stay on during reward).
  3. reward STATE: 3 cycles = 375 ms.
  4. clear: 1 cycle = 125 ms.
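The arithmetic behind this schedule assumes a theta cycle of 125 ms (3 cycles = 375 ms). A minimal sketch that derives the on/off times of one training presentation from that cycle length; it assumes, as one reading of "stay on", that stimulus, action and reward all switch off when the clear cycle begins. The name `presentation_schedule` is hypothetical.

```python
from typing import Dict, Tuple

THETA_MS = 125  # one theta cycle: 3 cycles = 375 ms, as in the list above


def presentation_schedule(onset_ms: int = 0) -> Dict[str, Tuple[int, int]]:
    """(start_ms, end_ms) of each signal in one training presentation.

    ASSUMPTION: stimulus, action and reward all switch off when the
    one-cycle clear period begins.
    """
    action_start = onset_ms + 3 * THETA_MS      # ACTION offset 375 ms from STATE
    reward_start = action_start + 3 * THETA_MS  # reward after 3 more cycles
    clear_start = reward_start + 3 * THETA_MS   # reward lasts 3 cycles
    clear_end = clear_start + THETA_MS          # clear: 1 cycle = 125 ms
    return {
        "stimulus": (onset_ms, clear_start),
        "action": (action_start, clear_start),
        "reward": (reward_start, clear_start),
        "clear": (clear_start, clear_end),
    }


print(presentation_schedule(0))
```

Under this reading, reward is available from 750 ms to 1125 ms after stimulus onset, and one presentation, including the clear cycle, occupies 1250 ms.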

The second part of the state-action pair spikes, delayed by 375 ms, for some reason includes a repetition of the state spike. Also, another action spike appears before that, on its own, at t=500 ms. The initial appearance of no-go prior to the first stimulus causes a spike within the "single-spike-per-event" block of the action part of the state-action pair generating circuitry around t=120 ms. Still, the state part first produces a spike around t=248 ms, just after the first spikes appear in that part.

The common "new-event-detector" that responds to both state and action streams is triggered by the action event around t=120 ms. Yet, at that time there is no state input, so that no state spike is produced. A delay of 375 ms (3 cycles) causes an action spike to be produced around t=500 ms. That explains the lone action spike. Solution: Do not produce no-go action spikes before stimulus spikes begin.

The repetition of the state spike after t=500 ms, together with the action spike, is a coincidence of two parts of the process. There is a new action event at t=500 ms, since there is a switch to go-action. That is detected and causes a new state spike. At the same time, the 375 ms delayed new event detection from the previous state event reaches the action stream and produces a go-action spike. Solution: Do not switch the action at the same time as the action spike is produced. To produce the correct action spike, that action must be present, so it can only be switched on earlier. Of course, since it will still cause a new event, the only way to avoid a second state spike is for the go-action to start at the same time as the stimulus presentation.
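Both failure modes can be reproduced with a deliberately simplified, event-level abstraction of the pair generating circuitry. In this sketch (an assumption, not the spiking implementation), any onset in either stream counts as a new event; at that moment the currently present state, if any, is emitted as the state spike, and 375 ms later the action present at that time, if any, is emitted as the action spike. The function names are hypothetical, and the times are illustrative, loosely following the text (no-go from t=120 ms, stimulus 1 from t=250 ms, switch to go at t=500 ms).

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, Optional[int], str]  # (start_ms, end_ms or None if ongoing, label)
DELAY_MS = 375  # delay from event detection to the action half of the pair


def present(intervals: List[Interval], t: int) -> Optional[str]:
    """Label of the interval active at time t, or None."""
    for start, end, label in intervals:
        if start <= t and (end is None or t < end):
            return label
    return None


def pair_spikes(states: List[Interval], actions: List[Interval]) -> List[Tuple[int, str, str]]:
    """Emit (time, stream, label) spikes for every new event in either stream."""
    events = sorted({start for start, _, _ in states + actions})  # simultaneous onsets merge
    spikes = []
    for t in events:
        state = present(states, t)
        if state is not None:
            spikes.append((t, "state", state))
        action = present(actions, t + DELAY_MS)
        if action is not None:
            spikes.append((t + DELAY_MS, "action", action))
    return sorted(spikes)


# Original protocol: no-go before the stimulus, then a switch to go.
print(pair_spikes([(250, None, "stim1")], [(120, 500, "no-go"), (500, None, "go")]))
# Revised protocol: go starts together with stimulus 1.
print(pair_spikes([(250, None, "stim1")], [(250, None, "go")]))
```

In the first run, the no-go onset with no state present yields a lone action spike at 495 ms, and the switch to go at 500 ms yields a repeated stimulus 1 state spike; starting go together with stimulus 1 yields a single clean pair.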

The two solutions are tested for the first state and action spikes in minicolumns-thorpe-task.20031220b.ccm. They produce the desired STATE-ACTION spike pairs. However, it is unsatisfying to have to start the go action at the same time as stimulus presentation, since in reality there must be a delay before a go/no-go decision is made and acted upon.

During retrieval, reward is sought correctly if the reverse spread meets activity in the minicolumn representing stimulus 1, even if there are preceding minicolumns in the chain of associations. This means that a data protocol may include initializing steps:

stimulus 1 STATE & no-go ACTION --- stimulus 1 STATE & go ACTION --- reward STATE
+-----------------------------+     +--------------------------+     +----------+
          375 ms                               375 ms                    375 ms
The two remaining issues are:
  1. The changes occur at the exact same time as the delayed action spike generation. The following protocol avoids this problem:


    Figure 4: No-go action can be considered present before stimulus 1 is presented, since the subject was not active. This can be simplified by starting the no-go action at the same time as stimulus 1 is presented. A plausible explanation is that priming for stimulation causes a theta (re)start, so that buffers in both the state and action streams are cleared and then filled at the same time. To improve the speed of the simulation, the protocol can be simplified by focusing on the two necessary associations (circled in red). The protocol before the dashed vertical red line can be omitted, taken as a given, by presenting stimulus 1 and go action at the same time. There are two issues to bear in mind when using this faster protocol: (1) immediate action (in the same millisecond) does not look realistic, and (2) the [stim.1,no-go] and [no-go,stim.1] associations will not be learned. Will these two issues draw criticism of the experimental task?

  2. The environment will continue to produce stimulus 1 perceptual input at the same time as reward is received. It is not clear if this poses a problem. Allowing these states to coexist is an interesting experiment in its own right. To avoid it, the following circuitry may be implemented as an improvement of the existing state-stream in the state-action spike pair generating circuitry:


    Figure 5: This circuitry produces a state spike for a state-action spike pair that represents only those portions of the state vector that have changed most recently. When a new state is detected, that new state spike triggers interneuron activity that clears previous state spikes from a buffer. That buffer is refilled with the new state spike. The buffered spikes are used as the state spike when a new state-action spike pair is generated. The interneuron population is drawn with dashed lines, since it may be implemented implicitly. The simulation requires less computation if inhibitory connections lead directly from the detector to the buffer.
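The logic of the Figure 5 state stream can be expressed on sets of active state components. A minimal sketch, assuming discrete time steps and treating the state vector as a set of labels: the detector reports only components that have just switched on, and whenever it reports something, the buffer is cleared and refilled with those components. The function name `state_buffer_trace` is hypothetical.

```python
from typing import FrozenSet, List


def state_buffer_trace(state_vectors: List[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Buffered state at each step when only newly changed components are kept."""
    previous: FrozenSet[str] = frozenset()
    buffer: FrozenSet[str] = frozenset()
    trace = []
    for current in state_vectors:
        newly_on = current - previous  # detector: components that just switched on
        if newly_on:
            buffer = newly_on          # clear old buffer contents, refill with the new state
        trace.append(buffer)
        previous = current
    return trace


steps = [frozenset({"stim1"}), frozenset({"stim1"}),
         frozenset({"stim1", "reward"}), frozenset({"stim1", "reward"})]
print(state_buffer_trace(steps))
```

When reward arrives while stimulus 1 is still perceived, the buffer holds only the reward component, so the next pair carries the reward STATE rather than a mixture of both states.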

The frequency of reward spikes is increased so that they also generate a new state.

As shown with minicolumns-thorpe-task.20031223.ccm, better spike pairs are now produced in groups for each training presentation. Unfortunately, a new arrangement for applying reward when stimulus 1 and the action are present together required a spike delay from the action to the conditional spiking circuit that gates reward. This produces the correct reward application after 750 ms, but also leads to some false positives when the next stimulus state appears while the previous action is still in the spike buffer. Instead of a spike buffer, another gating circuit is needed. Thus, reward spikes are produced only if (1) stimulus 1 is present, (2) go-action is taken, and (3) the time is between 750 ms and 1125 ms after the onset of the stimulus presentation.

If go-action represents licking a tube that supplies juice as a reward then it makes sense that every go-action spike should also result in a reward spike. Those spikes must be gated by the presence of stimulus 1 (and later by stimulus 2 when a reversal takes place). Those two aspects are already present in the circuitry. Now, the go-action spikes should no longer be buffered, but instead they should be gated in a manner so that the gate opens 750 ms after the onset of a stimulus presentation and closes when the spikes to clear STM buffers are given between trials. Circuitry that achieves the desired protocol for reward spikes is added in minicolumns-thorpe-task.20031229.ccm and shown in Figure 6.
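The three gating conditions can be written as a single predicate. A minimal sketch with hypothetical names; the 750 ms opening time comes from the text, and the 1125 ms closing time assumes that the clear-STM spikes arrive at the end of the 750 ms to 1125 ms interval stated above.

```python
GATE_OPEN_MS = 750    # gate opens 750 ms after stimulus onset
GATE_CLOSE_MS = 1125  # ASSUMPTION: clear-STM spikes arrive 1125 ms after onset


def reward_spike(t_ms: int, onset_ms: int, stimulus: int, action: str,
                 rewarded_stimulus: int) -> bool:
    """True if a go-action spike at t_ms should be passed on as a reward spike."""
    elapsed = t_ms - onset_ms
    return (stimulus == rewarded_stimulus
            and action == "go"
            and GATE_OPEN_MS <= elapsed < GATE_CLOSE_MS)


print(reward_spike(800, 0, 1, "go", 1))  # True: inside the gate window
print(reward_spike(500, 0, 1, "go", 1))  # False: gate not yet open
print(reward_spike(800, 0, 1, "go", 2))  # False: after reversal, stimulus 2 is rewarded
```

Passing 2 as `rewarded_stimulus` stands in for the reversal that the vector switch selects in the circuitry, and because the window is measured from each new onset, a buffered action from the previous presentation cannot open the gate early.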


Figure 6: Updated circuitry to produce the desired reward spikes in the interval between 750 ms and 1125 ms after the onset of a stimulus presentation for which go-action is rewarded. A vector switch (1) selects the first stimulus (2) or, following reversal, the second stimulus (3) as the stimulus for which go-action is rewarded. A single spike selects a go/no-go action that is maintained as a continuous spike train in the "vrat4dircontroller" circuitry (4). That single spike is also transformed into a spike with index 1 in a connection router (5) and subsequently buffered (6) for 750 ms. When the delayed spike arrives at a vector switch (7), a vector output representing the perception of the stimulus for which go-action is rewarded may propagate to a "conditional-spike-gate" neuronal circuit (8). There, the value of the stimulus perception vector controls transmission of a train of go-action spikes through synapses that elicit a train of reward spikes in the gating neuron. Together, the spike trains indicating the perception of the first stimulus (9), the second stimulus (10) and reward received form the state input to the state-action spike pair generating circuitry. The reward spike train ends when a spike with index 0 (received through a connection that is not drawn in this figure) clears the short-term memory buffers throughout the system, as well as the "vrat4dircontroller" buffers, and resets the vector switch (7) so that a constant value 0 suppresses transmission through the "conditional-spike-gate" neuronal circuit (8).


Figure 7: The resulting state-action spike trains. The indices of the spike trains represent the following: (0) go, (1) no-go, (2) stimulus 1, (3) stimulus 2, and (4) reward.


Figure 8: Spike pairs generated by the state-action spike pair generating circuitry. The indices of the spikes represent the following: (0) go ACTION, (1) no-go ACTION, (4) stimulus 1 STATE, (5) stimulus 2 STATE, and (6) reward STATE. Vertical blue lines were added to the spike plot to indicate the six different training sets.

If it is more desirable that reward immediately appears when the correct stimulus and go-action are combined, then the delay that allows the previous association between spike pairs to be encoded with LTP may be implemented in a more complicated version of the state-action spike pair generating circuitry.

Using prefrontal minicolumn output to guide go-actions during task performance

If the output of the minicolumns is to guide behavior during the task then LTP established during the training phase must encode the correct associations and minicolumn activity must be correctly interpreted as output during retrieval. We first tested the results of training with minicolumns-thorpe-task.20040105.ccm. The following tables describe the associations that were learned through the strength of Wf and Wb connections between minicolumns.

                    to r population
  from s population go      no-go   stim.1  stim.2  reward
  go                                        (5-6)   (1)
  no-go
  stim.1            (1)     (4)
  stim.2            (3)     (2)
  reward                                    (1-2)

Table 1: Connections strengthened in Wf between the s and r populations of minicolumns. Strengthened connections are indicated by bracketed numbers that represent the first training set in which the association is learned (see Fig. 8). Where two numbers appear between the brackets, the association is learned in the transition between two training sets; these are spurious but non-problematic associations that may be removed by more thoroughly clearing buffers between training sets. The associations that are needed to achieve the task are highlighted in blue with bold font. The strengthened connections listed by numerical indices are Wf:{4,5,12,13,40,46,48}.

                    to x population
  from y population go      no-go   stim.1  stim.2  reward
  go                                (1)     (3)
  no-go                             (4)     (2)
  stim.1
  stim.2                                            (1-2)
  reward            (1)

Table 2: Connections strengthened in Wb between the y and x populations of minicolumns. The notation is identical to that in the previous table. The strengthened connections listed by numerical indices are Wb:{6,32,33,40,41,53}.
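The bracketed table entries can be cross-checked against the numerical index lists. A minimal sketch, assuming the flat index equals from + 8 × to (a row length of eight minicolumns, with go=0, no-go=1, stim.1=4, stim.2=5 and reward=6 as in Figure 8). This layout is inferred rather than stated in the text, but it reproduces every entry in both tables; `decode` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

N_MINICOLUMNS = 8  # ASSUMPTION: row length of the flattened Wf/Wb matrices
NAMES: Dict[int, str] = {0: "go", 1: "no-go", 4: "stim.1", 5: "stim.2", 6: "reward"}


def decode(index: int) -> Tuple[str, str]:
    """Flat synapse index -> (from, to), assuming index = from + N_MINICOLUMNS * to."""
    to_col, from_col = divmod(index, N_MINICOLUMNS)
    return NAMES[from_col], NAMES[to_col]


wf: List[Tuple[str, str]] = [decode(i) for i in [4, 5, 12, 13, 40, 46, 48]]
wb: List[Tuple[str, str]] = [decode(i) for i in [6, 32, 33, 40, 41, 53]]
print("Wf (s -> r):", wf)
print("Wb (y -> x):", wb)
```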

The remaining associations that must be learned, so that forward and backward spread of activation can be used to retrieve the stimulus upon which to act in order to receive reward, are two synapses in the connection matrices Wif and Wib. In Wif, the connection between stimulus 1 and reward in the go minicolumn must be strengthened, i.e. Wif{4 to 6}. In Wib, the connection between reward and stimulus 1 in the go minicolumn must be strengthened, i.e. Wib{6 to 4}. Inspection with a Catacomb ObservationRecorder shows that the required synapse in Wif is strengthened. But that is not shown for Wib! There are two possibilities: either the association was not learned, or the known bugs in the ObservationRecorder do not allow its inspection. (In current versions of Catacomb, the ObservationRecorder displays only a subset of the columns for connections between pre- and postsynaptic neuronal populations that result in more than one column.)

Further inspection may show whether Wib was trained correctly after all. During the retrieval phase, activation in the reward minicolumn should propagate to the x population of the go minicolumn. If Wib was correctly trained, that should result in x population activity at the neuron that enables backward propagation to the stimulus 1 minicolumn. Note that no output from prefrontal minicolumns currently appears during the retrieval phase, but that may be due to other problems in the output circuitry.

Buffer clearing may be improved, as demonstrated for the transition to the performance part of the task. There, I added a number of clear signals in minicolumns-thorpe-task.20040105b.ccm to ensure that the last buffered action is cleared from the a-STM-buffer population around t=7500 ms. Similar modifications may improve clearing of buffers between training sets.

The inspection, taking a reward retrieval spike as the onset, shows the following: at around t=8015 ms, the go neuron of the a population spikes for retrieval. That causes spiking in y{48-55}, i.e. all neurons of the y population in the reward minicolumn. Backward spread through Wb results in a spike at x{6}, a neuron in the x population of the go minicolumn, at around t=8022 ms. If associations were correctly encoded in Wib, then y{4} should spike to allow further backpropagation to the stimulus 1 minicolumn. That spike does not appear! Consequently, either training of Wib was not successful, or spiking in the y population of the go minicolumn was inhibited by activity in the a population (if go was erroneously receiving a "current state" signal during the retrieval phase). No spike appears in the a population of the go minicolumn around that time, so Wib was not successfully trained.

Solving the problem of learning in Wib:

Inspection of spiking in the y-specific population, the population of neurons used to train Wib, shows that only two spikes occur during training. Both occur at y{4}, one at t=1100 ms and one at t=1207 ms. During the training set responsible for those spikes, the only spike at x{6} is at t=1209 ms. Thus, there are two problems: (1) x spikes after y, while the presynaptic spike should precede the postsynaptic spike to elicit LTP, and (2) only one pair of x{6} and y{4} spikes is available for learning.
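The first problem can be checked mechanically: LTP requires the presynaptic x{6} spike to precede the postsynaptic y-specific{4} spike. A minimal sketch with a hypothetical 20 ms pairing window (the model's actual learning window is not given here), applied to the spike times above and, for contrast, to an x{6} spike that precedes a y-specific{4} spike by 4 ms, such as the later pair at t=1318 ms and t=1322 ms reported below. The helper `ltp_pairs` is hypothetical.

```python
from typing import List, Tuple

LTP_WINDOW_MS = 20  # HYPOTHETICAL pairing window; not taken from the model


def ltp_pairs(pre_ms: List[int], post_ms: List[int],
              window_ms: int = LTP_WINDOW_MS) -> List[Tuple[int, int]]:
    """(pre, post) pairs in which the presynaptic spike precedes the
    postsynaptic spike by more than 0 and at most window_ms."""
    return [(pre, post) for pre in pre_ms for post in post_ms
            if 0 < post - pre <= window_ms]


print(ltp_pairs(pre_ms=[1209], post_ms=[1100, 1207]))  # x{6} after both y{4} spikes
print(ltp_pairs(pre_ms=[1318], post_ms=[1322]))        # x{6} 4 ms before y{4}
```

The first call returns an empty list, matching the observation that no LTP-eligible pair occurs during that training set.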

Since y-specific is driven by r2 output, I now inspect the filtered r2 output and the r-STM-buffer activity. It is notable that the entire buffer spikes each time a buffer-clear signal is received between training sets. That clearing spike appears in all minicolumn implementations to date that include buffer clear signals; it is an artifact of the implementation. The intended way to clear an STM buffer is to revert to a state without theta rhythm. The current implementation attempts this as an actual change in the rhythmic modulation: it sends a clear signal as a short high-frequency spike train into the spike relay that distributes rhythmic spiking throughout the minicolumn neuronal circuitry. The high-frequency spike train causes successive hyperpolarizations at STM buffers. There may be cause to improve this implementation of the ability to clear STM buffers.

It appears that the buffer is cleared too soon after reward is received. The first reward spike in the a population occurs at t=1102 ms, and the first clear signal is sent to the minicolumns at t=1250 ms. It was originally thought that the first reward spike would immediately participate in learning and that three cycles would be available before buffers are cleared. That protocol did not take into account several aspects of the mechanism that introduce delays:

  1. The first spike in the a-STM-buffer is suppressed by transmission modulation on the output to the a-population, since its phase could cause interference with retrieval spikes and other phase dependent activity.
  2. Meanwhile, r is driven by s during training, so that input from the stimulus 1 minicolumn produces activity in the r population of the go minicolumn. Another cycle passes before usable r2 output becomes available, since the new item in the r-STM-buffer must shift into the proper phase (see detailed explanation below). And another cycle passes before r2 output is accepted as y-specific activity during training phases (as explained below).
  3. Reward spikes appear in the a-population starting at t=1101 ms, then again at t=1201 ms and onward. That produces y-diffuse spikes at t=1106 ms, t=1205 ms and so forth in the reward minicolumn. Only one x{6} spike appears, at t=1209 ms in the go minicolumn, although seven spikes were produced in the reward minicolumn. Spiking in the x population depends on the strength of the connection from the reward minicolumn, as trained in Wb. The first round of such training is accomplished only after t=1200 ms, when the onset of diffuse input from the a population precedes a rapidly following input from the y population of the reward minicolumn. The interval between the a-diffuse input and the first y-diffuse input from the reward minicolumn was too long to cause spiking at x{6} in the go minicolumn. Even though reward continued to spike seven times, Wb alone was not trained to sufficient strength to activate x{6} without the diffuse contribution by the a population in the go minicolumn, which ended with the clear-buffer signals around t=1250 ms.

It also seems that the a-STM-buffer is not cleared by the clear signal; at least, the a neuron of the reward minicolumn continues to spike.

So, is the problem (1) simply that more time is needed after reward is received for LTP to be established in Wib, or (2) that the r2 output is more seriously broken? The second possibility is tested by investigating the following questions:

In any case, the first usable x{6} spike appears at t=1209 ms, so more time is needed to train the x{6} to y{4} association in Wib, even after correcting the previously broken r2 output. The data protocol is now adjusted to deal with problematic aspects of minicolumn activity. The periods in which theta rhythm is removed by clear-buffer signals are extended between the training sets to ensure that all buffers are cleared, including the a-STM-buffer. The timing of clear-buffer signals and following training sets is also shifted to provide additional time for encoding after training sets that include reward activity. This is accomplished in minicolumns-thorpe-task.20040107.ccm.

Although more time is now provided for learning when reward is received, only two x{6} spikes appear, since the a population diffuse contribution ends as the a-STM-buffer receives new ACTION input three cycles after the reward STATE input. The first x{6} spike does not precede the corresponding y-specific{4} spike (because the phase of r2{6} was still shifting in r-STM-buffer), so that no LTP is established, but the second x{6} spike at t=1318 ms does precede the corresponding y-specific{4} spike at t=1322 ms. Thus there is one update of the connection strength. The update of Wib is observed in the ObservationRecorder for synapses from the x population to the y-specific population.

The single update produced a mild strengthening of the connection between x{6} and y-specific{4}. More such updates are possible by (a) increasing the number of cycles further and (b) increasing the delay from r2 to y-specific. The first solution requires that the delay between the STATE and ACTION spikes in a spike pair is increased by at least one more rhythmic cycle, and that the appearance of reward is similarly delayed to allow the association from stimulus 1 STATE to go ACTION to be encoded. This is non-problematic in the context of the Thorpe task, since Thorpe provided stimuli and reward over greater durations than these. But to simplify the simulation, I will first attempt the modification of delay from r2 to y-specific, increasing it by 4 ms to 15 ms. This is done in minicolumns-thorpe-task.20040107b.ccm, and encoding in Wib is successful. During retrieval, the presentation of the desire for reward causes the retrieval of go-action when stimulus 1 appears. Note that encoding is also successful with the original delay of 11 ms, since the rewarded training set appears twice during training.

An important design principle for integrate-and-fire models with persistent firing STM buffers was found here: The buffers need several cycles to shift items into the phase at which their reactivation is maintained in STM. If other mechanisms in a neuronal simulation depend on the specific phase of spikes in a buffer, they may need to wait several cycles to perform their function. The time taken to complete the simulation successfully increases proportionally. Two examples are:
  1. a-STM-buffer: causes x to activate at progressively earlier phases over consecutive cycles until x activates earlier than y-specific so that Wib update is achieved.
  2. r-STM-buffer: as an item shifts into the first item phase it is recognized as a valid r2 output that causes spiking in y-specific for Wib update.

Results of task performance without reversal

As Figure 9 shows, before the reversal of the rewarded stimulus, the neuronal simulation correctly drives go action each time stimulus 1 appears after t=8000 ms. Notable is the absence of any no-go actions. No-go is not explicitly driven, since it was not associated with reward in any training set and its stimulus associations are therefore not retrieved by the desire for reward.


Figure 9: Thorpe task simulation performance without reversal with minicolumns-thorpe-task.20040108.ccm. Black rectangles are trains of many spikes. Spike indices indicate the following: (0) go action, (1) no-go action, (2) stimulus 1 present, (3) stimulus 2 present, (4) reward. Training takes place between t=0 ms and t=8000 ms. After t=8000 ms, the trained neuronal simulation of the prefrontal minicolumns drives go and no-go actions in response to stimuli perceived. Each time stimulus 1 appears, the prefrontal minicolumns drive go action after a brief delay. Before t=13120 ms, that go action is rewarded. After that time, the environment simulation reverses the reward protocol. Since reversal is not dealt with at this stage of the model construction, the neuronal simulation continues to drive go action in response to stimulus 1.

How may the task reversal be learned?

The Thorpe task repeats the learning and task performance above for many different stimulus pairs. Each time, frequent reversals of the task take place after the initial task has been learned. Reward is given for go action with the previously unrewarded stimulus. The subject monkey/rat learns to reverse its behavior quickly. How may this be achieved?

One assumption we can make is that the ability to learn rapid reversal is hippocampus-dependent. This assumption is based on the knowledge that the ability to encode episodic, context-dependent memory after a single presentation is hippocampus-dependent. Such episodic memory enables continuous encoding that keeps track of the stimulus state and go action combination most recently associated with reward. The episodic memory must then be used to achieve the reversal of the behavior learned with cortical minicolumns.

As suggested in recent hippocampal modeling work, the temporal context can be stored during task performance through mechanisms in dentate gyrus and hippocampus. For each trial (temporal context), an association between stimulus state, action and a possible reward is encoded. The reward representation there may be strongly connected to the reward minicolumn. Also, in dentate gyrus the most recent associations may be more strongly connected to the current temporal context. When the desire for reward appears, these connections may result in retrieval of the most recently rewarded state-action context encoded in the hippocampus.

Learning to rapidly reverse behavior may proceed as follows:

  1. During training with the initial set of stimuli, the simple association of a specific stimulus and go action with reward is learned.
  2. When the reward disappears for the first time, new exploration is prompted. Go action is attempted with the other stimulus.
  3. When go action with the other stimulus is rewarded, the learned behavior is switched to the other stimulus. Desire for reward then causes retrieval of the other stimulus and go action. (How?)
  4. The concept of reversal is understood for following sets of stimuli. As soon as reward disappears, a reversal of behavior takes place. (How?)
(Note that necessary "if not" logic can be implemented using inhibition and the removal of inhibition.)
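Steps 3 and 4 amount to a win-stay, lose-shift rule at the behavioral level. The following sketch shows only that abstract rule, not a neuronal mechanism: a single variable stands in for the episodic trace of the most recently rewarded stimulus, and the `elif` branch plays the role of the "if not" logic. The class `ReversalLearner` and its methods are hypothetical, stimuli are numbered 1 and 2, and the exploration of step 2 is collapsed into an immediate switch, which corresponds to the learned behavior of step 4.

```python
class ReversalLearner:
    """Behavioral-level win-stay, lose-shift rule for two stimuli (1 and 2)."""

    def __init__(self, rewarded_guess: int = 1) -> None:
        self.rewarded_guess = rewarded_guess  # stands in for the episodic trace

    def act(self, stimulus: int) -> str:
        return "go" if stimulus == self.rewarded_guess else "no-go"

    def observe(self, stimulus: int, action: str, rewarded: bool) -> None:
        if action != "go":
            return                              # no-go trials carry no reward information here
        if rewarded:
            self.rewarded_guess = stimulus      # win-stay: keep the most recently rewarded stimulus
        elif stimulus == self.rewarded_guess:
            self.rewarded_guess = 3 - stimulus  # lose-shift: expected reward missing, switch 1 <-> 2


learner = ReversalLearner()
for stimulus, rewarded_stimulus in [(1, 1), (2, 1), (1, 2), (2, 2), (1, 2)]:
    action = learner.act(stimulus)
    learner.observe(stimulus, action, action == "go" and stimulus == rewarded_stimulus)
    print(stimulus, rewarded_stimulus, action)
```

In this run the learner stops responding to stimulus 1 after a single unrewarded go, which is the rapid reversal that steps 3 and 4 ask the minicolumn and hippocampal circuitry to produce.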

One way to achieve the integration of the reversal behavior into the learned associations is shown in Figure 10. Note that it presumes that the possibility of reward for go action following either stimulus is encoded.


Figure 10: A neuronal circuit and protocol for reversal learning in the Thorpe task. Initially, an association is learned from one stimulus to go action and reward. When reversal first occurs, the same has to be learned for the other stimulus. Simultaneously, the hippocampus retains episodic memory of previously rewarded state-action pairs. When a reversal occurs, that is also encoded in episodic memory and a corresponding minicolumn is activated. The activation of that minicolumn becomes associated with go action and reward for the other stimulus. Thus, a secondary route appears and replaces the direct route to reward. This new route depends on hippocampal activity, so that rapid reversal behavior is encoded. Since this approach depends on the weakening of one set of associations as another set is encoded, the simulation requires an implementation of long-term depression.


~/doc/html/minicolumns-thorpe-task.html - Tue Mar 9 09:19:41 EST 2004 - Randal A. Koene