Minicolumns: The W.Schultz Task

Task Protocol

Figure 1 depicts the protocol of the W.Schultz task. In the original task, monkeys were trained with a protocol in which each step lasted a couple of seconds.


Figure 1: The protocol of the W.Schultz task. Solid arrows indicate the possible steps taken within a trial. Dashed arrows indicate possible transitions between consecutive trials.

Minicolumns required for this task are the following:

Modifications of the Task

Some modifications are made so that simulations can be performed in a manageable computation time.

During the training presentations, randomized go/no-go actions are simulated. During testing, go/no-go actions are controlled by the neuronal simulation.

Model Construction

Simulating the environment of the W.Schultz Task

The implementation is a modification of the Thorpe task simulation minicolumns-thorpe-task.20040108.ccm. A third stimulus (represented by a sound source) is added to the environment and connected to the event sequencer responsible for the protocol of stimulus presentations. The conditional protocol of the W.Schultz task, in which the appearance of a rewarded trial depends on correct performance of an unrewarded trial (when such a trial appears), would require a more complicated environment circuit. During training, however, random responses are currently simulated in a scripted manner, so it is known in advance when the response to an unrewarded-move trial will be correct and when it will be incorrect. For training, the subsequent repetition of the unrewarded-move trial or the presentation of a rewarded trial is therefore hard-coded in the event sequencer. During the task performance stage of the experiment, the more complicated circuitry may be needed if incorrect responses occur.
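The conditional trial sequencing described above can be summarized as a small state machine. The sketch below is a hypothetical Python illustration, not part of the Catacomb model: the names `CORRECT_ACTION`, `is_rewarded` and `next_stimulus` are invented here, and the rule that any stimulus may follow a rewarded-type trial is an assumption. The other rules (a correct go in an unrewarded-move trial leads to a randomly chosen rewarded trial, an incorrect response repeats the unrewarded-move trial) follow the protocol and the training sequence in Table 1.

```python
import random

# Correct response for each stimulus in the W.Schultz go/nogo task.
CORRECT_ACTION = {
    "rewarded-move": "go",
    "rewarded-non-move": "nogo",
    "unrewarded-move": "go",
}
REWARDED_STIMULI = ("rewarded-move", "rewarded-non-move")


def is_rewarded(stimulus, action):
    """Reward is delivered only for a correct response to a rewarded stimulus."""
    return stimulus in REWARDED_STIMULI and action == CORRECT_ACTION[stimulus]


def next_stimulus(stimulus, action, rng):
    """Pick the stimulus of the following trial (hypothetical sequencing rules)."""
    if stimulus == "unrewarded-move":
        if action == CORRECT_ACTION[stimulus]:
            # A correct go is followed by one of the two rewarded trials.
            return rng.choice(REWARDED_STIMULI)
        # An incorrect response repeats the unrewarded-move trial.
        return stimulus
    # Assumption: after a rewarded-type trial, any stimulus may follow.
    return rng.choice(sorted(CORRECT_ACTION))


rng = random.Random(1)
print(next_stimulus("unrewarded-move", "nogo", rng))  # unrewarded-move
print(is_rewarded("rewarded-non-move", "nogo"))       # True
```

Keeping the sequencing rules in one function makes it easy to compare a scripted training sequence (such as the trials of Table 1) with what a closed-loop environment circuit would produce.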

The task reversal mechanism that was included for the Thorpe task is removed. These modifications are implemented in minicolumns-schultz-task.20040114.ccm. Since reward is now the fourth state, the connection table "goal" is updated accordingly, so that retrieval can activate the reward minicolumn when a desire for reward is generated. A second route is added to the reward generating circuitry, so that non-move is rewarded when the rewarded-non-move stimulus is presented.


Figure 2: Reward generating circuitry in the W.Schultz task. Reward may be given six rhythmic cycles after the correct action follows presentation of a rewarded-move or rewarded-non-move stimulus. Trains of reward spikes elicited by action spike trains are gated by the presence of the corresponding stimulus. Additionally, stimuli can only control transmission to the gating neurons 750 ms after a move (go) or non-move (nogo) action is commenced. Stimulus presentations are relieved of control of the gating neurons again once buffer-clear signals appear between trials.
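The gating logic of Figure 2 can be paraphrased in a few lines. The Python sketch below is a hypothetical, event-level abstraction of that circuitry rather than a model of the spiking neurons: `reward_onset_ms` and its arguments are invented names, and treating a buffer-clear signal that arrives before the gate opens as cancelling the reward is an assumption based on the last sentence of the caption.

```python
CYCLE_MS = 125            # one rhythmic cycle
REWARD_DELAY_CYCLES = 6   # 6 cycles x 125 ms = 750 ms

# Stimulus/action combinations that are routed to the reward output.
REWARD_ROUTES = {("rewarded-move", "go"), ("rewarded-non-move", "nogo")}


def reward_onset_ms(stimulus, action, action_onset_ms, clear_ms=None):
    """Return the onset of the reward spike train in ms, or None.

    The gate opens REWARD_DELAY_CYCLES after the action onset. A
    buffer-clear signal arriving before that releases the gate, so
    no reward is delivered (assumption).
    """
    if (stimulus, action) not in REWARD_ROUTES:
        return None
    gate_open_ms = action_onset_ms + REWARD_DELAY_CYCLES * CYCLE_MS
    if clear_ms is not None and clear_ms < gate_open_ms:
        return None
    return gate_open_ms


print(reward_onset_ms("rewarded-move", "go", 1750))    # 2500
print(reward_onset_ms("rewarded-move", "nogo", 3250))  # None
```

With the action onsets of Table 1, the sketch reproduces the reward times listed there (2500 ms for trial 3 and 5625 ms for trial 6).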

The following table lists a proposed pseudo-random set of training trials that enable all relevant associations to be learned.

trial start cycle | cycle offset | cycles new a | stimulus          | cycle offset | cycles new a | action | cycle offset | cycles new a | consequence | cycle offset | inter-trial | new associations
1                 | 0 (125ms)    | 0-2          | unrewarded-move   | 0 (125ms)    | 3-5          | nogo   |              |              |             | 6 (875ms)    | clear       | unrewarded-move to nogo
8                 | 0 (1000ms)   | 0-2          | unrewarded-move   | 0 (1000ms)   | 3-5          | go     |              |              |             |              |             | unrewarded-move to go
14                | 0 (1750ms)   | 0-2          | rewarded-move     | 0 (1750ms)   | 3-5          | go     | 6 (2500ms)   | 6-10         | reward      | 11 (3125ms)  | clear       | go (from unrewarded-move) to rewarded-move to go to reward
26                | 0 (3250ms)   | 0-2          | rewarded-move     | 0 (3250ms)   | 3-5          | nogo   |              |              |             | 6 (4000ms)   | clear       | rewarded-move to nogo
33                | 0 (4125ms)   | 0-2          | unrewarded-move   | 0 (4125ms)   | 3-5          | go     |              |              |             |              |             |
39                | 0 (4875ms)   | 0-2          | rewarded-non-move | 0 (4875ms)   | 3-5          | nogo   | 6 (5625ms)   | 6-10         | reward      | 11 (6250ms)  | clear       | go (from unrewarded-move) to rewarded-non-move to nogo to reward
51                | 0 (6375ms)   | 0-2          | rewarded-non-move | 0 (6375ms)   | 3-5          | go     |              |              |             | 6 (7125ms)   | clear       | rewarded-non-move to go

Table 1: A minimal set of training trials to encode all relevant associations. Each cycle has a period of 125 ms, so that the minimum amount of simulated time needed during training is 7250 ms (58 cycles). If more than one presentation of each association is needed, that time increases accordingly. Absolute times indicated in milliseconds in the second column are entered into the stimulus event sequencer together with the stimulus index. Absolute times in the fifth column are entered into the action training event sequencer together with the action index. Absolute times in the eighth column are automatically computed by the reward generating circuitry. Absolute times in the eleventh column are entered into the clear-buffer event sequencer with several repetitions at 10 ms intervals. This training protocol is implemented in minicolumns-schultz-task.20040115b.ccm.
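The absolute times in Table 1 follow from the cycle numbers by multiplying by the 125 ms cycle period. A small Python helper like the hypothetical one below could generate the event sequencer entries from the trial list. The function and variable names are invented, and the number of clear-buffer repetitions (3 at 10 ms intervals) is an assumed value, since the caption only says "several".

```python
CYCLE_MS = 125
REWARD_OFFSET_CYCLES = 6  # consequence column: reward starts 6 cycles into the trial

# Rows of Table 1: (trial start cycle, stimulus, action, rewarded, clear offset or None)
TRIALS = [
    (1, "unrewarded-move", "nogo", False, 6),
    (8, "unrewarded-move", "go", False, None),
    (14, "rewarded-move", "go", True, 11),
    (26, "rewarded-move", "nogo", False, 6),
    (33, "unrewarded-move", "go", False, None),
    (39, "rewarded-non-move", "nogo", True, 11),
    (51, "rewarded-non-move", "go", False, 6),
]


def to_ms(cycle):
    return cycle * CYCLE_MS


def sequencer_events(trials, clear_repeats=3, clear_step_ms=10):
    """Absolute event times (ms) for the stimulus, action and clear-buffer sequencers.

    clear_repeats is a hypothetical value; reward times are returned only for
    comparison, since the reward circuitry computes them itself.
    """
    stimulus, action, reward, clear = [], [], [], []
    for start, stim, act, rewarded, clear_offset in trials:
        stimulus.append((to_ms(start), stim))
        action.append((to_ms(start), act))
        if rewarded:
            reward.append(to_ms(start + REWARD_OFFSET_CYCLES))
        if clear_offset is not None:
            t0 = to_ms(start + clear_offset)
            clear.extend(t0 + k * clear_step_ms for k in range(clear_repeats))
    return stimulus, action, reward, clear


stimulus, action, reward, clear = sequencer_events(TRIALS)
last_start, _, _, _, last_clear = TRIALS[-1]
total_ms = to_ms(last_start + last_clear + 1)  # through the end of the last cycle
print(reward)      # [2500, 5625]
print(clear[:3])   # [875, 885, 895]
print(total_ms)    # 7250
```

Generating the lists from cycle numbers avoids recalculating many hand-entered spike times whenever the protocol changes.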

Integrating environment output during the task with the State-Action pair generating circuitry

In the simulation of the Thorpe task, stimulus presentations ended after a fixed interval determined by a secondary link from the event sequencer via a delay buffer. Here, the duration of individual training trials varies in order to minimize the total computation time of the simulation. For that reason, the secondary pathway with delay buffer is replaced with circuitry that ends stimulus presentation when a signal is received from the clear-buffer event sequencer.

Catacomb stimulus objects act as toggle switches: they activate and deactivate in response to the same spike index. Thus, the onset of a stimulus presentation must set up switches such that a clear-buffer signal is interpreted as the corresponding spike index. Also, the toggle switch must receive only the first clear-buffer spike. That second condition can be met either by propagation through a neuronal population with after-hyperpolarization or through further switching circuitry. Switching circuitry is as acceptable as a neuronal solution, since this construct belongs to the simulation of the task environment rather than to the model itself.
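The two switching conditions can be illustrated at the level of discrete events. In the hypothetical Python sketch below (`ClearRouter` and `run` are invented names, not Catacomb objects), a stimulus onset arms a switch with its own index, and only the first spike of a clear-buffer burst is forwarded to the toggle.

```python
class ClearRouter:
    """Route only the first clear-buffer spike to the active stimulus index."""

    def __init__(self):
        self.armed_index = None

    def stimulus_onset(self, index):
        # Onset of a stimulus sets up the switch for its own index.
        self.armed_index = index

    def clear_spike(self):
        # The first clear spike is passed on; the switch is then disarmed,
        # so later spikes of the same burst are ignored (None is returned).
        index, self.armed_index = self.armed_index, None
        return index


def run(events):
    """Replay ("onset", index) and ("clear", None) events; return active stimuli."""
    active = set()
    router = ClearRouter()
    for kind, index in events:
        if kind == "onset":
            active ^= {index}            # toggle the stimulus on
            router.stimulus_onset(index)
        elif kind == "clear":
            routed = router.clear_spike()
            if routed is not None:
                active ^= {routed}       # the same index toggles it off
    return active


burst = [("clear", None)] * 3
print(run([("onset", 2)] + burst))  # set()
```

Without the disarming step, a burst of three clear spikes would toggle the stimulus three times and leave it switched on.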

There is currently a Catacomb-related problem with the control of transmission modulation that affects the task environment circuitry intended to signal the end of stimulus presentations during trials (minicolumns-schultz-task.20040115c.ccm). In minicolumns-schultz-task.20040115d.ccm the problem is temporarily avoided by manually coding those signals.

Figure 3 shows the spike trains during the training phase of the task, while figure 4 shows the corresponding spike pairs that are sent to the minicolumns of the prefrontal cortex model.


Figure 3: Spike trains produced during training in the W.Schultz task simulation. Spike indices represent: (0) go, (1) nogo, (2) rewarded-move stimulus, (3) rewarded non-move stimulus, (4) unrewarded move stimulus, (5) reward. Notice that reward appears immediately in the third trial. The simulation circuitry includes a 750 ms delay of reward relative to the onset of action, but the action was not interrupted by a clear-buffer signal between the second and third trials. A similar issue appears between the fifth and sixth trials.


Figure 4: Action-state spike pairs generated by specialized neuronal circuitry from the spike trains representing states (perceived stimuli and reward received) and actions (go/nogo). There are missing stimulus state spikes and no reward state spikes. Some of these problems may be caused by the issues noted in Figure 3.

In order to correct the onset of the reward spike train, the input connection that provides 750 ms delayed switching spikes is no longer connected to action events, but rather to the output of the event sequencer that specifies the start of each stimulus presentation. The router connection table is updated accordingly.


Figure 5: Spike trains produced during training in the W.Schultz task simulation. Spike indices represent: (0) go, (1) nogo, (2) rewarded-move stimulus, (3) rewarded non-move stimulus, (4) unrewarded move stimulus, (5) reward. The spike train representing reward received is now correctly delayed 750 ms after the onset of a training trial.


Figure 6: Action-state spike pairs generated by specialized neuronal circuitry from the spike trains representing states (perceived stimuli and reward received) and actions (go/nogo). The spike indices represent: (0) go ACTION, (1) nogo ACTION, (4) rewarded move stimulus STATE, (5) rewarded non-move stimulus STATE, (6) unrewarded move stimulus STATE and (7) reward STATE.

Inspection of the a population activity in the minicolumns shows whether all associations listed in the last column of Table 1 appear together within an interval that can elicit LTP.


Figure 7: Activity of the minicolumns a population during training in the W.Schultz task. Membrane potentials plotted at the top represent states, while those plotted at the bottom represent actions. States are color coded as follows: (BLUE) unrewarded move stimulus state, (RED) rewarded move stimulus state, (CYAN) reward state and (GREEN) rewarded non-move state. Actions are color coded as follows: (MAGENTA) nogo action and (BROWN) go action. At some time, each of the associations listed in the last column of Table 1 appears in the a-STM-buffer in the correct order.

Using prefrontal minicolumn output to guide go-actions during task performance

First we will test the establishment of LTP at the correct synapses, then we will test retrieval in the presence of a desire for reward. Unfortunately, the Catacomb 2.119 ObservationRecorder is still unable to display synaptic weight matrices connecting multiple pre- and postsynaptic neurons. The matrices Wb and Wf are unidimensional and therefore can be inspected with ObservationRecorders. The most direct way to inspect Wib and Wif is to provide driving spikes to the presynaptic populations (x and r) and to observe the response at the postsynaptic populations (y and s). Such inspection is done in multiple simulation runs, since the driving spikes cause further effects in the minicolumn network after their initial propagation to a postsynaptic population. These tests are performed in minicolumns-schultz-task.20040116b.ccm.
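The probing strategy amounts to reading a weight matrix one presynaptic column at a time through the responses it produces. The Python sketch below is a deliberately simplified, hypothetical stand-in: it replaces spiking dynamics with a single thresholded step, and `probe_weights`, `drive` and `threshold` are invented names. In the real network each probe would be a separate Catacomb run, because the driving spikes propagate further after the first step.

```python
def probe_weights(weights, drive, threshold):
    """Infer which connections are strong by driving one presynaptic cell at a time.

    weights[post][pre] is a conductance-like strength. Each probe starts from
    a fresh state, mirroring the use of separate simulation runs.
    """
    n_post, n_pre = len(weights), len(weights[0])
    responders = {}
    for pre in range(n_pre):
        responders[pre] = [
            post for post in range(n_post)
            if weights[post][pre] * drive >= threshold
        ]
    return responders


# Toy 3x3 matrix: only pre 0 -> post 1 and pre 2 -> post 0 are potentiated.
W = [
    [0.1, 0.1, 1.0],
    [1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1],
]
print(probe_weights(W, drive=1.0, threshold=0.5))  # {0: [1], 1: [], 2: [0]}
```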

               to r population
from s         go     no-go  Srm    Srnm   Surm   reward
go                           (3)    (6)           (3)
no-go                                             (6)
Srm            (3)    (4)
Srnm           (7)    (6)
Surm           (2)    (1)
reward

Table 2: Connections strengthened in Wf between the s and r populations of minicolumns. Strengthened connections are indicated by bracketed numbers that represent the first training set in which the association is learned (see Fig.6). The associations that are needed to achieve the task are highlighted in blue with bold font. The strengthened connections listed by numerical indices are Wf:{4,5,6,12,13,14,32,40,56,57}. While LTP appears to be elicited at the correct synapses, it is possible that the strength achieved after a single batch of training presentations is insufficient for successful retrieval. It is also possible that propagation needed for encoding in Wif is not achieved with a single batch of training presentations.

               to x population
from y         go     no-go  Srm    Srnm   Surm   reward
go                           (3)    (7)    (2)
no-go                        (4)    (6)    (1)
Srm            (3)
Srnm           (6)
Surm
reward         (3)    (6)

Table 3: Connections strengthened in Wb between the y and x populations of minicolumns. The notation is identical to that in the previous table. The strengthened connections listed by numerical indices are Wb:{4,5,7,15,32,33,40,41,48,49}.
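The numerical indices in the captions of Tables 2 and 3 appear consistent with numbering each connection by its postsynaptic cell, with cell = 8 × (postsynaptic minicolumn) + (presynaptic minicolumn), taking the minicolumn numbers from the indices of Fig. 6 (go 0, nogo 1, Srm 4, Srnm 5, Surm 6, reward 7). This convention is inferred from the tables and from labels such as r(4) for Go<-Srm used later in these notes; it is not documented Catacomb behavior. The Python check below, with invented names, reproduces both index lists from the table entries.

```python
CELLS_PER_MINICOLUMN = 8
MINICOLUMN = {"go": 0, "nogo": 1, "Srm": 4, "Srnm": 5, "Surm": 6, "reward": 7}


def connection_index(pre, post):
    """Index of the postsynaptic cell that receives the pre -> post association."""
    return MINICOLUMN[post] * CELLS_PER_MINICOLUMN + MINICOLUMN[pre]


# (from, to) pairs with bracketed entries in Table 2 (s -> r, Wf) ...
WF_ENTRIES = [("go", "Srm"), ("go", "Srnm"), ("go", "reward"), ("nogo", "reward"),
              ("Srm", "go"), ("Srm", "nogo"), ("Srnm", "go"), ("Srnm", "nogo"),
              ("Surm", "go"), ("Surm", "nogo")]
# ... and in Table 3 (y -> x, Wb).
WB_ENTRIES = [("go", "Srm"), ("go", "Srnm"), ("go", "Surm"),
              ("nogo", "Srm"), ("nogo", "Srnm"), ("nogo", "Surm"),
              ("Srm", "go"), ("Srnm", "go"), ("reward", "go"), ("reward", "nogo")]

wf = sorted(connection_index(pre, post) for pre, post in WF_ENTRIES)
wb = sorted(connection_index(pre, post) for pre, post in WB_ENTRIES)
print(wf)  # [4, 5, 6, 12, 13, 14, 32, 40, 56, 57]
print(wb)  # [4, 5, 7, 15, 32, 33, 40, 41, 48, 49]
```

The agreement with both captions suggests the tables and index lists were extracted consistently.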

By driving the relevant neurons in the x population with an event sequencer in multiple simulation runs, two tests at a time (at t=7300 ms and t=7350 ms), the following results were obtained for encoding of Wib after a single batch of training presentations:

The single batch of training presentations was probably insufficient to create strong LTP and to establish LTP in Wib and Wif. One way to improve this is to increase the learning rate. A more plausible solution is to provide more Hebbian spiking at the relevant connections. That can be accomplished by running another batch of stimulus presentations or by allowing each pair of spikes to persist longer in the initial batch of stimulus presentations. Longer persistence requires only a few additional cycles per item, while another batch would require many more cycles. For reasons of computational efficiency I therefore opt to increase the intervals between successive items. For this, the data protocols in event sequencers as well as several delay buffer parameters are modified.

The first modification applied in minicolumns-schultz-task.20040116b.ccm increases the interval between state and action spikes in each state-action spike pair, thereby providing additional time for encoding. Modifications:

Additional time is also needed after the last spike in a trial appears, and after an action spike appears. All the necessary changes in the timing of trial presentations, trial ends, actions, and clear-buffer signals are made in minicolumns-schultz-task.20040120.ccm. The modified spike trains and spike pairs are shown in the following figures.


Figure 8: Spike trains produced during training in the W.Schultz task simulation. Spike indices represent: (0) go, (1) nogo, (2) rewarded-move stimulus, (3) rewarded non-move stimulus, (4) unrewarded move stimulus, (5) reward.


Figure 9: Action-state spike pairs generated by specialized neuronal circuitry from the spike trains representing states (perceived stimuli and reward received) and actions (go/nogo). The spike indices represent: (0) go ACTION, (1) nogo ACTION, (4) rewarded move stimulus STATE, (5) rewarded non-move stimulus STATE, (6) unrewarded move stimulus STATE and (7) reward STATE.

The resulting strengthened connections in Wb and Wf are the same as in Tables 2 and 3 above, with one additional strengthened connection in each, representing an unexpected association from nogo to Srnm. This is probably caused by an error in the clear-buffer signals around t=9750 ms, which do not correctly clear the action-maintaining circuitry. Adding more clear-buffer signals after t=9750 ms removed the undesired association between trials in minicolumns-schultz-task.20040121.ccm. A nogo action spike still appears after reward just before the end of the sixth trial. That spike could be removed by starting the clear-buffer signals a little earlier, but it is allowed to occur, since it does not conflict with the method of encoding state-action pairs in the W.Schultz task.

Results of task performance


Figure 10: Reward is successfully obtained during performance of the W.Schultz et al. go/nogo task with the encoded minicolumns model of prefrontal cortex in minicolumns-schultz-task.20040121b.ccm. As noted in the plot of spike trains, spike indices represent the following: (0) go action performed in response to minicolumn output, (1) nogo action performed in response to minicolumn output, (2) rewarded-move stimulus perceived, (3) rewarded non-move stimulus perceived, (4) unrewarded move stimulus perceived, (5) reward obtained. Training ends at t=11375 ms. Thereafter, a desire for reward exists and each of the three possible trial conditions is presented (the third appears twice, since its successful performance can randomly lead to two different rewarded trials). At t=11750 ms, the rewarded move stimulus appears. Shortly thereafter, retrieval in the network of minicolumns correctly causes a go action response. That response is then rewarded. Similarly, the rewarded non-move stimulus at t=13750 ms elicits the correct nogo action. The unrewarded move stimulus appears for the first time at t=15750 ms. Since retrieval causes the desired go action, a rewarded move trial follows at t=17750 ms. Finally, the other possible rewarded conclusion of a successful unrewarded move trial is demonstrated at t=19750 ms, namely a following rewarded non-move trial at t=21750 ms.


Figure 11: Spikes in the a population of minicolumns during performance of the W.Schultz et al. go/nogo task. Spike color codes indicate the following minicolumns: (RED) rewarded move, (GREEN) rewarded non-move, (BLUE) unrewarded move, (CYAN) reward, (BROWN) go and (MAGENTA) nogo. The continuous presence of reward spikes (CYAN) during task performance represents the ever-present desire for reward in terms of reward minicolumn activation during retrieval phases. When the perception of reward obtained activates the reward minicolumn during encoding phases at the end of successful trials, denser clusters of reward spikes appear. Compared with the onset of go/nogo action spikes in Figure 10, activity in the a population of go/nogo minicolumns is delayed by a few cycles. This delay is the time taken by the loop from the minicolumn output population activity that causes the go/nogo action to the spiking at the new-item phase of the a-STM-buffer that is propagated to the a population.

Note:
In minicolumns-schultz-task.20040122.ccm, results displayed unusual gaps in the action spike train (and therefore in the dependent reward delivery spike train) after t=13410 ms and after t=15415 ms. The gaps are shown in Figure 12. Inspection reveals that the minicolumn output population that drives action spike trains during task performance experiences a single spike at cells 56 to 63 around those times. Those cells belong to the reward minicolumn. Further investigation is needed to discover why this happens, and why it happens so sporadically.


Figure 12: Undesired gaps show up in the action spike trains after t=13410 ms and t=15415 ms. Since the spike train of reward delivery in the environment simulation depends on the action spike train, the gaps show up there as well.

Comparison with experimental results published by W.Schultz et al.



Figure 13: Graphs depicting the activity of three orbitofrontal neurons (A, B and C), as excerpted from W.Schultz et al., Cerebral Cortex (2000). Rows of dots represent activity of the neuron during different trials. Summation of dots in each column results in histograms shown. The time axis is shown in seconds.

Model correspondence with results in fig.13a:
Activity in (A) may correspond to the activity of neurons in Srm, Srnm and Surm minicolumns (in a, x, y, r or s populations).

Model correspondence with results in fig.13b:
The neuron may correspond to a neuron in the go/nogo minicolumns of our model (in a, x, y, r or s populations).

Model correspondence with results in fig.13c:
The neuron may correspond to a reward neuron in our model (in a, x, y, r or s populations).

(Keep in mind that we opted not to include "sound reward".)

There is a problem with the current simulation output! The behavior is correct in that go/nogo is controlled properly, but the go neurons are not activated with enough specificity.

PROBLEM:

During retrieval phases, r is being activated by y regardless of Srm or Surm trial type, which should not happen. This could be due to either too much strength in the y->r connections or in the Wf connections during retrieval. The Wf option is less likely, since activation due to forward propagation without sufficient gating probably would not lead to spikes in r from Surm when Srm is the current state. The r neurons that are active also match perfectly the y-diffuse neurons that are active. Note that Srm(Wf)->r(Go)=r(4) and Surm(Wf)->r(Go)=r(6). r receives input from y-diffuse and via Wf; it sends output to the r STM buffer and is not driven by it. The y connection appears to be 2.5 nS, which should be insufficient on its own. So there must be a contribution via Wf to r(Go) from both Srm and Surm to cause r(4) and r(6) to spike. That means that s(Srm)->Go=s(32) and s(Surm)->Go=s(48) must both be spiking. It looks like s(Srm)->Go, s(Surm)->Go and even s(Srnm)->Nogo are spiking during the Srm trial.
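The index bookkeeping in this argument can be made explicit. Assuming, as the labels in these notes suggest, that cell index = 8 × (own minicolumn) + (partner minicolumn), with go 0, nogo 1, Srm 4, Srnm 5, Surm 6 and reward 7, and that Wf connects s cell (a, b) one-to-one to r cell (b, a), the hypothetical Python helpers below decode labels such as s(48) and find the r cell that each s cell drives. The helper names and the one-to-one Wf mapping are inferences, not taken from the model files.

```python
CELLS = 8
NAMES = {0: "go", 1: "nogo", 4: "Srm", 5: "Srnm", 6: "Surm", 7: "reward"}


def split(index):
    """Return (own minicolumn, partner minicolumn) of a cell index."""
    return divmod(index, CELLS)


def label(population, index):
    own, partner = split(index)
    arrow = "->" if population == "s" else "<-"
    return f"{population}({NAMES[own]}{arrow}{NAMES[partner]})"


def wf_target(s_index):
    """r cell driven by an s cell, assuming Wf swaps own and partner minicolumn."""
    own, partner = split(s_index)
    return partner * CELLS + own


for s_index in (32, 48):
    r_index = wf_target(s_index)
    print(label("s", s_index), "drives", label("r", r_index), f"= r({r_index})")
# s(Srm->go) drives r(go<-Srm) = r(4)
# s(Surm->go) drives r(go<-Surm) = r(6)
```

Under the same assumption, s7 (Go->Rew) drives r56 (Rew<-Go), which matches the pairing of those two neurons in the spike-time table further below.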

Q1: Is s being driven directly by x? Transmission strength from x to s is halved at the same phase as transmission strength to r via Wf. But that is half (or a little more than half when modulation is not at its lowest) of 6 nS. The very strong connection exists because x has to drive s during encoding. During an Srm trial, false r(6) spiking needs support from y(6) and from s(48). y(6) depends on x(4) (from Srm) or x(5) (from Srnm). Even with end-stopping, x(5) can be activated by y(40) (in Srnm), since that is on another propagation path from the goal minicolumn. So, the y(6) contribution is both valid and present. s(48) [IS IT REALLY THERE? OR IS Y DRIVING?] in Surm cannot be driven by current state input during an Srm trial. s(48) should depend on both x(48) and on an r in the r(48-55) range, but no such Wif connection should exist, since nothing ever precedes Surm. The output shows no r(48-55) activity during the Srm trial, hence s(48) must be driven directly by x(48)! There is only one issue here: x should not be able to drive s during retrieval, since x(48) depends on y(6) and y(6) activity was already explained above.

Q2: If Q1 is true, that explains why s(Srnm)->Nogo is spiking in response to propagation from the reward goal, but why does current Srm activity not cause end-stopping that prevents s(Surm)->Go spikes? Current state input is properly connected to the interneuron population that is responsible for end-stopping. And it appears to be modulated at the same phase as the y-specific to y-diffuse connection, i.e. to function during retrieval. And the end-stopping synapse is quite strong. End-stopping appears to be in place. End-stopping is probably not the issue, as explained for the case of y(6) activity in Q1.

Transmission modulation halves the input strength from Wf to r during retrieval (perhaps test this!). If not, s alone might drive r. SOLUTION: (1) extract the spike trains according to the responses they SHOULD exhibit in our model; (2) mark the bug in the graphs that need to be changed, and check all graphs; (3) focus on completing the Results, Discussion and Abstract text; (4) while correcting the text for submission according to comments, and while doing other tasks, correct the model and its output and update the figures correspondingly. (Given the answer to Q1, perhaps the modulation of transmission from x to s needs to go down to 0.3 instead of 0.5. This solution is attempted, but not yet tested, in minicolumns-schultz-task.20040401.ccm.)

After changing the transmission modulation, behavior is still correct, but r(6) still spikes in the retrieval phases of an Srm trial. Even after reducing the minimum of the transmission modulation to 0, the problem remains. Either the transmission modulation is not being applied at the right phase, or something else is driving r(6) too strongly.

Inputs to r(6): ONLY y(6) and s(48).
Inputs to s(48): x(48), r(48-55) and current state input, but the latter is zero during an Srm trial.
r(48-55) input is a possible cause, but should be weak even then. Check whether it is zero during an Srm trial: those neurons never spike (which is correct).
I can test whether y(6) or x(48) are driving gated neurons directly by explicitly cutting other input.
(Without explicit cutting:) both are active at the times when r(6) is driven during retrieval in an Srm trial.
(With explicit cutting:) if transmission modulation is not enough to control the amount of direct drive, then I can also use modulation of the membrane potential.

After cutting input other than x to s, x(48) and s(48) still spike during retrieval phases of an Srm trial, and r(6) still spikes in what seems to be the same way.

Next: also cut the other input to r, then analyze what this means.

When forward input to r is cut, there is no more spike activity in r, so r is never driven entirely by y. Thus y has the appropriate connection strength to r, and the false activity in r(6) depends on s(48). But s(48) is still active, so it must be driven directly by x(48). ~/src/nnmodels/ccmb/minicolumns-schultz-task.20040408.ccm.gz was used for this testing, although corrections should revert to minicolumns-schultz-task.20040401.ccm.

Inspection of a and of the input that triggers retrieval shows that the retrieval spikes are the ongoing spikes in x(48), which disappear only once, after t=12600 ms, when the newly appearing encoding spike is too close. The retrieval spikes fall just within the high portion of transmission modulation; the transmission modulation drops slightly too late. A reduction of the delay of transmission modulation by about 6-7 ms may solve the problem. Subsequently, the minimum level of the modulated transmission strength must be set to enable correct retrieval.

In minicolumns-schultz-task.20040420b.ccm, the delay of transmission modulation is reduced by 10 ms (actually, the delay is removed entirely). This modification affects transmission modulation of input via Wf to r and via Wif to s. This was successful, and r(4) and r(6) no longer produce the same spike trains. r(6) is now active only in encoding phases during the Srm trial.
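Why a few milliseconds of modulation delay matter can be seen with a toy version of the modulation. The Python sketch below assumes, purely for illustration, a square-wave modulation over the 125 ms cycle that is high for the first half of the cycle after the delay and drops to half strength otherwise, applied to the 6 nS x->s synapse mentioned in Q1. The waveform, the function names and the spike time of 66 ms into the cycle are all hypothetical. With a 10 ms delay that spike still falls in the high portion; removing the delay moves it into the low portion.

```python
PERIOD_MS = 125.0      # 8 Hz rhythmic cycle
G_PEAK_NS = 6.0        # strong x -> s connection mentioned in the notes
LOW_FRACTION = 0.5     # transmission halved in the low portion


def modulation(t_ms, delay_ms, high_fraction=0.5):
    """Square-wave transmission factor (hypothetical shape)."""
    phase = (t_ms - delay_ms) % PERIOD_MS
    return 1.0 if phase < high_fraction * PERIOD_MS else LOW_FRACTION


def effective_conductance_ns(t_ms, delay_ms):
    return G_PEAK_NS * modulation(t_ms, delay_ms)


spike_ms = 66.0  # hypothetical retrieval spike time within a cycle
for delay in (10.0, 0.0):
    print(delay, effective_conductance_ns(spike_ms, delay))
# 10.0 6.0
# 0.0 3.0
```

The same reasoning explains why, once the delay is shifted, the minimum level of the modulated strength becomes the parameter that decides whether retrieval still succeeds.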

Increasing the similarity of model output with that of the Schultz et al. results:
In order to further increase the visual similarity of the output, I will for the time being curtail forward propagation beyond the single step. That means that the forward propagation cannot be used for planning in this simulation, a feature that is not needed for the Schultz et al. task.

The approach adopted in minicolumns-schultz-task.20040711.ccm is to link separate populations of interneurons (each represented by a single interneuron component) to the r and s populations. These interneuron populations become active in response to one or more firing r or s population neurons. The connectivity from the interneuron populations to the r and s populations is such that activity in one minicolumn sends inhibition to all other minicolumns. (In principle, the inhibition can simply target all minicolumns, so perhaps I can simply use such connectivity.) The inhibition is desired only during retrieval modes, or during the performance stage of the task. To do this, I do the following steps:

  1. Add an interneuronal population for r and one for s. Note that what we want now has less to do with winner-take-all responses than with restricting responses to the minicolumn associated with the current state.

    Both r and s populations already have links to interneuron populations, so I will reuse those at present. I will convert the interneuron populations from representations with 8 cells to representations with a single cell, and I will replace the UseinvWa input routing to the interneurons and the UseWa output routing from the interneurons with simple all-all connections. In order not to receive inhibition at this time from the STM buffers that are intended for future implementations of p and q populations, I will also cut the UseWa connections to r and s from the interneuron populations driven by those buffers. I have set the conductance of the GABAergic synapses of r and s to 5 nS.

  2. Add all-all connectivity (test this and see if specific routing is needed).

  3. Add after-hyperpolarization.

  4. Add GABA_B modulation of synaptic transmission to the interneuron populations. Initially drive that modulation so that transmission is only successful during retrieval modes. If that does not achieve the desired effect, drive it instead so that successful transmission occurs only during the performance stage of the task. (I'm actually trying the second option first, simply as a system test, since it is sure to interfere less with learning.)

  5. Check if the task is still properly learned and if performance is successful.

    Using the transmission modulation driven according to the stage of the task, performance is moderately successful. Unfortunately, performance degrades as more trials are done. This is caused by the effect of the inhibition on the encoding phases that continue during the performance stage of the task. This suggests the following ways to achieve the desired responses:

    (1) Suppress transmission of inhibition well during the encoding mode of each theta cycle.
    (2) Separately control STDP and transmission during the learning and performance stages of the task. Both STDP and transmission must then be enabled during learning, while the performance stage involves lateral inhibition in populations r and s, as well as suppression of plasticity.
    (3) Instead of suppressing the firing of other neurons in r and s populations, suppress the transmission along the paths from current state during the retrieval mode of each theta cycle.
    (4) Instead of suppressing the firing of other neurons in r and s populations, suppress the transmission along the paths from current state during the performance stage of the task. In this case, firing can still occur as needed during the encoding modes of each theta cycle in the performance stage of the task.

    The drawback of approaches (3) and (4) is their reliance on transmission modulation. While other approaches also involve such modulation in the current implementation, the effect of that modulation can often also be achieved by adding modulation of membrane potential to target neurons, which affects their tendency to fire. That may also be possible for (3) and (4), but may involve further changes in the strength of transmission used during encoding modes so that the ability to fire during encoding is unaffected. The advantage of approaches (3) and (4) is clearly that the modulation directly targets the functional need: a desire to restrict propagation. Other features, such as neuronal spiking in other minicolumns that is caused by other input, are left unchanged.
    Approach (2) has the drawback that it relies on the modulation of plasticity, a phenomenon that we used in models of hippocampal function, but one that we have not yet addressed in terms of physiological evidence in PFC.
    I would prefer to try (1) now.

    But there is one more issue: Since there is no LTD in the current model, and new connections should not appear in Wf when r and s neurons in only the current state minicolumn can fire during task performance, how does the degradation of performance actually happen? Answer: The output population produces spikes when a combination of current state input and output from the x population is received. This does not depend upon Wf (which is correct). Therefore the difference must lie in the x population output. That was confirmed by inspection of x spiking with and without the inhibition of r and s population activity. The only time when r or s population activity is relevant to the x population is during encoding. This means that proper encoding of associations, specifically those associations involved in Surm trials (Go action leads to Srm or Srnm trials), was not completed at the end of the learning stage of the task. Apparently, previous simulation runs completed that learning during the performance stage so that the issue did not appear in the results. The possible solutions: (a) Increase the duration of the learning stage of the simulation. (b) Allow activity to propagate unhindered during the encoding mode of each rhythmic cycle. I choose (b), since simulations already take quite long and since changing the task protocol would involve recalculating a large number of spike event times for the event sequencers. This choice works well with alternative (1) above.

    These changes are all implemented in minicolumns-schultz-task.20040712.ccm.

  6. Check to see if the neuron activity is now more easily recognizable as that seen in the Schultz et al. results.

  7. Use the standard method to extract figure data and produce the comparison figure.

  8. Consider adding noise.
    I have made an attempt to add noise to the entire system by replacing the perfectly regular 8Hz rhythm provided by a signal generator with a circuit that produces a rhythm that has some added noise. This is implemented in minicolumns-schultz-task.20040715.ccm.
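A noisy 8 Hz rhythm of this kind can be sketched as a sequence of cycle onsets whose periods are jittered around 125 ms. The Python below is a hypothetical illustration, not the circuit used in minicolumns-schultz-task.20040715.ccm: Gaussian jitter on each period, the 5 ms default and the function name `rhythm_onsets` are all assumptions.

```python
import random


def rhythm_onsets(n_cycles, period_ms=125.0, jitter_ms=5.0, seed=None):
    """Onset times (ms) of successive cycles of a jittered rhythm."""
    rng = random.Random(seed)
    onsets, t = [], 0.0
    for _ in range(n_cycles):
        onsets.append(t)
        # Next period: nominal 125 ms plus Gaussian noise, kept positive.
        t += max(1.0, rng.gauss(period_ms, jitter_ms))
    return onsets


print(rhythm_onsets(4, jitter_ms=0.0))  # [0.0, 125.0, 250.0, 375.0]
noisy = rhythm_onsets(80, seed=42)
mean_period = (noisy[-1] - noisy[0]) / (len(noisy) - 1)
print(round(mean_period, 1))  # close to the nominal 125 ms
```

Jittering the period rather than each onset independently keeps the cycles ordered, so downstream phase-dependent modulation still sees one cycle at a time.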

To create improved figures of a-STM-buffer repetition and replacement, I made the model implementation minicolumns-schultz-task.20040715b.ccm, which produces responses in the vector recorder that are processed with the Yorick script showstacked.i.

Now, I am attempting to send the desire for reward directly to the y population instead of through the a population, so that the a(rew) data will look cleaner. This is implemented in minicolumns-schultz-task.20040717.ccm. The output from a to the x and y populations needs further care, since the modification initially results in a loss of performance.

As it turned out, I had made the wrong modification by removing output from the a population to the y population. Instead, I had to remove the "desire for reward" input to the a population and send it directly to the y_diffuse population with the UseWa connection router table. That led to correct performance and an improved match between the responses of our a population and the observations made by Schultz et al. As a result of the change in the model, the output for task performance has also become somewhat cleaner. The new data are stored in minicolumns-schultz-task.20040717.a-population.cdf.gz and minicolumns-schultz-task.20040717.task-performance-results.png. These were used to update figures as described in TL#200407190918.

Although inhibition is used to suppress propagation through the r and s populations and to achieve one-step activity in s, some r and s neurons still appear to be regularly active. The following analysis attempts to determine the cause.

phase       : retr.  retr.  enc.   enc.   retr.  enc.
s7(Go->Rew) : 13146, 13264, 13364, 13478, 13526, 13600
r56(Rew<-Go): 13148, 13266, 13366, 13480, 13528, 13602
a5(Go)      : 13128, 13228, 13335, 13451, -----, 13573
a4(Rew)     : -----, 13255, 13356, 13469, -----, 13590
state(Go)   : there is none, since Go is an action 
r4(Go<-Srm) : 13142, 13243, 13352, 13466, 13540, -----
r5(Go<-Srnm): 13142, 13243, 13349, 13466, -----, -----
r6(Go<-Surm): 13142, 13243, 13352, 13466, -----, -----
x7(Go<-Rew) : 13143, 13261, 13360, 13475, 13522, 13596
                            FINE   FINE          FINE
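The spike times in the table can be compared numerically. The sketch below copies the times of x7, s7 and r56 from the table (in whatever time units the simulation recorded) and computes, for each of the six cycles, how long after x7 the neuron s7 fires and how long after s7 the neuron r56 fires. It only restates the tabulated data; the variable names are for illustration.

```python
phases = ["retr.", "retr.", "enc.", "enc.", "retr.", "enc."]
x7 = [13143, 13261, 13360, 13475, 13522, 13596]   # x7(Go<-Rew)
s7 = [13146, 13264, 13364, 13478, 13526, 13600]   # s7(Go->Rew)
r56 = [13148, 13266, 13366, 13480, 13528, 13602]  # r56(Rew<-Go)


def lags(pre, post):
    """Per-cycle delay from the spike in `pre` to the spike in `post`."""
    return [b - a for a, b in zip(pre, post)]


x_to_s = lags(x7, s7)   # [3, 3, 4, 3, 4, 4]
s_to_r = lags(s7, r56)  # [2, 2, 2, 2, 2, 2]

for phase, d1, d2 in zip(phases, x_to_s, s_to_r):
    print(f"{phase:6} x7->s7 {d1}  s7->r56 {d2}")
```

In every cycle, retrieval as well as encoding, s7 follows x7 by 3 to 4 time units and r56 follows s7 by exactly 2.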

In minicolumns-schultz-task.20040721.ccm.gz, I attempt to improve the flexibility of the minicolumn PFC model by taking the minicolumns out of their capture box. Further clean-up is done in minicolumns-schultz-task.20040726.ccm.gz, in which the vector recorder output of population a is returned to its normal order, matching that of the minicolumns:
ID  neurons  minicolumn
0   0-7      Go
1   8-15     NoGo
2   16-23    -
3   24-31    -
4   32-39    Srm
5   40-47    Srnm
6   48-55    Surm
7   56-63    Rew
(The first number is the minicolumn ID number; the range after it is the corresponding group of neurons in each population of 64 neurons.)

In minicolumns-schultz-task.20040727.ccm.gz, some unused stimulus control circuitry is removed to clarify how the "Stimulus-Presenter" and "Presentation-End" event sequencers control the stimuli in the task. This follows a detailed description of event sequencer function in the experiment protocol, which will be used to create a computer-aided method for modifying the protocol rapidly and sensibly.
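With 64 neurons per population and 8 neurons per minicolumn, the minicolumn of a neuron follows from integer division of its index by 8, and the mapping above can be regenerated from that rule. A small sketch; the label list is copied from the mapping, with "-" marking the two unused minicolumns, and the helper names are hypothetical.

```python
NEURONS_PER_MINICOLUMN = 8
LABELS = ["Go", "NoGo", "-", "-", "Srm", "Srnm", "Surm", "Rew"]


def minicolumn_of(neuron_index):
    """Return (minicolumn ID, label) for a neuron index in a 64-neuron population."""
    if not 0 <= neuron_index < NEURONS_PER_MINICOLUMN * len(LABELS):
        raise ValueError("neuron index out of range 0-63")
    mc = neuron_index // NEURONS_PER_MINICOLUMN
    return mc, LABELS[mc]


def neuron_range(mc):
    """Return the first and last neuron index belonging to minicolumn mc."""
    first = mc * NEURONS_PER_MINICOLUMN
    return first, first + NEURONS_PER_MINICOLUMN - 1


for mc, label in enumerate(LABELS):
    first, last = neuron_range(mc)
    print(f"{mc}  {first}-{last}  {label}")
```

For example, neuron 56 falls in minicolumn 7 (Rew), and neurons 32-39 form minicolumn 4 (Srm).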

A method to aid modification of the experiment protocol is now implemented in a set of three scripts: protocol.sh (which calls the other two), protocol.m, and the C++ utility protocol.cc. As an example, a few minor modifications of the protocol were made in minicolumns-schultz-task.20040728.ccm.gz. In the example minicolumns-schultz-task.20040728b.ccm.gz, the duration of the learning stage is doubled by duplicating the set of learning trials and moving the performance stage accordingly (protocoldata20040728.m). Note that using the protocol utility ran into a limitation of Catacomb with regard to the number of events in an event sequence.
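The doubling of the learning stage can be illustrated on a plain list of event times. This is a hypothetical sketch, not the protocol.sh/protocol.m/protocol.cc pipeline: it assumes one event sequencer's protocol is a sorted list of event times, that the learning stage starts at time 0 and ends at a known time `learning_end`, and that every event at or after that time belongs to the performance stage. The optional `max_events` check stands in for the Catacomb event-count limit; any value passed to it is a placeholder, not Catacomb's actual limit.

```python
def double_learning_stage(events, learning_end, max_events=None):
    """Duplicate the learning-stage events and shift the performance stage.

    events:       sorted event times for one event sequencer
    learning_end: time at which the learning stage ends (learning starts at 0)
    max_events:   optional cap on events per sequence (placeholder value)
    """
    learning = [t for t in events if t < learning_end]
    performance = [t for t in events if t >= learning_end]
    # The second copy of the learning trials starts where the first copy ended.
    repeated = [t + learning_end for t in learning]
    # The performance stage moves later by the added learning duration.
    shifted = [t + learning_end for t in performance]
    result = learning + repeated + shifted
    if max_events is not None and len(result) > max_events:
        raise ValueError(f"{len(result)} events exceed the limit of {max_events}")
    return result


protocol = [0, 500, 1000, 1500, 2000, 2500]
print(double_learning_stage(protocol, learning_end=2000))
# [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500]
```

Doubling the learning stage doubles its share of the event count, which is why a per-sequence limit is the first thing such a transformation can hit.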

Future Directions

The delay from r(n-1) to y_specific may be removed by making the transmission from r(n-1) to y_specific subthreshold during encoding. Activity in y_specific then depends on additional (initially subthreshold) activity from x, which guarantees the order needed for STDP. This is especially sensible if all STDP includes functional LTD, so that initially subthreshold connections between x and y (and between s and r) decay to 0 if there is no pre-post activity. (This paragraph also appears in DIL#20040716114855.1.)
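The proposed mechanism can be sketched with a standard pair-based STDP rule that includes LTD. This is an illustrative toy under assumed parameters (the threshold, weights, amplitudes, time constants, and the 2 ms x-to-y delay are hypothetical, not values from the Catacomb model): y_specific fires only when the subthreshold r(n-1) input coincides with x input, so the x spike precedes the y spike it helps cause. Pairings with pre before post are potentiated; pairings with pre at or after post are depressed, and the weight is clipped to [0, w_max].

```python
import math

THRESHOLD = 1.0
W_R_TO_Y = 0.6                     # r(n-1) -> y_specific, subthreshold alone
A_PLUS, A_MINUS = 0.05, 0.06       # potentiation / depression amplitudes
TAU_PLUS, TAU_MINUS = 20.0, 20.0   # STDP time constants (ms)


def y_fires(r_active, x_input):
    """y_specific fires only if r(n-1) drive plus x drive reaches threshold."""
    return (W_R_TO_Y if r_active else 0.0) + x_input >= THRESHOLD


def stdp_update(w, t_pre, t_post, w_max=1.0):
    """Pair-based STDP: potentiate if pre precedes post, otherwise depress."""
    dt = t_post - t_pre
    if dt > 0:
        w += A_PLUS * math.exp(-dt / TAU_PLUS)
    else:
        w -= A_MINUS * math.exp(dt / TAU_MINUS)
    return min(max(w, 0.0), w_max)


# Ordered case: r(n-1) alone cannot drive y, but r(n-1) plus x can, so y
# fires after x (assumed 2 ms later) and the x -> y weight is potentiated.
w_xy = 0.5
alone = y_fires(r_active=True, x_input=0.0)    # False: subthreshold
fired = y_fires(r_active=True, x_input=w_xy)   # True: 0.6 + 0.5 >= 1.0
for _ in range(20):
    w_xy = stdp_update(w_xy, t_pre=10.0, t_post=12.0)

# Wrong order: a synapse whose presynaptic spikes arrive 2 ms after the
# postsynaptic spike is only ever depressed and ends at 0.
w_other = 0.5
for _ in range(20):
    w_other = stdp_update(w_other, t_pre=12.0, t_post=10.0)
```

After 20 pairings `w_xy` saturates at 1.0 and `w_other` reaches 0.0. A pure pair-based rule leaves a synapse unchanged when its presynaptic neuron never fires; making such connections decay as well would need an extra term, for example depression on unpaired presynaptic spikes.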


~/doc/html/minicolumns-schultz-task.html - Fri Sep 10 05:16:42 EDT 2004 - Randal A. Koene