
Incorporating intrinsic suppression in deep neural networks captures dynamics of adaptation in neurophysiology and perception

ABSTRACT

Adaptation is a fundamental property of sensory systems that can change subjective experience in the context of recent information. It has been postulated that adaptation results from recurrent network mechanisms or from neuronally intrinsic suppression. However, it is unclear whether intrinsic suppression by itself can account for effects beyond reduced responses. Here we test the hypothesis that complex adaptation phenomena can emerge from intrinsic suppression cascading through a feedforward model of visual processing. A deep convolutional neural network with intrinsic suppression captured neural signatures of adaptation, including novelty detection, enhancement, and tuning curve shifts, while producing aftereffects consistent with human perception. When adaptation was trained on a task in which recent input affects recognition performance, an intrinsic mechanism generalized better than a recurrent neural network. Our results show that the feedforward propagation of intrinsic suppression changes the functional state of the network, reproducing key neurophysiological and perceptual properties of adaptation.

INTRODUCTION

The way in which we process and perceive our surrounding environment is not static but is continuously modulated by the incoming sensory information itself. This property of sensory systems is known as adaptation, and it can markedly change our perceptual experience, for example the illusory percept of upward motion after prolonged viewing of a waterfall (1). In the brain, neural responses adapt to recent stimulus history in remarkably similar ways across sensory modalities and species, suggesting that neural adaptation is governed by basic and conserved underlying mechanisms (2). The effects of adaptation at both the neural and perceptual levels have been studied most extensively in the visual system, where they appear to play a central role in integrating temporal context (3–5). To understand vision under natural, dynamic conditions, we must therefore consider the neural processes that contribute to visual adaptation and how these processes create functional states in neural networks. However, we lack a comprehensive understanding of the underlying adaptation mechanisms and how they lead to changes in perception.

A fundamental question is whether adaptation dynamics are implemented through recurrent interactions in the neural network (6) or whether they can result from well-established intrinsic biophysical mechanisms acting within each individual neuron (2). An important argument for the role of intrinsic cellular mechanisms in adaptation is that contrast adaptation in cat visual cortex is accompanied by a strong afterhyperpolarization of the membrane potential (7). In other words, the more a neuron fires, the more its excitability is reduced, which is why the phenomenon is sometimes referred to as neuronal fatigue (8). In this scenario, adaptation is caused by the intrinsic properties of individual neurons, which decrease their responsiveness in proportion to their previous activation. Throughout this work, we use the term intrinsic suppression to refer to those neuronally intrinsic mechanisms that suppress responses based on recent activation.

However, adaptation phenomena in the brain go far beyond firing rate–based suppression, and it is not always clear whether these phenomena can be explained by intrinsic neuronal properties. First, the degree of suppression depends not only on the previous firing rate but can be stimulus specific; i.e., the suppression depends on whether the current stimulus is a repetition or not (9). Second, adaptation can also enhance the responses of individual neurons (5, 8, 10, 11), sometimes even at the population level (12). Finally, adaptation can shift a neuron's tuning function along a given stimulus dimension such as orientation (13, 14), direction (15), or spatial and temporal frequency (16, 17). Tuning shifts involve both response suppression and enhancement (13) and have been associated with perceptual aftereffects, in which adaptation biases the perception of a stimulus property (15). Complex adaptation phenomena such as tuning shifts have fueled the argument that recurrent network mechanisms must be involved (13, 15, 16, 18). The putative involvement of recurrent signals is supported by computational models that implemented adaptation by changing recurrent interactions between orientation-tuned channels to successfully produce tuning peak shifts (18–20).

Adaptation effects cascade through the visual system and can alter network interactions in unexpected ways (2, 21). For example, adaptation-induced shifts in the spatial tuning of neurons in primary visual cortex (V1) can be explained by a two-layer model in which response gain changes in the lateral geniculate nucleus cascade to V1 through fixed weights (22). These results highlight the need for deeper, multilayer models to capture the effects of adaptation, because previous models, lacking the hierarchical depth and complexity characteristic of the visual cortex, may not be sufficient to demonstrate the feedforward potential of intrinsic neuronal mechanisms. In addition, the units in earlier models were designed to encode only one specific stimulus dimension, such as orientation, and therefore cannot provide a comprehensive framework for visual adaptation. In contrast, deep convolutional neural networks have recently emerged as a powerful new tool for modeling biological vision (23–26) (but see discussion in (27)). When these models are trained to classify natural images, they describe the stages of ventral visual stream processing of brief stimulus presentations with unprecedented accuracy (28–33), capturing essential aspects of object recognition behavior and perceived shape similarity (29, 31, 34). In this study, we exploit another strength of deep neural networks, namely their ability to demonstrate how complex properties can emerge from the introduction of biophysically inspired neural mechanisms. We implemented activation-based intrinsic suppression in a feedforward convolutional network (35) and tested the hypothesis that complex adaptation phenomena can readily emerge without dedicated recurrent mechanisms.

A comprehensive model of visual adaptation should not only capture the neurophysiological dynamics of adaptation but also account for its perceptual consequences. Therefore, we evaluated the proposed computational model implementing intrinsic suppression against critical neurophysiological and psychophysical experiments. We first show that the model captures the basic neurophysiological signatures of repetition suppression, including stimulus-specific suppression, not only from one image to the next but also across multiple image presentations (5). Second, we show that the model readily produces the two basic perceptual aftereffects of adaptation, namely a perceptual bias in the estimation of a stimulus parameter and an enhanced discriminability between parameter levels (3). In contrast to previous models, which were limited to a low-level property such as orientation, we demonstrate these effects using face gender (36) as the stimulus parameter, to highlight the general applicability of the model. Third, we show that perceptual aftereffects coincided with response enhancements and tuning peak shifts, phenomena often thought to require the involvement of recurrent network mechanisms (13, 15, 16, 18). Changes in response magnitude contributed mostly to the perceptual bias, but tuning changes were required to explain the enhanced discriminability. Finally, we show that a trained intrinsic neuronal mechanism is less prone to over-fitting and therefore offers a less complex solution than a recurrent network mechanism. Overall, these results do not rule out a role for recurrent processes in the brain, but they show that the typical neural and perceptual effects of adaptation can be explained by activation-based suppression cascading through a complex feedforward sensory system.

RESULTS

We investigated whether complex adaptation phenomena readily arise from the propagation of activation-based intrinsic suppression in a feedforward neural network model of ventral stream processing. We used a pretrained convolutional neural network (Fig. 1A) (35) as a bottom-up computational model for vision and introduced an exponentially decaying intrinsic adaptation state into each unit of each layer of the network, with parameters fixed to impose suppression (Fig. 1B; Materials and Methods). The two neural adaptation parameters α and β (Eqs. 1 and 2) were not fitted to neural responses or behavioral results; they had the same value for every unit and were chosen to yield a gradual buildup and recovery of the adapted state over multiple time steps (Fig. 1B). Throughout the article, we use α = 0.96 and β = 0.7, unless otherwise stated. Because of the intrinsic suppression mechanism, the model units show responses that evolve over time (Fig. 1C), and their activations can be compared directly with neurophysiological dynamics.
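
To make the update rule concrete, the following minimal single-unit simulation of Eqs. 1 and 2 (our own sketch, not the authors' code) reproduces the qualitative dynamics of Fig. 1B: the response decays under constant input, the adaptation state recovers during a blank gap, and a second presentation evokes a suppressed response.

```python
import numpy as np

def simulate_unit(drive, alpha=0.96, beta=0.7, b=0.0):
    """Single-unit simulation of Eqs. 1 and 2.

    drive: feedforward input (W @ x_t) per time step.
    s_t = alpha * s_(t-1) + (1 - alpha) * r_(t-1); r_t = relu(b + drive_t - beta * s_t).
    """
    s = r = 0.0
    states, responses = [], []
    for x in drive:
        s = alpha * s + (1.0 - alpha) * r   # Eq. 1: adaptation state update
        r = max(0.0, b + x - beta * s)      # Eq. 2: suppressed, rectified response
        states.append(s)
        responses.append(r)
    return np.array(states), np.array(responses)

# Stimulus on for 100 steps, a 10-step blank gap, then the same stimulus again:
drive = np.concatenate([np.ones(100), np.zeros(10), np.ones(100)])
_, r = simulate_unit(drive)
print(r[0], r[99], r[110])  # response decays, then only partially recovers after the gap
```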

Fig. 1 Neural network architecture and integration of activation-based intrinsic suppression.

(A) Architecture of a static deep convolutional network, in this case AlexNet (35). AlexNet contains five convolutional layers (conv1 through conv5) and three fully connected layers (fc6, fc7, and the decoder fc8). The unit activations in each layer, and thus the output of the network, are a fixed function of the input image. Photo credit: Kasper Vinken, Boston Children's Hospital, Harvard Medical School. (B) Intrinsic suppression was implemented for each unit through an intrinsic adaptation state s(t) (orange), which modulates the response r(t) (blue) and is updated at each time step based on the previous response r(t − 1) (Eqs. 1 and 2). The parameter values α = 0.96 and β = 0.7 were chosen to produce a response suppression (β > 0) that builds up gradually over time: with constant input (gray shaded areas), the value of the adaptation state s(t) gradually increases, leading to a decreasing response r(t). The intrinsic adaptation state recovers in the absence of input (unshaded areas). (C) Temporal unrolling of the network in (A), where the activation of each unit is a function of its inputs and of its activation at the previous time step (Eqs. 1 and 2).

A neural network with intrinsic suppression captures the temporal dynamics of adaptation at the neurophysiological level

We start with the most basic feature of neural adaptation: repetition suppression, which refers to a decrease in neural responses when a stimulus is repeated. We illustrate this phenomenon with an experiment in which a macaque monkey was presented with face stimuli in pairs: an adapter followed by a test stimulus (Fig. 2A) (37). In repetition trials, the test stimulus was identical to the adapter, whereas in alternation trials the adapter and the test stimulus were different. Neurons recorded in the middle lateral face patch of the inferior temporal (IT) cortex showed a decrease in response during stimulus presentation and after stimulus offset. In addition, the neurons responded less to a face stimulus in repetition trials (blue) than in alternation trials (orange; Fig. 2B).

Fig. 2 Activation-based intrinsic suppression in a neural network captures the attenuation of neurophysiological responses during repetition suppression.

(A) Face stimuli (created with FaceGen: facegen.com) were presented in repetition trials (adapter = test) and alternation trials (adapter ≠ test). (B) Responses in IT cortex (n = 97, normalized to average peak activity) are more strongly suppressed for a repeated stimulus (blue) than for a new stimulus (orange; data from (37)). Black bars indicate stimulus presentation. (C) The same experiment as in (A) and (B) produced a similar repetition suppression in the model with intrinsic suppression (black, blue, and orange lines; gray: no adaptation mechanism; average activity after ReLU of all N = 43,264 conv5 units). The units of the x axis are time steps, which map to 50-ms bins in (B). (D) Example of an oddball sequence (top) with a high-probability standard (blue) and a low-probability deviant (purple), and an example equiprobable sequence (bottom) as a control (green; texture images from vismod.media.mit.edu/pub/VisTex/). (E and F) Average neuronal responses in rat V1 (n = 55, (E)) and LI (n = 48, (F)) (12) for the standard (blue), deviant (purple), and control (green) conditions (normalized by the response to trial 1). (G) The response differences deviant − standard (blue) and deviant − control (green) increase from V1 to LI (error bars: 95% bootstrap confidence interval (CI), assuming no difference between animals). (H to J) Running the experiment in the model captures response dynamics similar to those in rat visual cortex. (H) and (I) show the results for conv1 and fc7 (indicated by larger markers in (J)). Green and blue horizontal lines and shading in (J) indicate the average values of the neural data in (G).

We evaluated the average activation time courses of the model units for the same experiment (Fig. 2C; mean of all N = 43,264 units in layer conv5). The model units showed a decrease in response over the course of the stimulus presentation. Consistent with repetition suppression in biological neurons, the response of the model units to the test stimulus was weaker in repetition trials than in alternation trials. For this stimulus set, the largest difference between repetition and alternation trials was observed for layer conv5 (see other layers in fig. S1A).

The model units demonstrated several key features of adaptation on two time scales: (i) during the presentation of a stimulus, including the first stimulus, the response decreased over time; (ii) the overall response to the second stimulus was weaker than the overall response to the first stimulus; and (iii) the response to the second stimulus was more attenuated if it was a repetition. However, the model did not capture more complex dynamics such as the second peak in the neural responses. The model responses also showed a smaller difference between repetition and alternation than biological neurons: the mean alternation − repetition difference was 0.07, SD = 0.12 (model, five test time steps) versus 0.11, SD = 0.15 (IT neurons, 850- to 1000-ms window) on the normalized scale of Fig. 2 (B and C).

We hypothesized that the computer-generated faces were too similar for the model to show the full extent of adaptation effects. We therefore ran the same experiment with natural images of greater variability. Natural stimuli led to a considerably larger difference between repetition and alternation trials (fig. S1B), indicating that the selectivity of adaptation at least partly reflects stimulus similarity in the model representations. Consistent with this idea, the preadaptation similarity of the activation patterns for different adapter and test images correlated positively with the magnitude of suppression for most layers (fig. S2).

An important property of repetition suppression in macaque IT is its stimulus specificity: even for two adapters that activate a given neuron equally, the suppression for an image repetition is still stronger than for an alternation (9). It is not obvious how a neuronally intrinsic mechanism could explain this phenomenon, because a mechanism based on the neuron's own firing rate is not itself stimulus selective (5). Fig. S3 shows that, as activation-based suppression propagates through the layers of the network, the adaptation of individual units becomes progressively less dependent on their own previous activation, to the point where most individual units in the fully connected layers show stimulus-specific suppression.

In addition to the two time scales illustrated in Fig. 2 (A to C), adaptation also operates on longer time scales. For example, repetition suppression typically builds up over multiple trials and can survive intervening stimuli (9). To illustrate this longer time scale, we present multi-unit data from rat visual cortex (12) recorded during an oddball paradigm, in which two stimuli, say A and B, were presented in a random sequence with different probabilities (Fig. 2D): a standard stimulus was shown with high probability (P = 0.9; blue), and a deviant stimulus was shown with low probability (P = 0.1; purple). Stimulus (A or B) and condition (standard or deviant) were counterbalanced for each neural recording. The standard stimulus was far more likely to be repeated in the sequence, allowing adaptation to build up, and the response therefore decreased for later trials in the sequence (Fig. 2, E and F, blue). Adaptation was evident in both V1 and the extrastriate latero-intermediate visual area (LI).
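
As an illustration, the sketch below generates oddball and equiprobable control sequences with the probabilities used above (the stimulus names are placeholders); each sequence would then be presented to the adapting network frame by frame, and unit responses averaged per trial position and condition.

```python
import numpy as np

rng = np.random.default_rng(0)

def oddball_sequence(standard, deviant, n_trials=100, p_deviant=0.1):
    """Random sequence with a high-probability standard and a low-probability deviant."""
    return rng.choice([standard, deviant], size=n_trials,
                      p=[1.0 - p_deviant, p_deviant])

def equiprobable_sequence(stimuli, n_trials=100):
    """Control sequence: each of the 10 stimuli appears with the same P = 0.1."""
    return rng.choice(stimuli, size=n_trials)

seq_odd = oddball_sequence("A", "B")                          # A standard, B deviant
seq_ctrl = equiprobable_sequence(["A", "B"] + [f"tex{i}" for i in range(8)])
```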

We evaluated the model in the oddball paradigm without any tuning or parameter changes. The model qualitatively captured the response difference between standard and deviant stimuli (Fig. 2, H and I). Comparing Fig. 2E with Fig. 2F, the effect of adaptation was stronger in LI than in V1 (Fig. 2G). An increase in adaptation along the visual hierarchy is consistent with the idea of adaptation cascading through the visual system with additional contributions at each stage. Like the neural data, the model showed increasing adaptation effects from one layer to the next (Fig. 2J), and this increase occurred only when intrinsic suppression was incorporated in multiple layers (fig. S7).

In the original experiment, images A and B were also presented in separate equiprobable control sequences, in which each stimulus appeared with the same low probability (P = 0.1) together with eight additional stimuli (Fig. 2D) (12). Equiprobable sequences are typically used to distinguish repetition from surprise effects, because the probability of a repetition in the control condition is the same as for the deviant, yet no image is more or less likely than the others. Thus, if neural responses also signal the unexpectedness of the deviant, the response to a deviant stimulus should be greater than in the control condition, as was observed for recording sites in the downstream visual area LI (Fig. 2F; purple > green). The model also showed a response difference between deviant and equiprobable control conditions in higher layers (Fig. 2, I and J). Because the model contains only the feedforward dynamics of intrinsic suppression, this response difference cannot be attributed to an explicit coding of expectation. Instead, the lower response in the control condition results from greater cross-stimulus adaptation caused by the additional stimuli in the equiprobable sequences. This observation means that intrinsic suppression in a feedforward neural network captures not only response differences due to the repetition frequency of a stimulus itself (deviant versus standard) but also differences due to the occurrence probabilities of other stimuli (a deviant surrounded by a high-probability standard versus the same stimulus surrounded by several equiprobable stimuli).

A neural network with intrinsic suppression produces perceptual aftereffects

A comprehensive model of visual adaptation should not only capture the neural properties of repetition suppression but also explain the perceptual aftereffects of adaptation. Aftereffects occur when recent exposure to an adapter stimulus biases or otherwise alters the perception of a subsequently presented test stimulus. For example, previous exposure to a male face makes another face appear more female to an observer, and vice versa (Fig. 3A). In other words, adaptation shifts the decision boundary for perceptual face-gender discrimination toward the adapter. A defining property of this type of aftereffect is that no perceptual bias should occur when the adapter matches the original boundary stimulus (e.g., a gender-neutral face). Here we focus on the face-gender dimension, but similar results for the tilt aftereffect (38) with gratings are shown in fig. S4.

Fig. 3 A neural network incorporating intrinsic suppression produces the perceptual bias and enhanced discriminability of aftereffects.

(A) Examples of the face-gender morph stimuli (created with webmorph.org) used in our simulated experiments. After exposure to a male adapter face, the gender decision boundary shifts toward the adapter, and an observer perceives a subsequent test face as more female, and vice versa (36). The example adapter, test, and percept morph levels were selected based on the estimated boundary shift shown in (B). (B) Decision boundaries before (blue) versus after (orange) exposure to a male (0%) adapter, based on the top layer (fc7) of the model with intrinsic suppression. Markers indicate class probabilities for each test stimulus, solid lines the corresponding psychometric functions, and vertical lines the classification boundaries. Adaptation to a 0% (male) face shifts the decision boundary toward male faces, such that the 20% test stimulus is perceived as gender neutral (50%). (C) Decision boundary shifts for the test stimulus as a function of adapter morph level, per layer. The round marker denotes the boundary shift shown in (B). (D) Relative face-gender discriminability (Materials and Methods; values > 1 indicate increased and values < 1 reduced discriminability) for fc7 as a function of adapter and test morph level. See color scale on the right. The red diagonal indicates that face-gender discriminability is increased for morph levels close to the adapter. (E) Average change in face-gender discriminability per layer as a function of the absolute difference in morph level between adapter and test stimulus.

To assess whether the model can account for perceptual aftereffects, we created a series of face stimuli morphing from an average male to an average female face and measured the category decision boundary for each layer of the model before and after adaptation (Materials and Methods). We again used the same model from the previous section, without any parameter changes. When the model is exposed to an adapter face, the decision boundary shifts toward the adapter. Before adaptation, the predicted female probabilities for the model's fc7 layer followed a typical sigmoid curve centered around the gender-neutral face stimulus at the 50% morph level (Fig. 3B, blue). After adaptation to a male face with a 0% morph level, the decision boundary shifted by 30 percentage points toward the gender of the adapter (Fig. 3B, orange). Figure 3C shows that, for all layers, adaptation to a face stimulus shifted the boundary toward the adapter. In agreement with perceptual aftereffects in humans, adaptation to the original gender-neutral boundary stimulus at the 50% morph level had no effect on the decision boundary (Fig. 3C). The perceptual bias did not appear abruptly in later layers but rather built up gradually across layers (Fig. 3C, from black over purple to yellow), and it was already present within the first layer with intrinsic suppression (fig. S8A).

Although adaptation to the boundary stimulus did not shift the decision boundary, it increased the slope of the psychometric function for fc7 from 0.077 to 0.099 (29%; for layers conv1 to fc6, the slope changes were −3, 11, 9, 12, 16, and 31%, respectively). An increased slope implies a repulsion of more-female and more-male stimuli away from the adapter. This result contradicts the perceptual renormalization hypothesis, which predicts that adaptation shifts the norm of the representational space uniformly toward the adapter and that adaptation to the original norm (i.e., the boundary stimulus) should therefore have no effect (see Figure 3 of (39)). A number of previous experiments have shown that both tilt and face aftereffects involve repulsion rather than renormalization (40), consistent with the computational model proposed here.

In addition to biasing the perception of a stimulus property, adaptation is also thought to increase the system's sensitivity to small deviations from the currently prevailing input characteristics, which could serve to maintain good stimulus discriminability (3, 4). Consistent with this hypothesis, Yang et al. (41) found that adaptation to a female/male face improved gender discrimination around the morph level of the adapter. We investigated whether intrinsic suppression in the model could account for such improved discrimination (Materials and Methods). Adaptation in the model indeed improved face-gender discriminability for morph levels close to the adapter (red diagonal in Fig. 3D), whereas discriminability for morph levels far from the adapter was reduced (blue). Like the perceptual bias (Fig. 3C), and in line with the results shown in Fig. 2 (G and J), the discriminability effects built up monotonically across successive layers (Fig. 3E; see fig. S4, D and E, for similar results with oriented gratings). In contrast to boundary shifts, discriminability improvements emerged only downstream of the first layer with intrinsic suppression (fig. S8B). Overall, the proposed model shows that activation-based suppression can explain discriminability improvements near the adapter without other specialized mechanisms and without introducing any model changes.

Response enhancements and tuning curve shifts result from intrinsic suppression propagating to deeper layers

To better understand the mechanisms underlying perceptual aftereffects, we examined how adaptation affected the responses of individual units in the face-gender experiment (see fig. S5 for analyses of tilt aftereffects). Figure 4A shows the preadaptation activation of each responsive fc7 unit across the female/male dimension (column 1) and how the activation strength of each unit changed depending on the adapter (columns 2 to 6). The rows in each heatmap are sorted by the gender selectivity index (SIg; Materials and Methods), ranging from more responsive to male faces (SIg < 0, units shown at the top) to more responsive to female faces (SIg > 0, units shown at the bottom). After adaptation, most units showed an overall suppressed response (blue), regardless of the adapter's gender. However, units with a strong preference for male faces (top rows) showed an enhanced response (red) after neutral to female adapters (columns 3 to 5), whereas units with a strong preference for female faces (bottom rows) showed the opposite effect (columns 1 to 3). Thus, highly selective units showed response enhancement after adapting to the opposite of their preferred gender. This response enhancement can be explained by disinhibition (8), whereby adaptation reduces the inhibitory input to units that prefer morph levels further away from the adapter, similar to response enhancements of middle temporal cells for their preferred direction after adaptation to the opposite direction (42).
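
The exact definition of SIg is given by Eq. 3 in the full Materials and Methods, which is not reproduced in this excerpt; a standard contrast index with the same range and sign convention would look like the hypothetical stand-in below (our assumption, not the paper's formula).

```python
import numpy as np

def gender_selectivity_index(acts, morphs):
    """Hypothetical stand-in for SIg (the paper's Eq. 3 is not reproduced here):
    contrast between mean responses to female (>50%) and male (<50%) morphs,
    in [-1, 1], with SIg < 0 for male-preferring units.

    acts: (n_units, n_morphs) preadaptation activations; morphs: morph levels.
    """
    r_f = acts[:, morphs > 50].mean(axis=1)
    r_m = acts[:, morphs < 50].mean(axis=1)
    return (r_f - r_m) / (r_f + r_m + 1e-12)
```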

Fig. 4 Response enhancements and tuning shifts emerge in deeper layers of a network incorporating intrinsic suppression.

(A) Effects of adaptation to female/male faces on the activation strength of individual units. Left: heatmap showing the activation, normalized to the maximum, of all 556 responsive fc7 units (rows) for all face-gender morph images (columns). See the color scale at the bottom left. Rows are sorted by SIg (Eq. 3). The remaining five heatmaps show the difference (post- minus preadaptation) in the activation of individual units after adaptation to five different adapters. See the color scale at the bottom right. (B) Mean response change (post- minus preadaptation activity) across responsive units for each layer (shaded area = 95% bootstrap CI). For strongly gender-selective units (red), the magnitude change (averaged across stimuli) was computed after adaptation to a gendered stimulus opposite to the unit's preferred gender (0% adapter for SIg > 0.6, 100% adapter for SIg < −0.6; black rectangles in (A)). For less gender-selective units (blue), the magnitude change after 0 and 100% adapters was used. (C) Proportion of adapters causing the preferred morph level to shift toward (attractive, magenta) or away from (repulsive, green) the adapter, averaged across units (shaded area = 95% bootstrap CI). (D) An example unit showing a repulsive shift of the tuning curves for the 25% (left) and 75% (right) adapters (y axes show activation in arbitrary units (a.u.); black, preadaptation tuning curve; green, postadaptation tuning curve; yellow marker, adapter morph level). (E) An example unit showing an attractive shift of the tuning curves (magenta, postadaptation tuning curve; same conventions as (D)).

To quantify and compare this response enhancement across layers, we considered strongly gender-selective units (|SIg| > 0.6) and computed their response enhancement (averaged across all stimuli) after adaptation to the opposite of their preferred gender. Figure 4B shows that response enhancement for highly selective units (red) emerged in deeper layers (always downstream of the first layer with intrinsic suppression; fig. S9A), whereas less selective units mostly showed response suppression (blue) in all layers.

Adaptation can lead to changes in response magnitude, but also to a shift in the peak of a neuron's tuning curve. For example, in orientation-selective neurons, adaptation to an oriented grating can cause the peak of the tuning curve to shift either toward the adapter (attractive shift (13, 14, 43)) or away from the adapter (repulsive shift (13, 18)). Adaptation in the model produced both types of peak shifts in the tuning curves (Fig. 4, D and E). For each unit, we computed the proportion of adapters that produced an attractive shift or a repulsive shift (Fig. 4C). Adaptation-induced peak shifts occurred in deeper layers of the network, downstream of the first layer with intrinsic suppression (fig. S9B). Overall, attractive shifts were more common, peaking at a proportion of 0.5 in the final layers.
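
One plausible way to classify such peak shifts is sketched below, under the assumption that a unit's preferred morph level is simply the argmax of its tuning curve (the paper's exact criterion is in its Methods): compare the distance between the tuning peak and the adapter before and after adaptation.

```python
import numpy as np

def classify_peak_shift(pre, post, morphs, adapter):
    """Label one unit's tuning-peak shift for one adapter.

    pre, post: tuning curves over morph levels before/after adaptation.
    Uses the argmax as the preferred morph level (our operationalization).
    """
    d_pre = abs(morphs[np.argmax(pre)] - adapter)
    d_post = abs(morphs[np.argmax(post)] - adapter)
    if d_post < d_pre:
        return "attractive"  # peak moved toward the adapter
    if d_post > d_pre:
        return "repulsive"   # peak moved away from the adapter
    return "none"
```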

Tuning changes are thought to be necessary to produce perceptual aftereffects. For example, it has been argued that a repulsive perceptual bias, in which the decision boundary shifts toward the adapter, requires tuning curves that shift toward the adapter (15, 19). The fact that intrinsic suppression in the model produces mostly attractive shifts (Fig. 4C) while also capturing boundary shifts (Fig. 3C) seems consistent with this idea. To disentangle the separate contributions of tuning changes and response magnitude changes to the perceptual adaptation effects produced by the model, we manipulated the postadaptation layer activations to only contain either tuning changes or magnitude changes (Materials and Methods; Fig. 5). Changes restricted to response magnitude without tuning changes led to even larger boundary shifts than the original model, whereas changes restricted to tuning without any changes in response magnitude led to smaller boundary shifts (Fig. 5A). This observation suggests that while the perceptual bias of aftereffects might be the result of a complex interaction between changes in responsivity and tuning, the perceptual bias does not necessarily require attractive shifts as suggested by previous models (15, 19). On the other hand, an increased face-gender discriminability for morph levels close to the adapter did require changes in the tuning response patterns of single units. Magnitude changes only produced the opposite effect, with increased discriminability for morph levels furthest from the adapter (Fig. 5B).
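
A simple way to perform this kind of manipulation, assuming a per-unit decomposition into overall response level and tuning-curve shape (the paper's exact procedure is described in its Materials and Methods), is sketched below: the magnitude-only condition keeps each unit's preadaptation tuning shape but adopts its postadaptation mean response, and the tuning-only condition does the reverse.

```python
import numpy as np

def split_magnitude_and_tuning(pre, post, eps=1e-12):
    """Per-unit decomposition of adaptation effects (a sketch, not the paper's code).

    pre, post: (n_units, n_stimuli) activations before/after adaptation.
    Returns activations containing only magnitude changes (preadaptation tuning
    shape, postadaptation response level) and only tuning changes (postadaptation
    shape, preadaptation level).
    """
    mean_pre = pre.mean(axis=1, keepdims=True)
    mean_post = post.mean(axis=1, keepdims=True)
    magnitude_only = pre * mean_post / (mean_pre + eps)
    tuning_only = post * mean_pre / (mean_post + eps)
    return magnitude_only, tuning_only
```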

Fig. 5 Response magnitude and tuning changes in the model differentially explain perceptual boundary shifts and discriminability changes.

(A) Face-gender boundary shifts toward the adapter were produced both by magnitude changes without tuning changes (top) and by tuning changes without magnitude changes (bottom). Gray shading indicates the range of original layer effects shown in Fig. 3C. (B) Face-gender discriminability enhancement for morph levels close to the adapter was produced by tuning changes without magnitude changes (bottom), but not by magnitude changes without tuning changes (top). Gray shading indicates the range of original layer effects shown in Fig. 3E.

Intrinsic adaptation can be optimized by maximizing recognition performance

Thus far, we have considered a model with an intrinsic adaptation state for each unit, and the adaptation parameters α and β (Eqs. 1 and 2) were chosen to impose response suppression. This leaves open the question of whether such adaptation mechanisms can be optimized or learned in a deep learning framework given a certain task goal. We considered two possible ways in which adaptation could be learned by artificial neural networks: (i) optimize α and β by training a feedforward network with intrinsic adaptation state on a task where adaptation is useful for biological vision; and (ii) train a recurrent network without an intrinsic adaptation state on the same task.

To assess whether adaptation could be learned and to compare the two possible network mechanisms, we needed a task objective with a suitable goal where adaptation could affect visual performance. As mentioned earlier, one of the proposed computational roles of neural adaptation is to increase sensitivity to small changes in the sensory environment (3, 4). A system could increase sensitivity by decreasing the salience of recently seen stimuli or features (5, 21). Thus, we developed a task where the end goal was object classification, but the objects were hidden in a temporally repeated noise pattern. If adaptation serves to reduce the salience of a recent stimulus, then adapting to a noise pattern should increase the ability to recognize a subsequently presented target object embedded in the same noise pattern, and a network trained on this task could learn to reduce the salience of previously presented input. To keep the networks relatively lightweight, we chose a classification task with low-resolution hand-drawn doodles rather than natural images (Fig. 6A).
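
The sketch below illustrates one way such trials could be composed (the blending rule, image size, and function names are our assumptions; the noise SD of 0.32 matches the Gaussian noise reported for training in the Fig. 7 caption): a doodle is embedded in a noise pattern, and reusing the same noise array for adapter and test yields the same-noise condition.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_trial(doodle, noise=None, im_size=64, noise_sd=0.32):
    """Compose an adapter (noise only) and a test frame (doodle hidden in noise).

    doodle: (im_size, im_size) array in [0, 1]. Passing the same `noise` array
    for adapter and test gives the same-noise condition; drawing a fresh one
    gives the different-noise condition.
    """
    if noise is None:
        noise = rng.normal(0.0, noise_sd, size=(im_size, im_size))
    adapter = np.clip(0.5 + noise, 0.0, 1.0)
    test = np.clip(0.5 + 0.5 * doodle + noise, 0.0, 1.0)
    return adapter, test
```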

Fig. 6 Adapting to prevailing but interfering input enhances object recognition performance.

(A) Representative examples for each of the five doodle categories from the total set of 540 selected images (63). (B) Schematic illustration of the conditions used in the doodle experiment. In each trial, participants or the model had to classify a hand-drawn doodle hidden in noise (test), after adapting to the same (middle), a different (right), or no (left) noise pattern. The trials with different or no noise adapters were control conditions where we expected to see no effect of adaptation. (C) Participants showed an increase in categorization performance after adapting to the same noise pattern. Gray circles and lines denote individual participants (n = 15). The colored circles show average categorization performance; error bars indicate 95% bootstrap CIs. Chance = 20%.

Before training any network, we evaluated human recognition performance in this task. For this experiment, adaptation to the noise pattern at early levels of processing is likely sufficient to enhance the object information of the doodle. We ran a psychophysics experiment where participants were exposed to an adapter image and then classified a test image (Fig. 6B; Materials and Methods). Recognizing the doodles in this task is not trivial: whereas subjects can readily recognize the doodles in isolation, when they are embedded in noise and in the absence of any adapter, categorization performance was 59.7% (SD = 8.1%) where chance is 20%. As conjectured, adapting to the same noise pattern increased categorization performance by 9.3% (Fig. 6C; P = 0.0043, Wilcoxon signed-rank test, n = 15 subjects). This increase in categorization performance was contingent upon the noise pattern presented during the test stimulus being the same as the noise pattern in the adapter: Performance in the same-noise condition was 9.6% higher than in the different-noise condition (P = 0.0015, Wilcoxon signed-rank test, n = 15 subjects).

After establishing that adapting to the repeated noise pattern indeed improves the ability to recognize the target objects, we considered whether this behavior could be captured by the model. First, we considered the same model used in previous sections without any tuning. The same pattern of results was captured by the model with α and β fixed to impose activation-based suppression (fig. S10). Next, we asked whether it is feasible to fit intrinsic adaptation parameters α and β in the doodle experiment using recognition performance as the objective. We built a smaller network with an AlexNet-like architecture (Fig. 7A, without the recurrent connections shown in blue, which are discussed in the next paragraph; Materials and Methods). Each unit (excluding the decoder layer) had an exponentially decaying intrinsic adaptation state as defined by Eqs. 1 and 2. For simplicity, the trials were presented in three time steps: the adapter, a blank frame, and the test image (Fig. 7A). In addition to training the feedforward weights, we simultaneously optimized one α and one β parameter per layer. The value of α determines how fast the intrinsic adaptation state updates, ranging from no update (α = 1) to completely renewing at each time step (α = 0). The value of β determines whether the intrinsic adaptation state is used for suppression (β > 0), enhancement (β < 0), or nothing at all (β = 0).
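
A compact sketch of such a trainable layer is shown below, written in PyTorch for brevity (the paper's implementation used TensorFlow v1.11); the sigmoid parameterization keeping α in (0, 1) and the initialization values are our assumptions.

```python
import torch
import torch.nn as nn

class AdaptiveConv(nn.Module):
    """Convolutional layer with a trainable intrinsic adaptation state (Eqs. 1 and 2):
    one alpha and one beta per layer, trained jointly with the feedforward weights."""

    def __init__(self, in_ch, out_ch, ksize):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, ksize, padding=ksize // 2)
        self.alpha_logit = nn.Parameter(torch.tensor(2.0))  # sigmoid(2.0) ~ 0.88
        self.beta = nn.Parameter(torch.tensor(0.5))

    def forward(self, frames):
        """frames: list of (B, C, H, W) tensors, one per time step."""
        alpha = torch.sigmoid(self.alpha_logit)  # keep alpha in (0, 1)
        s = torch.zeros(1)
        r = torch.zeros(1)
        out = []
        for x in frames:
            s = alpha * s + (1 - alpha) * r               # Eq. 1: state update
            r = torch.relu(self.conv(x) - self.beta * s)  # Eq. 2: bias inside self.conv
            out.append(r)
        return out

layer = AdaptiveConv(3, 16, 5)
frames = [torch.rand(1, 3, 32, 32)] * 3   # the same frame for three time steps
out = layer(frames)                        # responses to the repeated frame decrease
```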

Fig. 7 Intrinsic adaptation can be trained by maximizing recognition performance and is more robust to over-fitting than a recurrent neural network.

(A) A convolutional neural network with an AlexNet-like feedforward architecture. For the adaptation version, an exponentially decaying hidden state was added to each unit according to Eqs. 1 and 2 (except for the decoder). For the recurrent version, fully recurrent weights were added for the fully connected layer and convolutional recurrent kernels for the three convolutional layers (see drawings in blue; Materials and Methods). (B) Average fitted parameters α and β for each layer after training 30 random initializations of the network with intrinsic adaptation state on same noise trials (SEM bars are smaller than the markers). (C) Test categorization performance on trials with the same Gaussian noise distribution as during training. Full markers: average categorization performance after training 30 random initializations on the same noise trials without intrinsic adaptation state (black), after training with intrinsic adaptation state on same noise trials (blue) or on different noise trials (orange). Empty markers: same as full markers but for the recurrent neural network. SEM bars are smaller than the markers. Chance = 20%, indicated by the horizontal dotted line. (D to F) Average generalization performance of the networks with an intrinsic adaptation state (magenta), recurrent weights (blue), or neither (gray) for same noise trials under noise conditions that differed from training. Performance is plotted as a function of increasing standard deviations (x axis) of Gaussian noise ((D), the vertical line indicates the SD = 0.32 used during training) and uniform noise (E) or as a function of increasing offset values added to Gaussian noise ((F), SD = 0.32, same as training). Error bounds indicate SEM.

After training using 30 random initializations on same-noise trials, the resulting parameters revealed response suppression that was particularly strong for convolutional layers 1 and 2, as indicated by the positive high β and low α values (Fig. 7B). The average categorization performance on the test set was 97.9% (blue), compared to 74.8% when no intrinsic adaptation state was included (black; Fig. 7C). Thus, when a network with intrinsic adaptation state was trained on an object recognition task with a temporally prevailing but irrelevant input pattern, the resulting adaptation parameters showed activation-based suppression.

A common way to model temporal dynamics in the visual system is by adding recurrent weights to a feedforward network (44–46). Recurrent neural networks can demonstrate phenomena similar to adaptation (47). Recurrent neural networks are the standard architectures used to process input sequences and should be able to perform well in the noisy doodle categorization task. To compare the intrinsic suppression mechanism with a recurrent circuit solution, we considered a network without intrinsic adaptation state and added lateral recurrent connections illustrated in blue in Fig. 7A (see the “Learning adaptation” section). After training on same-noise and different-noise trials, the recurrent architecture achieved the same categorization performance on the test set as the architecture with intrinsic adaptation (Fig. 7C). Thus, as expected, the recurrent network performed on par with the network with trained intrinsic adaptation.
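
For comparison, a lateral recurrent convolutional layer of the kind described here can be sketched as follows (a generic formulation, r_t = relu(W*x_t + U*r_(t−1) + b); the paper's exact recurrent kernel sizes are in its Methods). Here the temporal dependence is carried by learned recurrent weights rather than by a per-unit adaptation state.

```python
import torch
import torch.nn as nn

class LateralRecurrentConv(nn.Module):
    """Convolutional layer with a lateral recurrent kernel instead of an
    intrinsic adaptation state: r_t = relu(W*x_t + U*r_(t-1) + b)."""

    def __init__(self, in_ch, out_ch, ksize, rec_ksize=3):
        super().__init__()
        self.ff = nn.Conv2d(in_ch, out_ch, ksize, padding=ksize // 2)
        self.rec = nn.Conv2d(out_ch, out_ch, rec_ksize,
                             padding=rec_ksize // 2, bias=False)

    def forward(self, frames):
        r, out = None, []
        for x in frames:
            drive = self.ff(x)
            if r is not None:
                drive = drive + self.rec(r)  # lateral recurrence from t-1
            r = torch.relu(drive)
            out.append(r)
        return out
```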

Next, we asked whether there are any advantages of implementing adaptation via an intrinsic cellular mechanism versus lateral recurrent network mechanisms. We reasoned that a trained intrinsic suppression mechanism should generalize well across different input features or statistics, whereas the circuit-based solution learned by a recurrent neural network might be less robust. Therefore, we considered situations where the distribution of noise patterns used during training and testing was different. The recurrent network failed to generalize well to higher standard deviations of Gaussian noise (Fig. 7D) and failed markedly when tested with uniformly distributed noise (Fig. 7E) or Gaussian noise with an offset (Fig. 7F). In stark contrast, the intrinsic mechanism generalized well across all of these different input noise changes (Fig. 7, D to F, magenta). This over-fitting cannot just be explained by a difference in the number of parameters and also occurs when the number of parameters is equalized between the two networks (fig. S11). Furthermore, depending on the number of parameters, the recurrent network did not necessarily demonstrate the hallmark property of repetition suppression (fig. S12). In sum, while a recurrent network implementation can learn to solve the same task, the solution is less robust than an intrinsic mechanism to deviations from the particular statistics of the adapter noise used for training the network. These results suggest that intrinsic neuronal mechanisms could provide sensory systems in the brain with a well-regularized solution to reduce salience of recent input, which is computationally simple and readily generalizes to novel sensory conditions.

DISCUSSION

We examined whether the paradigmatic neurophysiological and perceptual signatures of adaptation can be explained by a biologically inspired, activation-based, intrinsic suppression mechanism (7) in a feedforward deep network. The proposed computational model bridges the fundamental levels at which adaptation phenomena have been described: from intrinsic cellular mechanisms, to responses of neurons within a network, to perception. By implementing activation-based suppression (Fig. 1), our model exhibited stimulus-specific repetition suppression (4, 5), which recovers over time but also builds up across repeats despite intervening stimuli (48) and increases over stages of processing (Fig. 2) (12, 49). Without any fine-tuning of parameters, the same model could explain classical perceptual aftereffects of adaptation (Fig. 3), such as the prototypical shift in perceptual bias toward the adapter (36, 38) and enhanced discriminability around the adapter (41, 50), thus suggesting that adaptation modulated the functional state of the network similarly to our visual system. In single units, perceptual aftereffects were associated with changes in overall responsivity (including response enhancements) as well as changes in neural tuning (Figs. 4 and 5). In addition, both intrinsic and recurrent circuit adaptation mechanisms can be trained in a task where reducing the salience of repeated but irrelevant input directly affects recognition performance (Fig. 6). However, the recurrent neural network converged on a circuit solution that was less robust to different noise conditions than the proposed model with intrinsic neuronal adaptation (Fig. 7). Together, these results show that a neuronally intrinsic suppression mechanism can robustly account for adaptation effects at the neurophysiological and perceptual levels.

The proposed computational model differs in fundamental ways from previous models of adaptation. Traditionally, adaptation has been modeled using multiple-channel models, where a fixed stimulus dimension such as orientation is encoded by a set of bell-shaped tuning functions (6, 19, 20). The core difference is that here we implemented adaptation in a deep, convolutional neural network model trained on object recognition (35). Even though current convolutional neural networks differ from biological vision in many ways (27), they constitute a reasonable first-order approximation for modeling ventral stream processing and provide an exciting opportunity for building general and comprehensive models of adaptation. First, in contrast with channel-based models, deep neural networks can operate on any arbitrary image, from simple gratings to complex natural images. Second, the features encoded by the deep neural network model units are not hand-crafted tuning functions restricted to one particular stimulus dimension but consist of a rich set of increasingly complex features optimized for object recognition, which map reasonably well onto the features encoded by neurons along the primate ventral stream (28–32). A set of bell-shaped tuning curves might be a reasonable approximation of the encoding of oriented gratings in V1, but this scheme might not be appropriate for other visual areas or more complex natural images. Third, the realization that adaptation should be considered in the context of deep networks, where the effects can propagate from one stage of processing to the next (2, 21), calls for complex multilayer models that can capture the cascading of adaptation. Last, whereas several models implement adaptation by adjusting recurrent weights between channels (19, 20), we implemented an intrinsic suppression property for each unit and allowed adaptation effects to emerge from the feedforward interactions of differentially adapted units.

The goal was not to fit the model on specific datasets but to generally capture the phenomenology of adaptation in a model by giving its artificial neurons a biophysically plausible mechanism. The adaptation parameters α and β were not fine-tuned for each simulated experiment and had the same value for each unit, showing that the ability of the model to produce adaptation phenomena did not hinge upon a carefully picked combination of parameters.

By using a feedforward deep neural network as the base for our computational model, we were able to empirically study the role of intrinsic suppression, without any contribution of recurrent interactions. These results should not be interpreted to imply that recurrent computations are irrelevant in adaptation. The results show that complex neural adaptation phenomena readily emerged in deeper layers, arguing that, in principle, they do not need to depend on recurrent mechanisms. Among the neural adaptation effects were enhanced responses of single units, as well as shifts in tuning curves, which are often thought to require recurrent network mechanisms (13, 15, 16, 18). Any effect of intrinsic suppression could also be implemented by lateral inhibitory connections in the circuit, leaving open the question of why the brain would prefer one solution over the other. The generalization tests in Fig. 7 point to an intriguing possibility, which is that intrinsic suppression provides a simpler solution that is more constrained, yet sufficient to implement the goals of adaptation. In contrast, recurrent mechanisms require a complex combination of weights to achieve the same goals and tended to over-fit to the specific training conditions.

There are several functional goals that have been attributed to adaptation. Activation-based suppression could serve to decrease salience of recently seen stimuli or features (5, 21). We successfully exploited this principle to train adaptation in neural networks on a task with temporally repeated but irrelevant noise patterns. Reducing the salience of recently seen features has functional consequences beyond these artificial conditions. By selectively reducing the sensitivity of the system based on previous exposure, adaptation effectively changes the subjective experience of an observer, leading, for example, to a perceptual bias in the face-gender aftereffect. These changes in perception may more broadly reflect mechanisms that serve to maintain perceptual constancy by compensating for variations in the environment (51). The introduction of activation-based, intrinsic suppression to an artificial neural network subjected the network to the same perceptual biases characterizing perceptual aftereffects in humans (Fig. 3, B and C), suggesting that intrinsic suppression changed the model’s functional state in a way that is similar to how exposure changes the functional state of our visual system.

Another proposed benefit of reducing sensitivity for recently seen stimuli may be to improve the detection of novel or less frequently occurring stimuli (12, 48). For example, by selectively decreasing responses for more frequent stimuli, adaptation can account for the encoding of object occurrence probability, described in macaque IT (52, 53). Consistent with these observations, intrinsic suppression in the proposed computational model decreased the response strength for a given stimulus proportional to its probability of occurrence (Fig. 2, H to J). The model also produced stronger responses to a deviant stimulus compared to an equiprobable control condition. Thus, response strength in the model captured not only differences in occurrence probability (standard versus deviant) but also relative differences in occurrence probability (control versus deviant): Compared to the control condition, the deviant is equally likely in terms of absolute occurrence probability, but it was unexpected merely by virtue of the higher occurrence probability of the standard stimulus.

Adaptation has also been suggested to increase coding efficiency of single neurons by normalizing their responses for the current sensory conditions (4). Neurons have a limited dynamic range with respect to the feature they encode and a limited number of response levels. Adaptation can maximize the information carried by a neuron by re-centering tuning around the prevailing conditions and thus increasing sensitivity and preventing response saturation (51). While AlexNet has ReLU activation functions, which do not suffer from the saturation problem, we did observe an abundance of attractive shifts of tuning curves (Fig. 4C). The collective result of these changes in tuning curves was an increased discriminability between stimuli similar to the adapter (Fig. 4D), consistent with reports for orientation, motion direction, and face-gender discrimination in humans (41, 50).

Besides direct functional benefits, adaptation may also serve an important role in optimizing the efficiency of the neural population code. Neurons use large amounts of energy to generate action potentials, which constrains neural representations (54). When a particular feature combination is common, the metabolic efficiency of the neural code can be improved by decorrelating responses of the activated cells and reducing their responsiveness. Adaptation has been shown to maintain existing response correlations and equality in time-averaged responses across the population (55), possibly resulting from intrinsic suppression at an earlier cortical stage, which we confirmed by running these experiments in the proposed computational model (fig. S13).

There are several possible extensions to the current model, including the incorporation of multiple time scales and recurrent circuit mechanisms. Adaptation operates over a range of time scales and thus may be best described by a scale-invariant power law, which could be approximated by extending the model using a sum of exponential processes (56). Our model also did not include any recurrent dynamics, because we focused on the feedforward propagation of intrinsic suppression. Yet, recurrent connections are abundant in sensory systems and most likely do contribute to adaptation. There is some evidence suggesting that recurrent mechanisms contribute to adaptation at very short time scales of up to 100 ms (57). During the first 50 to 100 ms after exposure, adaptation to an oriented grating produces a perceptual boundary shift in the opposite direction of the classical tilt aftereffect (58). This observation was predicted by a recurrent V1 model that produces only repulsive tuning shifts (6). Repulsive shifts are indeed more common in V1 when each test stimulus is immediately preceded by an adapter (13, 18), whereas adaptation seems to produce mostly attractive shifts at longer gaps (14, 43, 59), consistent with the effects of intrinsic suppression in the proposed model (Fig. 4 and fig. S5; although repulsive shifts were more common in highly responsive units; fig. S6). These results seem to suggest that recurrent interactions contribute in the first (few) 100 ms, whereas qualitatively different longer adaptation effects might be best accounted for by intrinsic suppression.

The results of the noisy doodle experiment in humans (Fig. 6) could be explained by local light adaptation to the adapter noise patterns. It is unclear where in the visual system such local light adaptation would take place. In principle, it could take place partly or totally at the level of photoreceptors in the retina. However, given that each noise pixel was only 0.3 × 0.3 visual degrees and given that luminance was distributed independently across noise pixels, inherent variability in the gaze of a fixating subject poses a limit on the contribution of photoreceptor adaptation (60). Most likely, the increased performance observed in the behavioral data results from a combination of adaptation at different stages of processing, including the retina. The proposed computational model does not incorporate adaptation at the receptor level (i.e., pixels), but future models could incorporate adaptation in both the input layer and later processing layers.

Overall, the current framework connects systems to cellular neuroscience in one comprehensive multilevel model by including an activation-based, intrinsic suppression mechanism in a deep neural network. Response suppression cascading through a feedforward hierarchical network changed the functional state of the network similar to visual adaptation, producing complex downstream neural adaptation effects as well as perceptual aftereffects. These results demonstrate that intrinsic neural mechanisms may contribute substantially to the dynamics of sensory processing and perception in a temporal context.

MATERIALS AND METHODS

Computational models

Implementing intrinsic suppression. We used the AlexNet architecture (Fig. 1A) (35), with weights pretrained on the ImageNet dataset (61), as a model of the ventral visual stream. We implemented an exponentially decaying intrinsic adaptation state (62) to simulate neuronally intrinsic suppression. Specifically, in all layers (except the decoder), each unit had an intrinsic adaptation state s_t, which was updated at each time step t based on its previous state s_{t−1} and the previous response r_{t−1} (i.e., the activation after the rectified linear unit operation):

s_t = α s_{t−1} + (1 − α) r_{t−1}    (1)

where α is a constant in (0,1) determining the time scale of the decay (Fig. 1B). This intrinsic adaptation state is then subtracted from the unit’s current input x_t (given weights W and bias b) before applying the rectifier activation function σ, so that

r_t = σ(b + W x_t − β s_t)    (2)

where β is a constant that scales the amount of suppression. Strictly speaking, then, Eq. 2 modifies the bias, and thus the responsivity, of the unit before σ is applied, which avoids negative activations. For β > 0, these updating rules result in an exponentially decaying response to constant input that recovers in the absence of input (Fig. 1B), simulating an activation-based suppression mechanism intrinsic to each individual neuron. Note that β < 0 would lead to response enhancement and β = 0 would leave the response unchanged. By implementing this mechanism across discrete time steps in AlexNet, we introduced a temporal dimension to the network (Fig. 1C). The model was implemented using TensorFlow v1.11 in Python. Throughout the paper, we use α = 0.96 and β = 0.7 unless indicated otherwise (in Fig. 7, those parameters are learned).
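For illustration, the following minimal NumPy sketch applies Eqs. 1 and 2 to a single fully connected layer over a sequence of inputs (this is our own illustration, not the authors’ TensorFlow code; W, b, and the input sequence are placeholders):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def run_layer_with_adaptation(W, b, inputs, alpha=0.96, beta=0.7):
    """Run one layer over a sequence of inputs with intrinsic suppression.

    W: (n_out, n_in) weights, b: (n_out,) biases,
    inputs: (T, n_in) input sequence. Returns (T, n_out) responses.
    """
    s = np.zeros(W.shape[0])            # adaptation state, starts at zero
    r = np.zeros(W.shape[0])            # previous response r_{t-1}
    responses = []
    for x in inputs:
        s = alpha * s + (1 - alpha) * r     # Eq. 1: update the state
        r = relu(b + W @ x - beta * s)      # Eq. 2: suppressed response
        responses.append(r)
    return np.array(responses)
```

For a constant input, the response decays exponentially toward a steady state; for zero input, the adaptation state decays and responsiveness recovers, as in Fig. 1B.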

Decision boundaries. Perceptual aftereffects are typically measured by computing shifts in the decision boundary along a stimulus dimension. We evaluated boundary shifts in the model using a set of face stimuli that morphed from average male to average female in 100 steps (created using webmorph.org), measuring category decision boundaries before and after adaptation with the resulting 101 face-morph images (Fig. 3, A to C). The experiments were simulated by exposing the model to an adapter image for 100 time steps, followed by a gap of uniform gray input for 10 time steps, before presenting the test image. The results were qualitatively similar when the number of time steps was changed.
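This trial structure can be sketched as follows, assuming a hypothetical model_step function that performs one forward pass while carrying the per-unit adaptation states in state:

```python
import numpy as np

def simulate_trial(model_step, state, adapter, test, n_adapt=100, n_gap=10):
    """Adapter for n_adapt steps, a uniform-gray gap for n_gap steps, then test."""
    gray = np.full_like(adapter, 0.5)          # uniform gray input (assumed value)
    for _ in range(n_adapt):
        _, state = model_step(adapter, state)  # adapt the network
    for _ in range(n_gap):
        _, state = model_step(gray, state)     # let the states partially recover
    response, _ = model_step(test, state)      # postadaptation activations
    return response
```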

To measure the pre- and postadaptation decision boundaries for a given layer, we trained a logistic regression classifier to discriminate between male and female faces using the preadaptation activations of responsive units for the full stimulus set. After training, the classifier outputs female/male class probability estimates for any given activation pattern. We therefore used the trained classifier to provide female/male probability estimates for each morph level, based on either the pre- or postadaptation activation patterns. The decision boundary is then given by the morph level associated with a female/male class probability of P = 0.5, which was estimated by fitting a psychometric function to the class probabilities (average R² of at least 0.99 per layer).
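A sketch of this boundary estimate, using scikit-learn and SciPy as stand-ins for the unspecified fitting tools and a standard logistic psychometric function:

```python
import numpy as np
from scipy.optimize import curve_fit
from sklearn.linear_model import LogisticRegression

def psychometric(m, m50, slope):
    """Logistic psychometric function: P(female) as a function of morph level m."""
    return 1.0 / (1.0 + np.exp(-(m - m50) / slope))

def decision_boundary(pre_acts, labels, test_acts, morph_levels):
    """Train on preadaptation activations; return the morph level where P = 0.5."""
    clf = LogisticRegression(max_iter=1000).fit(pre_acts, labels)
    p_female = clf.predict_proba(test_acts)[:, 1]
    popt, _ = curve_fit(psychometric, morph_levels, p_female, p0=[50.0, 10.0])
    return popt[0]   # m50: the fitted decision boundary
```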

Face-gender discriminability. To assess model changes in face-gender discriminability in Fig. 3J, we calculated the stimulus discriminability at each morph level of the stimulus dimension before and after adaptation. An increased discriminability between morph levels can be conceptualized as an increased perceived change in morph level for a given physical change in morph level. Thus, to quantify discriminability, a linear mapping was fit to predict stimulus morph levels from preadaptation unit activations using partial least squares regression (using four components). We then used this linear mapping to predict morph levels from activation patterns before and after adaptation. If adaptation increases discriminability, then the change in model-estimated morph level y with respect to a physical change in morph level m should also increase. Thus, to quantify the change in discriminability at morph level m, we calculated the absolute derivative of the predicted postadaptation morph level (y_m^post), normalized by the absolute derivative of the predicted preadaptation morph level (y_m^pre): |Δy_m^post| / |Δy_m^pre|.
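A minimal sketch of this discriminability measure, using scikit-learn’s PLS implementation (our choice of library; array names are hypothetical):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def discriminability_ratio(pre_acts, post_acts, morph_levels):
    """|Δy_post| / |Δy_pre| between neighboring morph levels
    (values > 1 indicate increased discriminability after adaptation)."""
    pls = PLSRegression(n_components=4).fit(pre_acts, morph_levels)
    y_pre = pls.predict(pre_acts).ravel()      # model-estimated morph levels
    y_post = pls.predict(post_acts).ravel()
    return np.abs(np.diff(y_post)) / np.abs(np.diff(y_pre))
```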

Selectively retaining tuning or magnitude changes. For Fig. 4B, we manipulated the postadaptation layer activations to only contain either tuning changes or magnitude changes. To retain only tuning changes, we started with the postadaptation activation patterns and multiplied the activation of each unit by a constant so that the resulting mean activation matched the preadaptation mean value. On the other hand, to retain only magnitude changes, we started with the preadaptation activation patterns and multiplied the activation of each unit by a constant so that the resulting mean activation matched the postadaptation mean value.
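In code, these two manipulations amount to a per-unit rescaling (a sketch, assuming activation matrices of shape stimuli × units restricted to responsive units so that the means are nonzero):

```python
import numpy as np

def retain_tuning_changes(pre, post):
    """Keep postadaptation tuning; restore each unit's preadaptation mean.

    pre, post: (n_stimuli, n_units) activations of responsive units.
    """
    return post * (pre.mean(axis=0) / post.mean(axis=0))

def retain_magnitude_changes(pre, post):
    """Keep preadaptation tuning; impose each unit's postadaptation mean."""
    return pre * (post.mean(axis=0) / pre.mean(axis=0))
```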

Learning adaptation. In Fig. 7, we present two models in which adaptation is learned for the noisy doodle classification task: a model with an intrinsic adaptation state and a recurrent neural network model. For both networks, the base feedforward part of the model followed the AlexNet architecture (35), consisting of three convolutional layers and a fully connected layer, followed by a fully connected decoder. The first convolutional layer filters a 28 × 28 × 1 input image with 32 kernels of size 5 × 5 × 1 with a stride of 1 pixel. The second convolutional layer filters the pooled (kernel = 2 × 2, stride = 2) output of the first convolutional layer with 32 kernels of size 5 × 5 × 32 (stride = 1). The third convolutional layer filters the pooled (kernel = 2 × 2, stride = 2) output of the second convolutional layer with 32 kernels of size 3 × 3 × 32 (stride = 1). The fully connected layer has 1024 units that process the output of the third convolutional layer, with 50% dropout during training.
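The feedforward base can be sketched in modern Keras as follows (the original used TensorFlow v1.11; the padding scheme is our assumption, as it is not specified above):

```python
import tensorflow as tf

def build_feedforward_base(n_classes=5):
    """Three conv layers, a 1024-unit fully connected layer, and a decoder."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 5, strides=1, padding='same',
                               activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
        tf.keras.layers.Conv2D(32, 5, strides=1, padding='same',
                               activation='relu'),
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
        tf.keras.layers.Conv2D(32, 3, strides=1, padding='same',
                               activation='relu'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1024, activation='relu'),
        tf.keras.layers.Dropout(0.5),       # 50% dropout during training
        tf.keras.layers.Dense(n_classes),   # fully connected decoder (logits)
    ])
```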

The recurrent version was extended with lateral recurrent weights. For convolutional layers, lateral recurrence was implemented as 32 kernels of size 1 × 1 × 32 (stride = 1), which filtered the nonpooled outputs of the layer at time step t − 1 (after the ReLU) and were added to the feedforward-filtered inputs of the same layer at time step t (before the ReLU). The fully connected layer was recurrent in an all-to-all fashion.
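A sketch of one time step of such a laterally recurrent convolutional layer (conv_ff and conv_lat are hypothetical names for the feedforward and 1 × 1 lateral convolutions):

```python
import tensorflow as tf

def recurrent_conv_step(x_t, r_prev, conv_ff, conv_lat):
    """One time step of a convolutional layer with lateral recurrence.

    r_prev is the layer's nonpooled output at time step t - 1 (after ReLU).
    """
    pre = conv_ff(x_t) + conv_lat(r_prev)   # lateral input added before the ReLU
    return tf.nn.relu(pre)                  # nonpooled output, fed back at t + 1
```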

The intrinsic adaptation version was extended with adaptation states, as described in the “Implementing intrinsic suppression” section, with the α and β parameters now also trained using backpropagation. The β parameters were initialized at 0 (i.e., no adaptation), and the α parameters were initialized using a uniform distribution ranging from 0 to 1.
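A minimal sketch of such a trainable adaptation layer (how the original implementation kept α within its range during training is not specified here, so no constraint is imposed):

```python
import tensorflow as tf

class IntrinsicAdaptation(tf.keras.layers.Layer):
    """Per-unit adaptation state with trainable alpha and beta."""

    def build(self, input_shape):
        n = int(input_shape[-1])
        # beta starts at 0 (no adaptation); alpha starts uniform in (0, 1)
        self.beta = self.add_weight(name='beta', shape=(n,),
                                    initializer='zeros')
        self.alpha = self.add_weight(
            name='alpha', shape=(n,),
            initializer=tf.keras.initializers.RandomUniform(0.0, 1.0))

    def call(self, pre_activation, state):
        r = tf.nn.relu(pre_activation - self.beta * state)        # Eq. 2
        new_state = self.alpha * state + (1.0 - self.alpha) * r   # Eq. 1
        return r, new_state
```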

Both the recurrent and intrinsic adaptation models were trained on the doodle classification task using TensorFlow v1.11 in Python. We used a training set of 500,000 doodle images (https://github.com/googlecreativelab/quickdraw-dataset; 100,000 per category), with a separate set of 1000 images to select hyperparameters and to evaluate the loss and accuracy during training. We used the Adam optimization algorithm (63) with a learning rate of 0.001, a sparse softmax cross-entropy (between logits and labels) cost function, a batch size of 100, and 50% training dropout in the fully connected layers. For the weights, we used Gaussian initialization with the scale correction proposed by Glorot and Bengio (64). Each model was trained for five epochs on the training set, which was sufficient for the loss and accuracy to saturate. Generalization performance was then tested on a third, independent set of 5000 images.
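Under these settings, the training configuration could look as follows in modern Keras (variable names are hypothetical; the original code used TensorFlow v1.11):

```python
import tensorflow as tf

# train_images, train_labels, val_images, val_labels: hypothetical arrays.
# Keras defaults to Glorot-uniform weights; the Gaussian variant described
# above corresponds to kernel_initializer='glorot_normal'.
model = build_feedforward_base()    # feedforward base from the sketch above
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),             # (63)
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])
model.fit(train_images, train_labels, batch_size=100, epochs=5,
          validation_data=(val_images, val_labels))
```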

Neurophysiology

We present neurophysiological data from two previously published studies to compare them with the neural adaptation effects of the proposed computational model: single-cell recordings (n = 97) from the inferior temporal (IT) cortex of one macaque monkey (37) and multi-unit recordings from V1 (n = 55) and the latero-intermediate visual area (LI; n = 48) of three rats (12). For methodological details about the recordings and the tasks, we refer to the original papers.

Psychophysics

Before starting the data collection, we preregistered the study design and hypothesis on the Open Science Framework at https://osf.io/tdb37/, where all the source code and data can be retrieved.

Participants. A total of 17 volunteers (10 female, ages 19 to 50) participated in our doodle categorization experiments (Fig. 6). In accordance with our preregistered data exclusion rule, two male participants were excluded from analyses because we could not record eye tracking data. All subjects gave informed consent, and the studies were approved by the Institutional Review Board at Children’s Hospital, Harvard Medical School.

Stimuli. The stimulus set consisted of hand-drawn doodles of apples, cars, faces, fish, and flowers from the Quick, Draw! dataset (https://github.com/googlecreativelab/quickdraw-dataset). We selected a total of 540 doodles (108 from each of the five categories) that were judged complete and identifiable. We lowered the contrast of each doodle image (28 × 28 pixels) to either 22 or 29% of the original contrast, before adding a Gaussian noise pattern (SD = 0.165 in normalized pixel values) of the same resolution. The higher contrast level (29%) was chosen as a control so that the doodle was relatively visible in one-sixth of the trials and was not included in the analyses. The average categorization performance on these high-contrast trials was 74% (SD = 8.3%), versus 63% (SD = 8.9%) in the low-contrast trials.
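The stimulus construction can be sketched as follows (scaling around mid-gray and clipping to the valid pixel range are our assumptions; the experiment itself was run in MATLAB, so this Python sketch is illustrative only):

```python
import numpy as np

def make_noisy_doodle(doodle, contrast=0.22, noise_sd=0.165, rng=None):
    """Lower the doodle's contrast, then add pixelwise Gaussian noise.

    doodle: 28 x 28 array in normalized pixel values [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    low_contrast = 0.5 + contrast * (doodle - 0.5)   # scale around mid-gray
    noisy = low_contrast + rng.normal(0.0, noise_sd, size=doodle.shape)
    return np.clip(noisy, 0.0, 1.0)
```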

Experimental protocol. Participants had to fixate a cross at the center of the screen to start a trial. Next, an adapter image was presented (for 0.5, 2, or 4 s), followed by a blank interval (of 50, 250, or 500 ms), a test image (for 500 ms), and lastly a response prompt screen. The test images were the noisy doodles described in the previous paragraph. The adapter image could be an empty frame (a white square filled with the background color), the same mosaic noise pattern as that of the subsequent test image, or a randomly generated different noise pattern (Fig. 6). Participants were asked to keep looking at the fixation cross, which remained visible throughout the entire trial, until they were prompted to classify the test image using keyboard keys 1 to 5. All images were presented at 9° × 9° from a viewing distance of approximately 52 cm on a 19-inch cathode ray tube monitor (Sony Multiscan G520; 1024 × 1280 resolution), while we continuously tracked eye movements using a video-based eye tracker (EyeLink 1000, SR Research, Canada). Trials in which the root mean square deviation of the eye position exceeded 1° of visual angle during adapter presentation were excluded from further analyses. The experiment was controlled by custom code written in MATLAB using Psychophysics Toolbox Version 3.0 (65).

Data analysis

Selectivity index. For the face-gender experiments, we calculated a selectivity index based on the average activation of a unit to male (morph level < 50%) and female (morph level > 50%) faces:

SI_g = (A_F − A_M) / (A_F + A_M)    (3)

A value >0 indicates stronger activation for female faces, and a value <0 indicates stronger activation for male faces.
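A one-function sketch of this index (array names are hypothetical):

```python
import numpy as np

def selectivity_index(acts, morph_levels):
    """SI_g from Eq. 3; positive values indicate female-preferring units."""
    a_f = acts[morph_levels > 50].mean()   # average activation to female morphs
    a_m = acts[morph_levels < 50].mean()   # average activation to male morphs
    return (a_f - a_m) / (a_f + a_m)
```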

REFERENCES AND NOTES

  1. T. C. Kietzmann, P. McClure, N. Kriegeskorte, Deep neural networks in computational neuroscience, in Oxford Research Encyclopedia of Neuroscience (Oxford Univ. Press, 2019), pp. 1–28.

  2. M. A. Webster, J. S. Werner, D. J. Field, Adaptation and the phenomenology of perception, in Fitting the Mind to the World: Adaptation and After-Effects in High-Level Vision (Oxford Univ. Press, 2005), chap. 10, pp. 241–278.

  3. G. Bellec, D. Salaj, A. Subramoney, R. Legenstein, W. Maass, Long short-term memory and learning-to-learn in networks of spiking neurons, in Proceedings of the 32nd International Conference on Neural Information Processing Systems (December 2018), pp. 795–805.

  4. D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, in Proceedings of the 3rd International Conference on Learning Representations (ICLR) (San Diego, 2015), pp. 1–15.

  5. X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, in Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (2010), vol. 9, pp. 249–256.

Acknowledgments: Funding: This work was supported by Research Foundation Flanders, Belgium (fellowship of K.V.), by NIH grant R01EY026025, and by the Center for Brains, Minds and Machines, funded by NSF Science and Technology Centers Award CCF-1231216. Author contributions: K.V. conceived the model and experiment; K.V., X.B., and G.K. designed the model and experiment; K.V. collected the data, implemented the model, and carried out analyses; K.V. and G.K. wrote the manuscript, with contributions from X.B. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors. All the psychophysics data and source code are available at https://osf.io/tdb37/.
