Dictionary Definition
operant adj : having influence or producing an effect; "many emotional determinants at work"; "an operant conscience" [syn: at work(p)]
User Contributed Dictionary
English
Noun
- An operative person or thing.
- A class of behavior that produces consequences by operating upon the environment.
Extensive Definition
Operant conditioning is the use of consequences
to modify the occurrence and form of behavior. Operant conditioning
is distinguished from classical
conditioning (also called respondent conditioning, or Pavlovian
conditioning) in that operant conditioning deals with the modification
of "voluntary behavior" or operant behavior. Operant behavior
"operates" on the environment and is maintained by its
consequences, while classical conditioning deals with the
conditioning of respondent behaviors which are elicited by antecedent conditions.
Behaviors conditioned via a classical conditioning procedure are
not maintained by consequences.
Reinforcement, punishment, and extinction
Reinforcement and punishment, the core tools of operant conditioning, are either positive (delivered following a response) or negative (withdrawn following a response). This creates a total of four basic consequences. A fifth procedure, known as extinction, involves no change in consequences following a response. It is important to note that organisms are not spoken of as being reinforced, punished, or extinguished; it is the response that is reinforced, punished, or extinguished.
Additionally, reinforcement, punishment, and extinction are not
terms whose use is restricted to the laboratory. Naturally
occurring consequences can also be said to reinforce, punish, or
extinguish behavior and are not always delivered by people.
- Reinforcement is a consequence that causes a behavior to occur with greater frequency.
- Punishment is a consequence that causes a behavior to occur with less frequency.
- Extinction is the lack of any consequence following a behavior. When a behavior is inconsequential, producing neither favorable nor unfavorable consequences, it will occur with less frequency.
Four contexts of operant conditioning: Here the terms "positive" and "negative" are not used in their popular sense. Rather, "positive" refers to addition and "negative" refers to subtraction. What is added or subtracted may be either a favorable stimulus or an aversive stimulus. Hence "positive punishment" is sometimes a confusing term. It denotes the addition of an aversive stimulus (such as spanking or an electric shock), a context that may seem very negative in the lay sense. The four procedures are:
- Positive reinforcement occurs when a behavior (response) is followed by a favorable stimulus (commonly seen as pleasant) that increases the frequency of that behavior. In the Skinner box experiment, a stimulus such as food or sugar solution can be delivered when the rat engages in a target behavior, such as pressing a lever.
- Negative reinforcement occurs when a behavior (response) is followed by the removal of an aversive stimulus (commonly seen as unpleasant) thereby increasing that behavior's frequency. In the Skinner box experiment, negative reinforcement can be a loud noise continuously sounding inside the rat's cage until it engages in the target behavior, such as pressing a lever, upon which the loud noise is removed.
- Positive punishment (also called "Punishment by contingent stimulation") occurs when a behavior (response) is followed by an aversive stimulus, such as introducing a shock or loud noise, resulting in a decrease in that behavior.
- Negative punishment (also called "Punishment by contingent withdrawal") occurs when a behavior (response) is followed by the removal of a favorable stimulus, such as taking away a child's toy following an undesired behavior, resulting in a decrease in that behavior.
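The four procedures form a two-by-two grid. One axis is whether a stimulus is added or removed. The other is whether that stimulus is favorable or aversive. That structure can be captured in a short lookup.

This sketch is illustrative only. The names `Change` and `name_procedure` are hypothetical. It also uses the favorable/aversive shorthand from the list above, whereas strictly a consequence counts as reinforcement or punishment only by its measured effect on behavior.

```python
from enum import Enum


class Change(Enum):
    """Whether a stimulus is added or removed after the response."""
    ADD = "positive"     # stimulus delivered following the response
    REMOVE = "negative"  # stimulus withdrawn following the response


def name_procedure(change: Change, stimulus_is_aversive: bool) -> str:
    """Name the operant procedure for a response-contingent stimulus change.

    Adding a favorable stimulus or removing an aversive one is expected to
    strengthen the behavior (reinforcement); the other two combinations are
    expected to weaken it (punishment).
    """
    strengthens = (change is Change.ADD) != stimulus_is_aversive
    effect = "reinforcement" if strengthens else "punishment"
    return f"{change.value} {effect}"


if __name__ == "__main__":
    # Print all four combinations of the grid.
    for change in Change:
        for aversive in (False, True):
            kind = "aversive" if aversive else "favorable"
            print(f"{change.name:6} {kind:9} -> {name_procedure(change, aversive)}")
```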
Also:
- Avoidance learning is a type of learning in which a certain behavior results in the termination or prevention of an aversive stimulus. For example, shielding one's eyes in bright sunlight (or going indoors) avoids the aversive stimulation of having light in one's eyes.
- Extinction occurs when a behavior (response) that had previously been reinforced is no longer reinforced. In the Skinner box experiment, the rat presses the lever and receives a food pellet several times. It then presses the lever again and never receives another pellet. Eventually the rat stops pressing the lever.
- Noncontingent reinforcement refers to the response-independent delivery of stimuli identified as reinforcers for some behaviors of that organism. In practice, this typically entails time-based delivery of stimuli identified as maintaining aberrant behavior, which serves to decrease the rate of the target behavior. Because no measured behavior is identified as being strengthened, there is controversy surrounding the use of the term noncontingent "reinforcement".
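Acquisition followed by extinction can be illustrated with a simple linear-operator learning model in the style of Bush and Mosteller. This is a swapped-in textbook model, not something the text above specifies. The function name and the learning rates `alpha` and `beta` are hypothetical. Each reinforced trial moves the expected response probability a fixed fraction of the way toward 1. Each unreinforced (extinction) trial moves it a fixed fraction of the way toward 0.

```python
def linear_operator_curve(p0: float, n_acquisition: int, n_extinction: int,
                          alpha: float = 0.2, beta: float = 0.2) -> list[float]:
    """Expected response probability at the start and after each trial.

    Reinforced trials move p a fraction `alpha` of the way toward 1;
    extinction trials move p a fraction `beta` of the way toward 0.
    """
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must be a probability")
    p = p0
    curve = [p]
    for _ in range(n_acquisition):
        p += alpha * (1.0 - p)  # reinforced trial: acquisition
        curve.append(p)
    for _ in range(n_extinction):
        p -= beta * p           # unreinforced trial: extinction
        curve.append(p)
    return curve


if __name__ == "__main__":
    curve = linear_operator_curve(0.1, n_acquisition=20, n_extinction=20)
    print(round(curve[20], 3), round(curve[-1], 3))  # 0.99 0.011
```

In a model this simple, the response probability declines smoothly during extinction. It therefore does not reproduce the extinction burst described later in this article. Nor does it capture the greater resistance to extinction that follows intermittent reinforcement.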
Thorndike's law of effect
Operant conditioning, sometimes called instrumental conditioning or instrumental learning, was first extensively studied by Edward L. Thorndike (1874-1949). He observed the behavior of cats trying to escape from homemade puzzle boxes. When first confined in the boxes, the cats took a long time to escape. With experience, ineffective responses occurred less frequently and successful responses occurred more frequently, enabling the cats to escape in less time over successive trials.

In his Law of Effect, Thorndike theorized that successful responses, those producing satisfying consequences, were "stamped in" by the experience and thus occurred more frequently. Unsuccessful responses, those producing annoying consequences, were stamped out and subsequently occurred less frequently. In short, some consequences strengthened behavior and some consequences weakened behavior. Thorndike produced the first known learning curves through this procedure.

B.F. Skinner (1904-1990) formulated a more detailed analysis of operant conditioning based on reinforcement, punishment, and extinction. Following the ideas of Ernst Mach, Skinner rejected the mediating internal states implied by Thorndike's "satisfaction" and constructed a new conceptualization of behavior without any such references. While experimenting with homemade feeding mechanisms, Skinner invented the operant conditioning chamber. The chamber allowed him to measure rate of response as a key dependent variable, using a cumulative record of lever presses or key pecks.
Operant Conditioning vs Fixed Action Patterns
Skinner's construct of instrumental learning is contrasted with what Nobel Prize-winning biologist Konrad Lorenz termed "fixed action patterns," or reflexive, impulsive, or instinctive behaviors. Skinner and others held that these behaviors exist outside the parameters of operant conditioning, but they considered them essential to a comprehensive analysis of behavior.
In dog training, particularly of working and detection dogs, stimulating these fixed action patterns through the dog's predatory instincts (its prey drive) is the key to producing very difficult yet consistent behaviors. In most cases this does not involve operant, classical, or any other kind of conditioning. The fixed action patterns were themselves shaped by evolution. They remained stable over the long time spans that evolution requires because of their survival function, a selection by consequences that parallels operant conditioning.
According to the laws of operant conditioning, a behavior that is reinforced every single time it occurs (continuous reinforcement) extinguishes faster once reinforcement stops. Intermittently reinforced behavior, by contrast, shows more stable rates and is relatively more resistant to extinction. Thus, in detection dogs, every correct indication of a "find" must be rewarded with a tug toy or a ball throw early on, for initial acquisition of the behavior. Thereafter, fading procedures are introduced in which reinforcement is "thinned" (not every response is reinforced). These procedures switch the dog to an intermittent schedule of reinforcement, which is more resistant to instances of non-reinforcement.
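The thinning step described above can be sketched as a sequence of variable-ratio (VR) schedules. The average response requirement grows from 1 (every response reinforced) to larger values.

This is a hypothetical illustration. The function names, the stage sequence, and the choice to draw each ratio uniformly from 1 to 2n - 1 are all assumptions made for the sketch. Laboratory and training protocols use other ratio distributions and progression rules.

```python
import random


def draw_ratio(mean_ratio: int, rng: random.Random) -> int:
    """Draw one response requirement, uniform on 1..(2*mean_ratio - 1), so it averages mean_ratio."""
    return rng.randint(1, 2 * mean_ratio - 1)


def vr_schedule(n_responses: int, mean_ratio: int, rng: random.Random) -> list[bool]:
    """Return, for each response, whether it is reinforced under a variable-ratio schedule.

    mean_ratio=1 is continuous reinforcement: every response is reinforced.
    """
    reinforced = []
    remaining = draw_ratio(mean_ratio, rng)
    for _ in range(n_responses):
        remaining -= 1
        if remaining == 0:
            reinforced.append(True)
            remaining = draw_ratio(mean_ratio, rng)  # start counting toward the next reinforcer
        else:
            reinforced.append(False)
    return reinforced


def thinning_plan(stages: list[int], responses_per_stage: int,
                  seed: int = 0) -> list[tuple[int, float]]:
    """Run each schedule stage in turn and report the fraction of responses reinforced."""
    rng = random.Random(seed)
    summary = []
    for mean_ratio in stages:
        outcomes = vr_schedule(responses_per_stage, mean_ratio, rng)
        summary.append((mean_ratio, sum(outcomes) / len(outcomes)))
    return summary


if __name__ == "__main__":
    for mean_ratio, fraction in thinning_plan([1, 2, 3, 5], responses_per_stage=500):
        print(f"VR{mean_ratio}: {fraction:.2f} of responses reinforced")
```

The first stage (VR1) reinforces every response, matching the initial acquisition phase. Later stages reinforce roughly one response in n on average, so any single response may or may not pay off.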
Nevertheless, some trainers now use the prey drive to train pet dogs. They report far better results in the dogs' responses to training than when they use only the principles of operant conditioning. According to Skinner and his students Keller and Marian Breland (who invented clicker training), those principles break down when strong instincts are at play.
Criticisms
Thorndike's law of effect specifically requires
that a behavior be followed by satisfying consequences for learning
to occur. There are, however, cases in which learning can be shown
to occur without good or bad effects following the behavior. For
instance, a number of experiments examining the phenomenon of latent learning showed that a rat need not receive a satisfying reward (food, if hungry; water, if thirsty) in order to learn a maze. The learning becomes apparent as soon as the desired reward is introduced. However, views claiming that such research invalidates theories of operant conditioning are molecular to a fault. If the
rat has a history of "searching behavior" being reinforced in novel
environments, the behavior will occur in new environments. This is
especially plausible in a species which scavenges for food and has
thus likely inherited a propensity for searching behavior to be
sensitive to reinforcement. Behaving during initial extinction
trials as the organism had during reinforcement trials is not proof
of latent learning, as behavior is a function of the history of the
individual organism and its genetic endowment and is never
controlled by future consequences. That an organism continues to
respond during unreinforced trials has been well-established when
studying intermittent schedules of reinforcement.
A different experiment, in humans, showed that "punishing" the correct behavior may actually cause it to be performed more frequently (i.e., stamp it in). Subjects were given a number of pairs of holes on a large board and required to learn which hole to poke a stylus through for each pair. Some subjects received an electric shock for choosing the correct hole. They learned which hole was correct more quickly than subjects who received a shock for choosing the incorrect hole. Such a shock, however, cannot accurately be described as punishment if it increases the probability of the behavior.
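The point of the shock experiment is that reinforcement and punishment are defined by what a consequence does to the behavior, not by how unpleasant it seems. A minimal sketch of that functional definition compares the response rate with and without the consequence.

The function name and the 5% `tolerance` threshold are hypothetical choices. A real analysis would also need repeated measurements rather than a single pair of rates.

```python
def functional_label(baseline_rate: float, rate_with_consequence: float,
                     tolerance: float = 0.05) -> str:
    """Label a consequence by its measured effect on response rate.

    tolerance is the smallest relative change treated as real (a hypothetical default).
    """
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    relative_change = (rate_with_consequence - baseline_rate) / baseline_rate
    if relative_change > tolerance:
        return "reinforcer"   # the behavior became more frequent
    if relative_change < -tolerance:
        return "punisher"     # the behavior became less frequent
    return "no clear effect"


if __name__ == "__main__":
    # A shock that raises the rate from 10 to 14 responses per minute is, functionally, a reinforcer.
    print(functional_label(10.0, 14.0))
```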
Biological correlates of operant conditioning
The first scientific studies identifying neurons whose responses suggested that they encode conditioned stimuli came from work by Rusty Richardson and Mahlon DeLong. They showed that nucleus basalis neurons, which release acetylcholine broadly throughout the cerebral cortex, are activated shortly after a conditioned stimulus. If no conditioned stimulus exists, they are activated shortly after a primary reward. These neurons are equally active for positive and negative reinforcers, and have been demonstrated to cause plasticity in many cortical regions. Evidence also exists that dopamine neurons are activated at similar times. The dopamine pathways
encode positive reward only, not aversive reinforcement, and they
project much more densely onto frontal
cortex regions. Cholinergic
projections, in contrast, are dense even in the posterior cortical
regions like the primary
visual cortex. A study of patients with Parkinson's
disease, a condition attributed to the insufficient action of
dopamine, further illustrates the role of dopamine in positive
reinforcement. It showed that while off their medication, patients
learned more readily with aversive consequences than with positive
reinforcement. Patients who were on their medication showed the
opposite to be the case, positive reinforcement proving to be the
more effective form of learning when the action of dopamine is
high.
Factors that alter the effectiveness of consequences
When using consequences to modify a response, the
effectiveness of a consequence can be increased or decreased by
various factors. These factors can apply to either reinforcing or
punishing consequences.
- Satiation: The effectiveness of a consequence will be reduced if the individual's "appetite" for that source of stimulation has been satisfied. Inversely, the effectiveness of a consequence will increase as the individual becomes deprived of that stimulus. If someone is not hungry, food will not be an effective reinforcer for behavior.
- Immediacy: After a response, how immediately a consequence is then felt determines the effectiveness of the consequence. More immediate feedback will be more effective than less immediate feedback. If someone's license plate is caught by a traffic camera for speeding and they receive a speeding ticket in the mail a week later, this consequence will not be very effective against speeding. But if someone is speeding and is caught in the act by an officer who pulls them over, then their speeding behavior is more likely to be affected.
- Contingency: If a consequence does not contingently (reliably, or consistently) follow the target response, its effectiveness upon the response is reduced. But if a consequence follows the response reliably after successive instances, its ability to modify the response is increased. If someone has a habit of getting to work late, but is only occasionally reprimanded for their lateness, the reprimand will not be a very effective punishment.
- Size: This is a "cost-benefit" determinant of whether a consequence will be effective. If the size, or amount, of the consequence is large enough to be worth the effort, the consequence will be more effective upon the behavior. An unusually large lottery jackpot, for example, might be enough to get someone to buy a one-dollar lottery ticket (or even multiple tickets). But if a lottery jackpot is small, the same person might not feel it is worth the effort of driving out and finding a place to buy a ticket. In this example, it is also useful to note that "effort" is a punishing consequence. How these opposing expected consequences (reinforcing and punishing) balance out will determine whether the behavior is performed.
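The factors above can be combined in a rough cost-benefit sketch. The sketch goes beyond what the text states, and all of the following are assumptions:

- Delay reduces a consequence's value hyperbolically, V = A / (1 + kD), a discounting form common in the behavioral literature.
- An unreliable consequence is weighted by the probability that it follows the response.
- Effort subtracts from the total.
- The discount rate k = 0.1 and the function name are hypothetical.

Satiation is not modeled. It would enter as a change in how much a given amount is worth to the individual.

```python
def net_value(amount: float, delay: float, probability: float = 1.0,
              effort: float = 0.0, k: float = 0.1) -> float:
    """Rough net value of performing a behavior, assuming hyperbolic delay discounting.

    amount      size of the reinforcing consequence (Size)
    delay       time until it arrives (Immediacy)
    probability chance that it follows the response at all (Contingency)
    effort      cost of responding, treated as a punishing consequence
    k           hypothetical discount rate per unit of delay
    """
    discounted = amount / (1.0 + k * delay)  # hyperbolic discounting: V = A / (1 + kD)
    return probability * discounted - effort


if __name__ == "__main__":
    print(net_value(100, delay=0))                     # 100.0
    print(round(net_value(100, delay=7), 1))           # 58.8
    print(net_value(100, delay=0, probability=0.25))   # 25.0
    print(net_value(10, delay=0, effort=15))           # -5.0
```

Under this model the behavior is predicted to occur when the net value is positive. The last case, a small payoff that does not cover the effort, corresponds to the small-jackpot example above.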
Most of these factors exist for biological
reasons. The biological purpose of the Principle of Satiation is to
maintain the organism's homeostasis. When an
organism has been deprived of sugar, for example, the effectiveness
of the taste of sugar as a reinforcer is high. However, as the
organism reaches or exceeds its optimum blood-sugar levels, the
taste of sugar becomes less effective, perhaps even aversive.
The principles of Immediacy and Contingency exist
for neurochemical reasons. When an organism experiences a
reinforcing stimulus, dopamine pathways in the brain
are activated. This network of pathways "releases a short pulse of
dopamine onto many dendrites, thus broadcasting a
rather global reinforcement signal to postsynaptic neurons." This
makes recently activated synapses able to increase their
sensitivity to efferent signals, hence increasing the probability
of occurrence for the recent responses preceding the reinforcement.
These responses are, statistically, the most likely to have been
the behavior responsible for successfully achieving reinforcement.
But when the application of reinforcement is either less immediate
or less contingent (less consistent), the ability of dopamine to
act upon the appropriate synapses is reduced.
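The role of immediacy in this account can be illustrated with an eligibility-trace sketch. This device is borrowed from computational reinforcement learning rather than taken from the text. Each response leaves a trace that decays over time, and a reinforcement signal strengthens each response in proportion to its remaining trace. The exponential decay and the `decay = 0.5` rate are hypothetical assumptions.

```python
def trace_credit(response_times: list[float], reinforcer_time: float,
                 decay: float = 0.5) -> dict[float, float]:
    """Credit each earlier response receives from a single reinforcement signal.

    Each response leaves an eligibility trace that shrinks by a factor of `decay`
    per unit of elapsed time, so responses closer to the reinforcer get more credit.
    """
    return {
        t: decay ** (reinforcer_time - t)
        for t in response_times
        if t <= reinforcer_time  # responses after the reinforcer cannot be credited
    }


if __name__ == "__main__":
    print(trace_credit([0], reinforcer_time=0))        # {0: 1.0}
    print(trace_credit([0, 1, 2], reinforcer_time=3))  # {0: 0.125, 1: 0.25, 2: 0.5}
```

With an immediate reinforcer, the target response at time 0 receives full credit. With the reinforcer delayed by three time units, it receives only 0.125, while the intervening responses receive more. This is one way to picture why delayed or inconsistent reinforcement tends to strengthen the wrong behavior.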
Operant variability
Operant variability is what allows a response to
adapt to new situations. Operant behavior is distinguished from
reflexes in that its response topography (the form of the response)
is subject to slight variations from one performance to another.
These slight variations can include small differences in the
specific motions involved, differences in the amount of force
applied, and small changes in the timing of the response. If a
subject's history of reinforcement is consistent, such variations
will remain stable because the same successful variations are more
likely to be reinforced than less successful variations. However,
behavioral variability can also be altered when subjected to
certain controlling variables.
An extinction burst will often occur when an extinction procedure has just begun. It consists of a sudden and temporary increase in the response's frequency, followed by the eventual decline and extinction of the behavior targeted for elimination.

Take, as an example, a pigeon that has been reinforced to peck an electronic button. During its training history, every time the pigeon pecked the button, it received a small amount of bird seed as a reinforcer. So, whenever the bird is hungry, it will peck the button to receive food. However, if the button were turned off, the hungry pigeon would first try pecking the button just as it has in the past. When no food is forthcoming, the bird will likely try again... and again, and again. After a period of frantic activity in which its pecking yields no result, the pigeon's pecking will decrease in frequency.
The evolutionary advantage of this extinction burst is clear. In a natural environment, an animal may persist in a learned behavior even when it does not produce immediate reinforcement. Such an animal might still obtain reinforcing consequences by trying again, which gives it an advantage over another animal that gives up too easily.
Extinction-induced variability serves a similar
adaptive role. When extinction begins, and if the environment
allows for it, an initial increase in the response rate is not the
only thing that can happen. Imagine a bell curve.
The horizontal axis would represent the different variations
possible for a given behavior. The vertical axis would represent
the response's probability in a given situation. Response variants
in the middle of the bell curve, at its highest point, are the most
likely because those responses, according to the organism's
experience, have been the most effective at producing
reinforcement. The more extreme forms of the behavior would lie at
the lower ends of the curve, to the left and to the right of the
peak, where their probability for expression is low.
A simple example would be a person inside a room opening a door to exit. The response would be the opening of the door, and the reinforcer would be the freedom to exit. The same person does not open the same door in exactly the same way every time. Rather, each time they open the door a little differently: sometimes with less force, sometimes with more force; sometimes with one hand, sometimes with the other hand; sometimes more quickly, sometimes more slowly. Because of the physical properties of the door and its handle, there is a certain range of successful responses, all of which are reinforced.
Now imagine in our example that the subject tries to open the door and it won't budge. This is when extinction-induced variability occurs. The bell curve of probable responses will begin to broaden, with more extreme forms of behavior becoming more likely. The person might now try opening the door with extra force, repeatedly twist the knob, or hit the door with their shoulder. They might even call for help or climb out a window. This is how extinction causes variability in behavior: it raises the chance that one of the new variations will succeed. For this reason, extinction-induced variability is an important part of the operant procedure of shaping.
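The bell-curve picture can be made concrete by treating one dimension of the response as normally distributed around its usual value. For the door, that dimension could be the force applied. The sketch computes how likely an "extreme" variant is, meaning one lying more than a fixed distance from the usual form, and shows that probability rising as extinction widens the curve. The normal distribution, the threshold, and the widening rule are all hypothetical modeling assumptions.

```python
import math


def extreme_variant_probability(sigma: float, threshold: float) -> float:
    """P(|X - mu| > threshold) when response variants X are normally distributed around mu."""
    return math.erfc(threshold / (sigma * math.sqrt(2.0)))


def extinction_widening(sigma0: float, threshold: float, trials: int,
                        widen: float = 1.25) -> list[float]:
    """Probability of an extreme variant on each trial as the spread grows under extinction.

    The rule that each unreinforced trial multiplies sigma by `widen` is a hypothetical assumption.
    """
    probabilities = []
    sigma = sigma0
    for _ in range(trials + 1):
        probabilities.append(extreme_variant_probability(sigma, threshold))
        sigma *= widen  # the bell curve broadens after each unreinforced attempt
    return probabilities


if __name__ == "__main__":
    print(round(extreme_variant_probability(1.0, 2.0), 3))  # 0.046 (narrow curve)
    print(round(extreme_variant_probability(2.0, 2.0), 3))  # 0.317 (curve twice as wide)
```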
Avoidance learning
Avoidance training is a form of negative reinforcement. The subject learns that a certain response will result in the termination or prevention of an aversive stimulus. Two kinds of experimental setting are commonly used: discriminated and free-operant avoidance learning.
Discriminated avoidance learning
- In discriminated avoidance learning, a novel stimulus such as a light or a tone is followed by an aversive stimulus such as a shock (CS-US, similar to classical conditioning). During the first trials (called escape trials), the animal usually experiences both the CS and the US and shows the operant response to terminate the aversive US. Over time, the animal learns to perform the response during the presentation of the CS, thus preventing the aversive US from occurring. Such trials are called avoidance trials.
Free-operant avoidance learning
- In this experimental setting, no discrete stimulus is used to signal the occurrence of the aversive stimulus. Rather, the aversive stimuli (usually shocks) are presented without explicit warning stimuli.
- There are two crucial time intervals determining the rate of avoidance learning. The first is called the S-S interval (shock-shock interval). This is the amount of time that passes between successive presentations of the shock, unless the operant response is performed. The second is called the R-S interval (response-shock interval). It specifies the length of the time interval following an operant response during which no shocks will be delivered. Note that each time the organism performs the operant response, the shock-free R-S interval starts over.
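The interplay of the two intervals in free-operant avoidance can be sketched as a simple event scheduler. The function name and parameters are hypothetical. The sketch also makes three assumptions: the first shock comes one S-S interval after the session starts, shock delivery is instantaneous, and a response at the exact moment a shock is due does not cancel it.

```python
def sidman_shocks(response_times: list[float], session_length: float,
                  ss_interval: float, rs_interval: float) -> list[float]:
    """Return shock times in a free-operant avoidance session.

    Without responding, shocks recur every ss_interval (S-S interval).
    Each response postpones the next shock to rs_interval after that response (R-S interval).
    """
    if ss_interval <= 0 or rs_interval <= 0:
        raise ValueError("intervals must be positive")
    responses = sorted(response_times)
    shocks = []
    next_shock = ss_interval  # assumed: the S-S clock starts at session onset
    i = 0
    while True:
        # Every response made before the pending shock restarts the R-S clock.
        while i < len(responses) and responses[i] < next_shock:
            next_shock = responses[i] + rs_interval
            i += 1
        if next_shock > session_length:
            break
        shocks.append(next_shock)
        next_shock += ss_interval  # no response in time: the S-S clock runs again
    return shocks


if __name__ == "__main__":
    print(sidman_shocks([3, 20], session_length=60, ss_interval=5, rs_interval=20))
    # [40, 45, 50, 55, 60]
```

Responding at least once per R-S interval postpones every shock. After the last response, the R-S interval runs out and the S-S clock takes over again.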
Two-process theory of avoidance
This theory was originally established to explain learning in discriminated avoidance. It assumes that two processes take place.
a) Classical conditioning of fear. During the first trials of the training, the organism experiences both the CS and the aversive US (escape trials). The theory assumes that during those trials classical conditioning takes place by pairing the CS with the US. Because of the aversive nature of the US, the CS is supposed to elicit a conditioned emotional reaction (CER), namely fear. In classical conditioning, presenting a CS conditioned with an aversive US disrupts the organism's ongoing behavior.
b) Reinforcement of the operant response by fear reduction. During the first process, the CS signaling the aversive US has itself become aversive by eliciting fear in the organism. Reducing this unpleasant emotional reaction therefore serves to motivate the operant response. The organism learns to make the response during the CS, thus terminating the aversive internal reaction elicited by the CS. An important aspect of this theory is that the term "avoidance" does not really describe what the organism is doing. It does not "avoid" the aversive US in the sense of anticipating it. Rather, the organism escapes an aversive internal state caused by the CS.
- One of the practical aspects of operant conditioning with relation to animal training is the use of shaping, as well as chaining. Shaping means reinforcing successive approximations to a target behavior and no longer reinforcing earlier approximations once the behavior has progressed past them.
Verbal Behavior
In 1957 Skinner published Verbal Behavior, a theoretical extension of the work he had pioneered since 1938. This work extended the theory of operant conditioning to human behavior previously assigned to language, linguistics, and related fields. Verbal Behavior is the logical extension of Skinner's ideas. In it, he introduced new functional relationship categories such as intraverbals, autoclitics, mands, tacts, and the controlling relationship of the audience. All of these relationships were based on operant conditioning and relied on no new mechanisms, despite the introduction of new functional categories.
Four term contingency
Modern behavior analysis, the discipline directly descended from Skinner's work, holds that behavior is explained in four terms:
- an establishing operation (EO)
- a discriminative stimulus (Sd)
- a response (R)
- a reinforcing stimulus (Srein or Sr for reinforcers, sometimes Save for aversive stimuli)
Operant Hoarding
Operant Hoarding is a term referring to a choice made by rats in an operant conditioning context that maximizes their rate of reinforcement. The choice is made on a compound schedule called a multiple schedule. More specifically, rats allowed food pellets to accumulate in a food tray by continuing to press a lever on a continuous reinforcement schedule instead of retrieving those pellets. Retrieval of the pellets always instituted a one-minute period of extinction. During that period no additional food pellets were available, but those that had accumulated earlier could be consumed. This finding appears to contradict the usual finding that rats behave impulsively when choosing between a smaller food object right away and a larger food object after some delay. See schedules of reinforcement.
See also
- Applied behavior analysis, the application of operant behaviorism
- Behaviorism, the family of philosophies behind operant conditioning
- Cognitivism (psychology), a competing theory that invokes internal mechanisms without reference to behavior
- Educational psychology
- Educational technology
- Experimental analysis of behavior
- Habituation
- Matching law
- Negative (positive) contrast effect
- Premack principle
- Reinforcement learning
- Reward system
- Sensitization
- Social conditioning
- Spontaneous recovery
Further reading
- Kirsch, I., Lynn, S.J., Vigorito, M. & Miller, R.R. (2004). The role of cognition in classical and operant conditioning. Journal of Clinical Psychology, 60, 369-392.
- McSweeney, F.K., Hinson, J.M. & Cannon, C.B. (1996). Sensitization-habituation may occur during operant conditioning. Psychological Bulletin, 120, 256-271.
External links
- Scholarpedia Operant conditioning
- Journal of the Experimental Analysis of Behavior
- Journal of Applied Behavior Analysis
- Behavioural Processes http://www.sciencedirect.com/science?_ob=PublicationURL&_issn=03766357&_pubType=J&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=f2cb4d6abf599fdb991a75e175d8189b&jchunk=50#50
- Society for Quantitative Analysis of Behavior http://sqab.psychology.org/
- Negative reinforcement
- scienceofbehavior.com
operant in French: Conditionnement opérant
operant in German: Konditionierung
operant in Spanish: Condicionamiento operante
operant in Hebrew: התניה אופרנטית
operant in Japanese: オペラント条件づけ
operant in Polish: Warunkowanie instrumentalne
operant in Chinese: 操作条件反射
operant in Portuguese: Condicionamento operante
Synonyms, Antonyms and Related Words
actor, agent, architect, author, conductor, creator, doer, driver, engineer, executant, executor, executrix, fabricator, functionary, handler, maker, manipulator, medium, mover, operative, operator, performer, perpetrator, pilot, practitioner, prime mover, producer, runner, steersman, subject, worker