What is "Differential Reinforcement of Other Behavior"?

You keep using that word. I do not think it means what you think it means.

At the time of this posting, a couple of different explanations of this concept were floating around dog-pro social media, all of which perpetuated common misconceptions. Having written a chunk of my thesis about what a differential reinforcement of other behavior (DRO) procedure looks like, why it's thought to work, and what best practices for using it may be, I thought I'd throw this into the mix.

Although DRO stands for "differential reinforcement of other behavior," it does not involve reinforcing "any other behavior" that occurs instead of the unwanted behavior. In fact, it doesn't necessarily involve reinforcing any behavior at all.

In DRO, reinforcement is contingent only on the passage of a certain amount of time without the occurrence of the unwanted behavior.

There are two main flavors of DRO, both of which can be split into subtypes depending on how the time intervals are set (e.g., the same every time or varied from rep to rep).

In "interval DRO," which is more common, the whole interval of time must go by without the behavior that you want to reduce. When time is up, reinforcement is delivered, regardless of what else the learner is doing.

If the learner does the unwanted behavior, typically the time requirement starts over. Sometimes, if the unwanted behavior occurs right as time runs out, extra time is added to avoid reinforcing the unwanted behavior.
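For the code-minded, the interval DRO rule can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: a fixed interval, behavior logged as timestamps in seconds, and a timer that restarts from the moment the unwanted behavior occurs. The function name and parameters are hypothetical, not part of any standard tool.

```python
def interval_dro(behavior_times, interval, session_length):
    """Return the times at which the reinforcer would be delivered under
    fixed-interval DRO with a resetting timer (illustrative sketch only)."""
    deliveries = []
    events = sorted(t for t in behavior_times if 0 <= t <= session_length)
    i = 0                # index of the next unwanted-behavior event to consider
    interval_start = 0   # when the current timer started
    while interval_start + interval <= session_length:
        interval_end = interval_start + interval
        if i < len(events) and events[i] <= interval_end:
            # The unwanted behavior occurred before (or right as) time ran out:
            # no reinforcer, and the timer restarts from that moment.
            interval_start = events[i]
            i += 1
        else:
            # The whole interval passed without the unwanted behavior:
            # deliver the reinforcer, whatever else the learner is doing.
            deliveries.append(interval_end)
            interval_start = interval_end
    return deliveries


# One instance of the unwanted behavior at 12 s, with a 10 s interval and a 40 s session.
print(interval_dro([12], interval=10, session_length=40))  # -> [10, 22, 32]
```

Notice that nothing in the function looks at what the learner is doing instead. The only inputs are the clock and the unwanted behavior.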

In "momentary" DRO, reinforcement is delivered if the behavior is not occurring at the moment that an interval of time ends, regardless of whether the unwanted behavior occurred at any other time during that interval. If the unwanted behavior is occurring, the reinforcer is not delivered. This appears to be mostly used when it would be too hard to monitor the learner for the whole interval (e.g., if you have a classroom with 30 students and can't watch them all at once). It's also probably less effective than interval DRO.
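Here is a minimal Python sketch of the momentary DRO rule, again under assumptions of my own: a fixed interval, and the unwanted behavior logged as (start, end) episodes in seconds. The function name and parameters are hypothetical.

```python
def momentary_dro(behavior_episodes, interval, session_length):
    """Return the times at which the reinforcer would be delivered under
    fixed-interval momentary DRO (illustrative sketch only)."""
    deliveries = []
    check_time = interval
    while check_time <= session_length:
        # Only the instant the interval ends matters, not the rest of the interval.
        occurring = any(start <= check_time <= end
                        for start, end in behavior_episodes)
        if not occurring:
            deliveries.append(check_time)
        check_time += interval
    return deliveries


# Unwanted behavior from 3-6 s and from 18-22 s, checked every 10 s for 40 s.
print(momentary_dro([(3, 6), (18, 22)], interval=10, session_length=40))  # -> [10, 30, 40]
```

The episode at 3-6 s does not cost the learner the reinforcer at 10 s, because the behavior had stopped by the time the check happened. Only the episode that spans the 20 s check does.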

Delivering the reinforcer when you see any of multiple acceptable alternative behaviors occurring--which is how I have seen DRO described in training circles sometimes--would be differential reinforcement of alternative behavior (DRA). That's a different procedure, in which reinforcement is contingent on the occurrence of specific behaviors, and withheld or diminished somehow when the unwanted behavior occurs.

Although it's referred to as a differential reinforcement procedure, reinforcement may not always be at work in DRO. At least four mechanisms are thought to make it go. These are some of the most commonly discussed:

  1. Satiation. Best practices in DRO are to start with very low time criteria, so the start of DRO usually involves reinforcement presented densely, which may temporarily diminish the value of the reinforcer and therefore make behavior whose purpose is to get that reinforcer less likely to occur.

  2. Extinction, or the elimination/weakening of the relationship between the unwanted behavior and the reinforcer that maintains it. What we traditionally think of as extinction is the non-availability of reinforcement for a previously reinforced behavior, and DRO typically involves some of that. But noncontingent reinforcement (giving the reinforcer on a time-based schedule whether or not the behavior occurs) can also weaken the contingency between the behavior and the reinforcer, in essence making the behavior seem unnecessary.
    In fact, as Susan Friedman pointed out once during a joint presentation, DRO without extinction is identical to noncontingent reinforcement, which is an antecedent intervention--something you do before the behavior occurs--not a reinforcement procedure.

  3. Punishment. Yes, that's right--DRO might run on punishment, particularly if the learner can tell that you are resetting the time. A signal that predicts an upcoming delay to reinforcement can become an aversive stimulus (one a learner will behave to avoid or escape). If something tells the learner that reinforcement has been delayed (a timer is reset) contingent on their behavior, that something can become a conditioned punisher, or punisher by association.

  4. And, finally, DRO may work because of accidental reinforcement of alternative behaviors. Whatever is happening when reinforcement gets delivered might get strengthened. But it takes consistency to develop a contingency (a relationship) between a behavior and a reinforcer, and that doesn't always happen when you're delivering reinforcement contingent on time elapsed. Evidence for reinforcement of other behaviors during DRO is sparse. In my research and in my experience running DRO with clients, the alternative response that emerged was not one that was occurring with any regularity when the timer went off and reinforcement was delivered in the early stages.

This is not a complete investigation of DRO--I just wanted to clarify the definition for my fellow pros and other interested parties. But we could also talk about why it's often slower than DRA, why some people call it labor intensive or think it's a procedure to be avoided, the potential advantages of interval vs. momentary, best practices for its use, etc.

You can get more deets in any applied behavior analysis textbook. Here I used Cooper, J. O., Heron, T. E., & Heward, W. L. (2020). Applied behavior analysis (3rd ed.). Pearson.

But here are other sources related to the "mechanisms" section:

Jessel, J., Borrero, J. C., & Becraft, J. L. (2015). Differential reinforcement of other behavior increases untargeted behavior. Journal of Applied Behavior Analysis, 48(2), 402–416. https://doi.org/10.1002/jaba.204

Jessel, J., & Ingvarsson, E. T. (2016). Recent advances in applied research on DRO procedures. Journal of Applied Behavior Analysis, 49, 991–995. https://doi.org/10.1002/jaba.323

Langford, J. S., Pitts, R. C., & Hughes, C. E. (2019). Assessing functions of stimuli associated with rich-to-lean transitions using a choice procedure. Journal of the Experimental Analysis of Behavior, 112(1), 97–110. https://doi.org/10.1002/jeab.540

Poling, A., & Ryan, C. (1982). Differential-reinforcement-of-other-behavior schedules: Therapeutic applications. Behavior Modification, 6(1), 3–21. https://doi.org/10.1177/01454455820061001

Rey, C. N., Betz, A. M., Sleiman, A. A., Kuroda, T., & Podlesnik, C. A. (2020). The role of adventitious reinforcement during differential reinforcement of other behavior: A systematic replication. Journal of Applied Behavior Analysis, 53(4), 2440–2449. https://doi.org/10.1002/jaba.678

Thompson, R. H., & Iwata, B. A. (2005). A review of reinforcement control procedures. Journal of Applied Behavior Analysis, 38(2), 257–278. https://doi.org/10.1901/jaba.2005.176-03