Kiki Yablon Dog Training — Chicago, Illinois

How to Raise Criteria in Shaping

Shaping (teaching a complex behavior by reinforcing behaviors that successively look more and more like your final goal) is hard to describe, and to prescribe. When it goes well, it can feel like magic: whoa, she started clicking and treating for just looking, and now the dog is going over, climbing onto the thing they were looking at, and sitting on it! And if you're a casual observer, it can seem very mysterious how the trainer got from phase 1 to phase 3 (to borrow a metaphor from South Park).

It's often mysterious even to the trainer who did it, leading some to refer to setting criteria as the "art" part of the "art and science" of shaping. But science informs even art: blue paint looks blue because of its chemistry, and behavior moves because of environmental conditions.

There are some guidelines floating around out there about when to start waiting for more behavior to reinforce: for example, when you’re reinforcing the current approximation at a high rate (say, 10-15 clicks per minute), or when your learner has offered the current approximation on some high percentage of opportunities to do so. The problem with these is that they don’t tell you how many opportunities to give or how many minutes to train at each level. Nor do they make it clear what is supposed to make the learner go from offering the current approximation at a high rate to offering something beyond it just because you decided it's time for them to do so.
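
For anyone who likes to see the arithmetic, here is a minimal sketch of how those two rules of thumb could be checked from a session tally. The `Session` record, the function names, and the 80% cutoff are all hypothetical choices made for illustration; the only number taken from the guidelines above is the low end of 10-15 clicks per minute.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One stretch of training at a single criterion (hypothetical record format)."""
    clicks: int          # reinforced repetitions of the current approximation
    opportunities: int   # chances the learner had to offer it
    seconds: float       # how long the stretch lasted


def clicks_per_minute(session: Session) -> float:
    """Rate of reinforcement, in clicks per minute."""
    if session.seconds <= 0:
        raise ValueError("session length must be positive")
    return session.clicks / (session.seconds / 60.0)


def percent_successful(session: Session) -> float:
    """Percentage of opportunities on which the approximation was clicked."""
    if session.opportunities <= 0:
        raise ValueError("need at least one opportunity")
    return 100.0 * session.clicks / session.opportunities


def rules_of_thumb_met(session: Session,
                       min_rate: float = 10.0,      # low end of "10-15 clicks per minute"
                       min_percent: float = 80.0    # assumed stand-in for "some high percentage"
                       ) -> bool:
    """True when both the rate and the percentage guidelines are satisfied."""
    return (clicks_per_minute(session) >= min_rate
            and percent_successful(session) >= min_percent)


# Example: 24 clicks out of 28 opportunities in a two-minute stretch.
example = Session(clicks=24, opportunities=28, seconds=120)
print(clicks_per_minute(example), round(percent_successful(example), 1))
print(rules_of_thumb_met(example))
```

Notice what the sketch can't do: it can report that the numbers look good, but it has nothing to say about how long to keep going or what would make the next approximation show up.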

So here are some thoughts on why the next approximation might happen, and how to make it more likely if it isn't happening.

Use a high rate of reinforcement to engender “discretionary effort.” Discretionary effort is a phenomenon observed by Aubrey Daniels and others (more research needed!) in the context of workplace behavior. Daniels describes it as going beyond the minimum required for reinforcement, and attributes it to frequent positive reinforcement. A high rate of reinforcement requires easily achievable criteria, so this is the good ol' practice of splitting: set criteria small enough that you can click/treat often, and keep your eyes open for what you want to start happening.

Reinforce variability itself. Most if not all behavior is performed with some naturally occurring variability between performances (something I learned from Susan Friedman, who sends along this food for thought), perhaps the better for the environment to be able to select some variations over others. Sometimes our high-rate-of-reinforcement approach can narrow this variability: if you reinforce a very specific approximation a ton, you’ll see more of that and less of other variations. But some very good shapers I know (cough Hannah Branigan cough) often look like they are reinforcing absolutely everything during shaping. What they may be doing is reinforcing variability itself, which gives you more to choose from, including some reps that start to look more like the final behavior. So what you can do is basically reinforce both a little above and a little below your current criterion until you see a variation you want starting to be offered. You can then start to reinforce that one more, or exclusively. (I think this is the technique that led my friend Julianna DeWillems to use a metaphor I love, "sweeping up a pile of dust," to describe shaping.)

Use treat placement strategically. I’ve written a whole other post on this, but briefly, think of your click as the next cue in a chain, rather than just as a conditioned reinforcer. Then think of the behavior that the learner will do to collect the treat as being cued by the click and reinforced by the treat. Then think about how you would chain two behaviors together if you wanted them to eventually blend: You would cue one, and when it was performed, cue the next, then click/treat after the second behavior, then repeat. When you saw the dog anticipating the second cue, and starting to jump ahead to the second behavior on finishing the first, you would fade the second cue out. The same thing happens with treat collection: with repetition, what initially happens after the click will start to creep in before it, and when you see that, you can delay your click until the learner has done the second behavior as well. (Note: this effect can also happen when you don’t want it to, which is another reason to constantly be thinking about what is happening between the click and the treat, and to adapt your treat placement with your end goal in mind during training, especially shaping.)

Change the antecedent conditions. Big, purposeful examples of this are introducing lures or arranging props like gates or platforms to encourage certain paths of movement, but even tiny changes can get you slightly different behavior from what you're getting now. If you have been sitting down the whole time you've been reinforcing the same approximation over and over, that context is probably cuing what you are getting now. So try kneeling or standing up, or moving to a different chair. These small changes may provoke just enough variability for you to start seeing, and reinforcing, something further along the path toward your goal, or at least to start being able to reinforce variability.

Surf the extinction bursts. This is an older technique and involves stopping reinforcement for the current criterion, or doing “twofers” (reinforcing only every other one) to spur variability. The variability here is a predictable by-product of extinction (nonreinforcement). Intermittent reinforcement can produce the same variability, because it involves partial nonreinforcement. This can be done really well, but in my experience it’s harder on the dog if you have already narrowed variability too tightly and you don't change anything else about the context, or if you are waiting for too big a leap. Even if you get a variation you want to reinforce, you’re likely to get some emotional behavior (think “frustration”) along for the ride, and you probably don’t want to bake that into your training by reinforcing it along with the bigger effort on the goal behavior. If you get stuck, think about changing treat placement or antecedent conditions first.