In health psychology, a number of terms at the core of psychological science lack conceptual clarity. True, this problem exists in psychology in general, but the terms Behavior Change Technique (from the BCT taxonomy approach) and Method for Behavior Change (from the Intervention Mapping approach) have exacerbated matters within behavior change science. In this post, I will discuss this in more detail, based on a recent Twitter discussion that erupted around whether a psychological variable targeted by a behavior change technique is a mediator or not:
https://twitter.com/matherion/status/1004713931833839617
Below, I will explain in more detail what I mean (you may want to read the Twitter thread first, though).
Some definitions
I will start with some definitions. Most people will know these, and I don’t expect any of these to be controversial, but conceptual clarity is the point of this post, so making sure we share the same definitions of the central terms seems a wise starting point.
Theory: I define a theory as any collection of statements about the existence, nature, and/or relationships of constructs that aims to describe an aspect of reality or the laws governing that reality. Within psychology, therefore, theories often aim to describe human psychology in terms of variables and processes. Theories are ‘codifications’ of large numbers of observations from empirical research. Note that this sharply distinguishes theories from logic models (sometimes also confusingly called ’theories of change’), which are eclectic models where (components of) multiple theories and empirical insights are combined to describe a specific, tightly constrained context, population, or behavior. The Reasoned Action Approach (RAA) is a theory; the result of a determinant study into a given behavior, with the associated list of relevant determinants, is a logic model. Theories are the product of basic science; logic models are the product of applied science (specifically, of applying theory to a specific real-world problem or situation).
Construct: A construct is a psychological variable as defined in a theory. Constructs have:
names to identify them;
definitions that explain exactly which aspects of human psychology are a part of the construct and which aspects are not a part of the construct;
and should normally have guidelines for operationalisation.
Operationalisation: An operationalisation is the interface between a construct and reality. Constructs, being by definition both theoretical and psychological, cannot be observed or changed directly: an intermediate step is always required. However, a construct’s definition should normally allow development of guidelines that can be used to select or develop instruments to measure or manipulate the construct (after all, if it does not, the theory is necessarily not falsifiable and no evidence can be collected to support or refute the theory). In psychological research, there are two types of operationalisations, manipulations and measurement instruments, and operationalisations consist of:
one or more stimuli;
a procedure;
and (only for measurement instruments) one or more means to register a response.
Measurement instrument: a measurement instrument is an operationalisation of a construct that was developed or selected to measure the target construct. Specifically, the stimuli and procedure are developed or selected to trigger processes that ultimately result in a response (that is then registered), where the target construct plays an important role in generating that response, such that the registered response allows inferences regarding the target construct. It is never the goal, and often undesirable, for the stimuli, procedure, and means for response registration to influence the target construct.
Manipulation: A manipulation is an operationalisation of a psychological construct that was developed or selected to influence the target construct. Specifically, the stimuli and procedure are developed or selected to trigger processes that ultimately result in a change in the psychological state of the participant, specifically in those aspects of their psychology that, according to the construct’s definition, are a part of the target construct.
Reliability: Application of operationalisations is never perfect: measurement instruments suffer from measurement error, and manipulations suffer from random variation in their context and interpretation, which means that their effect on the target construct is always slightly different. Thus, all applications of operationalisations introduce some error, and therefore, any application of an operationalisation has a reliability of less than 1. This error captures all random, nonsystematic noise. This definition of error means that low reliability is not a threat to design validity. Low reliability of operationalisations can be compensated for by increasing the sample size (i.e. more participants or more observations per participant), and research syntheses (e.g. meta-analyses) will not show systematic bias (in the absence of publication bias, which admittedly is unrealistic).
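To make the ‘compensating by increasing the sample size’ point concrete, here is a minimal simulation sketch (mine, not from any of the cited papers; all numbers are arbitrary) showing that random measurement error averages out as the number of observations per participant grows:

```python
# Minimal sketch: unreliability is random noise that can be compensated
# by collecting more observations per participant. Numbers are arbitrary.
import numpy as np

rng = np.random.default_rng(42)
n = 200                                  # participants
true_score = rng.normal(0, 1, n)         # the construct's 'true' values

for k in [1, 3, 10, 30]:                 # observations (e.g. items) per participant
    # each observation = true score + random measurement error
    observations = true_score[:, None] + rng.normal(0, 2, (n, k))
    estimate = observations.mean(axis=1)          # aggregate over observations
    r = np.corrcoef(true_score, estimate)[0, 1]   # 'reliability' of the aggregate
    print(f"k = {k:2d} observations: correlation with true score = {r:.2f}")
```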
Validity: In addition to random noise, applications of operationalisations suffer from systematic distortions:
manipulations never influence the entirety of the target construct; and their influence is never restricted to those aspects of human psychology that are defined as parts of the target construct. In other words, manipulations never fully change the target construct, and they always also change (parts of) other constructs.
measurement instruments never fully capture the entirety of the target construct; and the variation in the registered responses is never caused exclusively by the target construct. In other words, measurement instruments never fully measure the target construct, and they always also measure (parts of) other constructs.
Validity is the degree to which the application of an operationalisation corresponds to the target construct and only the target construct. Like reliability, validity is not a property of an operationalisation, but of an application of the operationalisation in a sample. For example, an operationalisation in Dutch may have excellent validity and reliability in a Dutch sample, but perform horribly in a Cantonese sample. Similarly, the validity of operationalisations developed fifty years ago may have considerably deteriorated as language and norms change over time, even in the same culture.
Validity, like reliability, is never 1 (and probably never exactly 0), but always has some value in between. That is, no manipulation or measure corresponds exhaustively and exclusively to its target construct.
It is important to take a moment to reflect on the implications: the less specific an operationalisation is, the lower its validity. An attitude questionnaire that measures not only attitude but also perceived norms has low validity. The implication is that if the collected data series correlates with behavior, it is not possible to infer that attitude is associated with behavior. The same is true for manipulations: if a manipulation of attitude also changes perceived norms, it has low validity. If an experiment shows an effect of such a manipulation on, for example, intention, it is not possible to infer that attitude is a causal antecedent of intention.
In other words: validity is crucial. If, in a study (i.e. when applied to a sample), an operationalisation has low validity, the study’s design has low validity. In such cases, it is not ethically straightforward to still draw any conclusions.
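To illustrate this with a sketch of my own (arbitrary numbers): suppose behavior is in reality driven by perceived norms only, but our ‘attitude’ questionnaire also picks up norms. The contaminated measure then correlates with behavior, inviting exactly the wrong conclusion about attitude:

```python
# Minimal sketch of why low validity breaks inference: behavior is caused
# by perceived norms only, but the invalid 'attitude' measure also picks
# up norms, so its scores still correlate with behavior.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
attitude = rng.normal(0, 1, n)
norms = rng.normal(0, 1, n)
behavior = 0.6 * norms + rng.normal(0, 1, n)             # attitude plays no role

pure_measure = attitude + rng.normal(0, 0.5, n)          # valid operationalisation
contaminated = attitude + norms + rng.normal(0, 0.5, n)  # low validity

print(np.corrcoef(pure_measure, behavior)[0, 1])   # near 0: correct conclusion
print(np.corrcoef(contaminated, behavior)[0, 1])   # clearly positive: invites the
                                                   # wrong conclusion about attitude
```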
Behavior change principles (BCPs): In line with our paper about evolutionary learning processes, I define behavior change principles as the overarching category of BCTs as introduced in the 2008 Abraham & Michie article, MBCTs as presumably defined in the forthcoming paper by Martin Hagger, Marta Marques, and probably others (see their Open Science Framework repository here), and the methods of behavior change as introduced in the 1998 Intervention Mapping paper by Bartholomew, Parcel & Kok. Specifically:
[…] use Behaviour Change Principle or Behaviour Change Principle set (both abbreviated to BCP). A BCP is any principle or any set of principles that can be applied to change behaviour, or more accurately, determinants of behaviour, with the assumption that it will be effective. Stated more strongly, we would argue that any intervention that successfully changes one or more determinants of behaviour must therefore involve one or more BCPs.
So, a BCP (and therefore, a BCT, MBCT, or method for behavior change) is just another word for a manipulation. Any application of a BCP, therefore, has a validity and a reliability.
Mediation: Mediation is the process where a causal association between two variables operates through a third variable. Specifically, mediation describes a situation where:
a predictor or independent variable has a causal effect on the mediator;
the mediator has a causal effect on the dependent variable;
and those aspects of the mediator that are changed by the predictor are the ones responsible for causing the change in the dependent variable.
This last bit is important because mediation implies that the ultimate change in the dependent variable is caused by changes in the mediator that have been caused by the predictor. In other words, the causal effect of the predictor on the dependent variable must ‘run through’ the mediator.
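These three claims can be made concrete in a small simulation sketch (mine; the coefficients are arbitrary), where the data-generating process satisfies them by construction and the classical product-of-coefficients estimate recovers the indirect effect:

```python
# Minimal sketch of mediation: the predictor causes the mediator, and the
# mediator causes the outcome, so the effect 'runs through' the mediator.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 500
x = rng.binomial(1, 0.5, n).astype(float)   # randomized predictor
m = 0.5 * x + rng.normal(0, 1, n)           # predictor causes the mediator
y = 0.4 * m + rng.normal(0, 1, n)           # mediator causes the outcome

a = sm.OLS(m, sm.add_constant(x)).fit().params[1]                         # x -> m
b = sm.OLS(y, sm.add_constant(np.column_stack([x, m]))).fit().params[2]   # m -> y, given x
print(f"indirect effect (a*b) = {a * b:.2f}")   # close to 0.5 * 0.4 = 0.20
```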
Operationalisations are not a part of the conceptual model
If you design a study, you usually start from a conceptual model - that is, if the study is to some degree basic research. (In applied research, you usually don’t seek to contribute to theory development, except perhaps through meta-analysis. This frees up some ‘design degrees of freedom’, which you need in applied research; but using those degrees of freedom, i.e. being less rigorous than basic research would require, also means you can no longer draw the same conclusions.) The conceptual model is a logic model that describes your theoretical expectations and predictions, and can be thought of as a type of structural model in the SEM sense of the word: it represents your constructs and the relationships you are interested in (and usually have hypotheses about).
SEM has the neat feature of separating structural models from measurement models. The measurement models describe aspects of the reliability and validity of the operationalisations, whereas the structural models deal exclusively with the constructs (the latent variables) and their associations. Thus, measurement models describe the operational layer of a study, whereas structural models describe the theoretical layer. Only once you are persuaded that your operationalisations are valid does it make sense to inspect the structural model.
This separation, however, is also present in models used outside SEM. In mediation analyses, for example (also when conducted using regression analysis or bootstrap methods), the operationalisations are not represented in the model alongside the variables of interest. Instead, the reliability and validity of each operationalisation, as applied in the present sample, are inspected, and if the researcher is confident that each operationalisation is sufficiently valid (low reliability, as explained above, is less problematic: it decreases power (NHST) or accuracy (AIPE), but does not threaten the design’s validity), the researcher from that point on considers the associated data series as valid proxies for the associated target constructs.
So, an example. You have an attitude questionnaire and you use it in a study. In that case:
During study planning, you are confident that it will perform as a valid operationalisation of attitude in a sample from the population from which you plan to sample.
This means that you are confident that when applied in a sample from that population, this questionnaire measures most (or many) aspects of human psychology that, according to the definition of attitude, together form attitude.
It also means that you are confident that when applied in such a sample, it measures (almost) no aspects of human psychology that fall outside of the definition of attitude.
(If you are not confident of this before starting, then still conducting the study becomes ethically a bit suspect. After all, if you haven’t yet developed or selected measurement instruments that you’re pretty sure are good measurement instruments of the constructs you’re interested in, don’t you have an obligation to your participants (and the taxpayers probably paying for your study) to first get your shit together? I’d argue you do. Of course, reality is different, and you cannot always do everything as properly as you’d want (temporal and monetary constraints, etc.). But, importantly, forgoing such due diligence and rigor also means that you have to accept that your design becomes increasingly shaky. You can’t diminish your design yet still expect to draw equally strong conclusions. Life is harsh.)
Once your data collection concludes, you inspect your data to verify that, in your obtained sample, the attitude questionnaire indeed performed as a valid operationalisation of the attitude construct (using e.g. factor analysis, inspecting convergence and divergence with other constructs, etc.; see the sketch after this list).
If you are confident that your operationalisation of attitude is indeed valid in this sample, then from that point onwards, you treat the associated data series (e.g. the mean of the questionnaire’s items) as if it were attitude. You then interpret correlations of that data series with other data series as if they are indicative of the association of the attitude construct with whichever construct the other data series operationalise.
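A minimal sketch of that verification step (hypothetical data; a real validation would also involve convergent and divergent evidence): check the internal structure of the questionnaire before treating the item mean as ‘attitude’:

```python
# Minimal sketch: inspect internal consistency before treating the
# item mean as a proxy for the attitude construct. Hypothetical data.
import numpy as np

rng = np.random.default_rng(3)
n, k = 300, 6
attitude = rng.normal(0, 1, n)
items = attitude[:, None] + rng.normal(0, 1, (n, k))  # k items, one common factor

# Cronbach's alpha as a (limited) first indication of internal consistency
item_var = items.var(axis=0, ddof=1).sum()
total_var = items.sum(axis=1).var(ddof=1)
alpha = k / (k - 1) * (1 - item_var / total_var)

scale_score = items.mean(axis=1)   # only after this check treated as 'attitude'
print(f"Cronbach's alpha = {alpha:.2f}; scale mean = {scale_score.mean():.2f}")
```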
This works exactly the same for manipulations. So, to go by the same example, let’s say you have an attitude manipulation and you use it in a study. In that case:
During study planning, you are confident that it will perform as a valid operationalisation of attitude in a sample from the population from which you plan to sample.
This means that you are confident that when applied in a sample from that population, this manipulation changes most (or many) aspects of human psychology that, according to the definition of attitude, together form attitude.
It also means that you are confident that when applied in such a sample, it changes (almost) no aspects of human psychology that fall outside of the definition of attitude.
(If you are not confident of this before starting, then still conducting the study becomes ethically a bit suspect. After all, if you haven’t yet developed or selected manipulations that you’re pretty sure are good manipulations of the constructs you’re interested in, don’t you have an obligation to your participants (and the taxpayers probably paying for your study) to first get your shit together? I’d argue you do. Of course, reality is different, and you cannot always do everything as properly as you’d want (temporal and monetary constraints, etc.). But, importantly, forgoing such due diligence and rigor also means that you have to accept that your design becomes increasingly shaky. You can’t diminish your design yet still expect to draw equally strong conclusions. Life is harsh.)
Once your data collection concludes, you inspect your data to verify that, in your obtained sample, the attitude manipulation indeed performed as a valid operationalisation of the attitude construct (using e.g. a manipulation check: another operationalisation of attitude, but this time a measurement instrument, and inspecting the obtained effect size in your sample; see the sketch after this list).
If you are confident that your operationalisation of attitude is indeed valid in this sample, then from that point onwards, you treat the associated data series (e.g. the 0s and 1s representing the condition to which each participant was randomized) as if it were attitude. You then interpret correlations of that data series with other data series as if they are indicative of the association of the attitude construct with whichever construct the other data series operationalise.
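And a minimal sketch of the corresponding manipulation check (again hypothetical data; the code simply assumes the manipulation is valid):

```python
# Minimal sketch of a manipulation check: compare the measured determinant
# across conditions before treating the condition dummies as 'attitude'.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 200
condition = rng.binomial(1, 0.5, n)                       # 0 = control, 1 = manipulation
attitude_measure = 0.6 * condition + rng.normal(0, 1, n)  # assumes a valid manipulation

g1 = attitude_measure[condition == 1]
g0 = attitude_measure[condition == 0]
t, p = stats.ttest_ind(g1, g0)
d = (g1.mean() - g0.mean()) / np.sqrt((g1.var(ddof=1) + g0.var(ddof=1)) / 2)
print(f"manipulation check: t = {t:.2f}, p = {p:.3f}, d = {d:.2f}")
```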
This last step in both examples - the shift where you start treating the data series that was generated by the application of an operationalisation in a sample (i.e. the quantifications of aspects of reality as produced by a measurement instrument, or the dummy codes registering whether a manipulation was applied) as if it is representative of the associated construct - is a necessary evil in psychology. This step is only justifiable if the operationalisation is sufficiently valid.
For example, if an attitude manipulation in fact induced irritation as well as changing attitude, and a change is observed in a dependent variable (e.g. people eat less food), then the conclusion that changing their attitude decreases food intake would be wrong: the irritation may be responsible for the effect.
Operationalisations are the operational equivalent of constructs. Measurement instruments have the same type of relationship to the constructs they operationalize as manipulations do. The only difference is that measurement instruments operationalise observation of a construct, and manipulations operationalise change of a construct.
Without manipulations, it would be impossible to study causal relationships between psychological variables. We need to be able to change one variable in order to then observe whether another variable is influenced by it. The manipulation represents the independent variable in the same way the measurement instrument represents the dependent variable (and, in fact, in the same way the manipulation check represents the independent variable).
Behavior change principles cannot change behavior
Observable human behavior originates in the motor cortex. Activation patterns in the motor cortex originate elsewhere in the brain. In other words, the only way to change behavior (bar physical coercion, which, as a behavior change intervention, is ethically frowned upon in many societies) is to target aspects of human psychology.
Behavior change interventions combine intervention components, which in turn contain one or more manipulations designed to target determinants of behavior in a behavior change context. Determinants of behavior are here defined as psychological constructs that contribute to activating the motor cortex such that the target behavior occurs. In other words, changing those determinants contributes to behavior change.
These manipulations of determinants are often called behavior change techniques (BCTs), but it is better to start eliminating the “behavior change” from these terms. True, the ultimate goal is behavior change: but this name implies that a BCT targets behavior. It cannot. There is no way to directly control the motor cortex through, for example, visual, aural, or tactile stimuli. Any change in behavior necessarily operates through changes in determinants.
Therefore, any deliberate attempt to change behavior requires first mapping the relevant determinants, and then identifying manipulations that are valid operationalisations of the important determinants. (Note that this has been described in the Intervention Mapping protocol more or less since its original version in the late nineties, though the terminology was perhaps a bit less grounded in methodology. The Intervention Mapping protocol has therefore, from the beginning, contained theory- and evidence-based linking of behavior change methods to determinants of behavior, to allow intervention designers to choose the methods most likely to yield an effect.)
Intervention Mapping makes a useful distinction between theoretical methods of behavior change and a method’s practical application. The theoretical methods of behavior change constitute the guidelines for operationalising the associated determinant (the target construct) as a manipulation. Implementing those guidelines (which of course is possible in a variety of ways) yields a practical application: a ‘physical’ product that can be used in the real world.
BCTs are in this sense ill-defined: they often conflate parts of, on the one hand, methods (the description of psychological processes and procedures to manipulate a target construct, i.e. operationalisation guidelines for developing or selecting a manipulation), with on the other hand, practical applications (characteristics of specific stimuli or procedural elements in a ‘physical product’).
However, to the degree that they can ultimately have any effect on behavior, any BCT or MBCT is a manipulation of the associated construct. Of course, some BCTs have relatively low validity; and all are necessarily aggregates of lower-level BCPs (see Crutzen & Peters, 2018).
And in some cases, people might apply such manipulations without knowing what they’re doing: in other words, without a clear theoretical rationale for which construct is, or which constructs are, targeted; and therefore, which parameters for effectiveness must be satisfied for the manipulation to be a valid operationalisation of the target construct(s).
Why determinants targeted by BCPs cannot be mediators
Because BCPs are operationalisations (manipulations) of constructs, measuring the determinants targeted by a BCP constitutes a manipulation check.
This does not answer the substantive question “does the effect of an independent variable on a dependent variable ‘work through’ changing an intermediate variable?”. Why not? Because the manipulation, i.e. the BCP (or BCT, or MBCT), is, in your design, what represents that ‘intermediate variable’, i.e. the determinant.
Of course, it is possible you don’t know which determinant your manipulation targets. Such a lack of psychological theory is regrettable but possible.
So your question might be, “which determinant(s) can this MBCT (or BCT, or BCP, etc) change?”
However, what you are doing as you are answering that question is studying the validity of an operationalisation. You’re not studying any substantive psychological research question.
Of course, this by no means means (hehe) that this is not a useful effort. In fact, given the paucity of evidence regarding the validity of BCTs, MBCTs, etc. for changing target constructs, I’d say that at the present stage of behavior change science, rigorous research into the validity of our measurements (and, actually, manipulations; see pragmatic nihilism) is probably more useful, scientifically and ultimately practically, than most meta-analyses coding BCTs (harsh, but as we explained in the ‘as simple as possible’ paper, probably true).
But however urgent the need for research into which BCTs, MBCTs, etc. influence which determinants, that need does not make such studies tests of mediation models.
Mediation requires three variables. A study where a manipulation is applied and two other variables are measured contains only two: the manipulation and the measurement of the targeted determinant(s) operationalise the same construct(s). If no ‘mediation’ is found, this can mean that the manipulation is not a valid operationalisation of the targeted determinant(s). Which is a useful and worthwhile conclusion.
But without confidence in its validity, no manipulation can be useful for studying human psychology. We then simply don’t know which aspects of human psychology it operationalizes, and therefore, any changes observed in any other variables cannot be ascribed to any aspect of human psychology. Hence, we learn nothing except that probably something changed somewhere - a somewhat bleak outlook, and a stage we have hopefully outgrown as a science.
A last note on mediation
Establishing that mediation exists constitutes learning an extremely strong, valuable lesson. It means:
that we know how to influence a predictor variable (we have a valid operationalisation of that variable that we can reproducibly apply);
that we know that that causes changes in the mediator; and finally,
that we know that causing that change in the mediator causes changes in the dependent variable.
These lessons, once learned, are powerful tools for manipulating reality, for example in crafting prevention messages or treatment protocols.
Unfortunately, studying mediation is a lot harder than studying bivariate causal effects. Which already is really hard: obtaining valid operationalisations of your constructs, be they measurement instruments or manipulations, is very hard, especially given the somewhat liberal way most theories define their constituent constructs.
As always, something that is very valuable is hard to obtain. Much has been written about why observational studies are practically useless for studying causality (try to construct exhaustive DAGs for the theoretical relationships between psychological constructs), and even when using experimental designs, drawing conclusions about mediation is extremely hard.
Most mediation studies do not actually study mediation. Many do not use experimental designs - and if they do, they often do not even manipulate the mediator. Because this was touched upon in the Twitter thread, let me also explain why you need to manipulate the mediator.
Imagine you want to know whether the association between construct A and construct C is mediated by construct B. You design an experiment where you select/develop an operationalisation of construct A (both a manipulation and a measurement instrument) and of constructs B and C (measurement instruments).
You do power analyses, conduct your experiment with the few hundred or so participants you’d need to obtain halfway decent power for a realistic mediation effect, and your manipulation check shows that your manipulation of construct A was successful.
In addition, you obtain very promising effect sizes for change in the mediator and in the dependent variable. You run a mediation analysis and find the exact patterns that you hypothesized. All very significant, high effect sizes, you know what, I’ll even throw in very tight confidence intervals around those effect size estimates.
If you now conclude that you found evidence of mediation, you’d be wrong.
Why?
Because correlation does not imply causation.
Note: it is important to take this literally. The adage does not say that correlation does not ‘prove’, or even that it does not ‘suffice as evidence for’, causation. Correlation does not imply causality. Why not? Because only a teeny tiny fraction of all observable correlations represents a causal effect. If you see a correlation, the probability that it represents a causal effect is much, much, much lower than the probability that it does not. Even a consistent correlation is more often a consequence of both observed variables being caused by the same or related external variables than of one happening to be a causal antecedent of the other.
That doesn’t suddenly stop being true in psychology or behavior change. If two things are associated, that’s probably not because one causes the other. And if two things occur sequentially in time, it’s still probably not because one caused the other: temporal sequence has practically no additional evidential value, because correlation does not imply causation.
So, back to our mediation model. We applied a manipulation that was a valid operationalisation of construct A, and therefore, we are confident that we caused a change in construct A.
We also observed a change in construct B and in construct C. Let’s assume we have a longitudinal design, and that our measurements of constructs B and C took place at appropriate moments in time; we may then be able to exclude the possibility that construct A changed construct C directly, and that it was construct C that changed construct B.
Still, no evidence of mediation. Mediation is the statement that A influences B, and (part of) that influence results in B influencing C.
However, we have no evidence that B influences C. Any change in C may be caused either by A directly (but delayed) or by any extraneous variable that is also influenced by A. In fact, because most observed correlations are not indicative of the existence of a causal effect, such scenarios are more likely.
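Here is a simulation sketch of exactly this scenario (my own; coefficients are arbitrary): A is validly and randomly manipulated, an unmeasured variable U is influenced by A and causes both B and C, and B has zero causal effect on C. A standard product-of-coefficients mediation analysis nevertheless reports a nonzero ‘indirect effect through B’:

```python
# Minimal sketch: a 'significant mediation' result despite B having
# no causal effect on C, because an unmeasured variable U (itself
# influenced by A) causes both B and C.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
n = 2000
a_var = rng.binomial(1, 0.5, n).astype(float)  # randomized manipulation of A
u = 0.5 * a_var + rng.normal(0, 1, n)          # unmeasured variable, influenced by A
b_var = 0.7 * u + rng.normal(0, 1, n)          # U causes B
c_var = 0.7 * u + rng.normal(0, 1, n)          # U causes C; B does NOT cause C

a_path = sm.OLS(b_var, sm.add_constant(a_var)).fit().params[1]
b_path = sm.OLS(c_var, sm.add_constant(np.column_stack([a_var, b_var]))).fit().params[2]
print(f"'indirect effect' = {a_path * b_path:.2f} (nonzero despite no B -> C effect)")
```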
Light at the end of the tunnel
There are a number of solutions, of course. I think the most important one is to not be hellbent on studying mediation until you have a firm grasp of your target constructs. You need to have very, very clear definitions of constructs A, B and C, so you know exactly which aspects of human psychology are a part of each construct and, at least as important, which aspects are not. Then, you need to carefully develop and validate operationalisations that are very specific to the target constructs.
After all, any statement about mediation requires a strong experimental design. And to be valid, an experimental design requires you to be able to manipulate the independent variable without manipulating other variables. After all, you cannot attribute any change in the dependent variable to the independent variable if your manipulation also changed four other constructs. You may have a large effect size and a significant association, but without a valid design, you don’t know what these tell you about human psychology.
So, all of these are problems we can account for. Strong experimental designs, used only after enough has been learned about the target constructs to enable developing or selecting valid, robust operationalisations, and after learning how quickly the target constructs change over time, enable finally piecing together whether mediation occurs. But this is a far cry from typical observational designs (be they cross-sectional or longitudinal) or one-off experimental designs, even if those experimental designs utilize carefully crafted, highly specific, valid operationalisations of the target constructs.
So, yes, Martin. The large majority of mediation studies can tell us nothing about mediation. It’s barely even anecdotal evidence. And then we haven’t even discussed the disastrous consequences of the combination of 1) the abominable power you generally have for mediation analyses and 2) publication bias…
But there is hope. We have all we need to start taking operationalisations more seriously, both measurement instruments and manipulations. As publication practices change and preregistration becomes the norm, hopefully sensation will become less of an incentive and rigor will start to be rewarded.
But determinants are only mediators if you study them in a causal link from one construct to another - not if you see whether they changed as a consequence of application of a BCT or MBCT. That’s called a manipulation check. Even if you don’t know for sure which constructs your manipulation operationalizes.
And apologies for all the words :-) But this has cost me about five hours, and it’s 22:00 on a Friday evening, so I’ll edit when I use parts of this for articles 😬 Have a good weekend!