Research Article

The Curious Case of the Cross-Sectional Correlation

Abstract

The cross-sectional correlation is frequently used to summarize psychological data, and can be considered the basis for many statistical techniques. However, the work of Peter Molenaar on ergodicity has raised concerns about the meaning and utility of this measure, especially when the interest is in discovering general laws that apply to (all) individuals. Through using Cattell’s databox and adopting a multilevel perspective, this paper provides a closer look at the cross-sectional correlation, with the goal to better understand its meaning when ergodicity is absent. An analytical expression is presented that shows the cross-sectional correlation is a function of the between-person correlation (based on person-specific means), and the within-person correlation (based on individuals’ temporal deviations from their person-specific means). Two curiosities related to this expression of the cross-sectional correlation are elaborated on, that is: a) the difference between the within-person correlation and the (average) person-specific correlation; and b) the unexpected scenarios that can arise because the cross-sectional correlation is a weighted sum rather than a weighted average of the between-person and within-person correlations. Seven specific examples are presented to illustrate various ways in which these two curiosities may combine; R code is provided, which allows researchers to investigate additional scenarios.

In 2004, Peter Molenaar published his manifesto on psychology as an idiographic science, in which he called into question the preoccupation with individual differences that had dominated psychological research for decades (Molenaar, 2004). At the core of this critique is the concept of ergodicity, which Molenaar borrowed from the field of thermodynamics and the time series literature (cf. Hamilton, 1994). For a psychological phenomenon to be ergodic, all individuals should be characterized by the same moments (including means, variances, and covariances) over time; if so, it becomes possible to generalize results obtained at the level of the population to the individual, and vice versa (Hamaker, 2012). Since the ergodicity conditions are extremely limiting and are therefore unlikely to hold in psychological research, Molenaar (2004) argued that the common practice of using cross-sectional results to uncover general laws that are assumed to apply to each individual is unfounded (see also Cattell et al., 1947; Epstein, 1980; Grice, 2004; Hamaker et al., 2005; Lamiell, 1998).

To investigate ergodicity in psychological practice, Fisher et al. (2018) performed a systematic comparison of group-based and individual-based results, using six empirical datasets with self-reported affect measures and physiological measures. Two of their findings are of particular interest here. First, there was considerable variation among individuals in their person-specific correlation. This already implies that ergodicity does not hold, and that the cross-sectional correlation cannot adequately represent each and every individual (cf. Hamaker, 2012). Second, the average person-specific correlation differed considerably from the cross-sectional correlation, at least in several datasets. The latter finding contradicts the intuition that cross-sectional results, if not representative of each and every individual, should at least form a reflection of “the average person” (cf. McCrae & John, 1992; van Borkulo et al., 2015).

If the cross-sectional correlation represents neither every person nor the average person, the question arises as to what the meaning and utility of the cross-sectional correlation are. To evaluate that question, we need to know how the cross-sectional correlation and the (average) person-specific correlation are related, especially when ergodicity is absent. While there have been various attempts to relate these two correlations (Hamaker et al., 2005; Hsu et al., 2022; Schmitz, 2000), an exact mathematical expression is still lacking from the literature. The purpose of the current paper is therefore to establish the actual connection between the cross-sectional correlation and the average person-specific correlation.

This paper is organized as follows. In the first section, Cattell’s databox is used to present an intuitive account of the cross-sectional correlation and the person-specific correlation. The second section takes a multilevel perspective to derive an analytical expression for the cross-sectional correlation as a weighted sum of two correlations: the correlation across individuals of their person-specific means (i.e., the between-person (BP) correlation), and the correlation across individuals between the occasion-specific within-person deviations of individuals from their person-specific means (i.e., the within-person (WP) correlation). While this expression has appeared in the literature before (e.g., Dansereau et al., 1984; Robinson, 1950; Williams, 1974), there are two curiosities related to it that have not been treated yet and that are discussed in the third section: a) the WP correlation in the expression for the cross-sectional correlation is not necessarily the same as the average person-specific correlation, which we are actually interested in; and b) because the cross-sectional correlation is a weighted sum rather than a weighted average of the WP and BP correlations, the cross-sectional correlation will not necessarily fall in between these two correlations. These two curiosities are further illustrated in the fourth section, where seven specific scenarios are presented in detail. The paper ends with a discussion.

Two correlations from Cattell’s databox

The cross-sectional correlation is one of the measures psychologists use most frequently to describe the relation between two variables. In addition, it can be considered the backbone of many of our most popular statistical techniques, such as regression analysis, factor analysis, and path analysis (see Footnote 1). However, the usefulness of the cross-sectional correlation for gaining insight into processes that take place within individuals over time has been questioned regularly for decades. It has been argued that the latter requires a within-person, and even a person-specific, approach.

To obtain an initial understanding of the cross-sectional correlation and the person-specific correlation that Fisher et al. (2018) compared, consider Cattell’s databox (e.g., Cattell, 1952). Figure 1 gives an example based on just two variables X and Y (see also Hamaker & Ryan, 2019). Keeping the variables fixed, there are two directions in which we can take a slice from this databox. First, we can focus on multiple individuals at one time point, which is referred to as a cross-section. For this, the cross-sectional correlation between X and Y (also referred to as the group-based, individual differences, or interindividual correlation) can be expressed as

$$\rho_t = \frac{\operatorname{cov}(X_t, Y_t)}{\sqrt{\operatorname{var}(X_t)}\,\sqrt{\operatorname{var}(Y_t)}}. \tag{1}$$

When we assume stationarity, the subscript t can be dropped from this expression.

Figure 1. Example of Cattell’s databox, showing how data can be thought to come from three different dimensions: variables (here only two), persons, and occasions. It further shows that a cross-section of two variables consists of many individuals at one occasion, whereas a person-specific (N = 1) study consists of many occasions from a single person.

Alternatively, we can focus on one person and many time points, in an N = 1 approach. The person-specific correlation (also referred to as the individual-based or intraindividual correlation) for person P = p can be expressed as

$$\kappa_p = \frac{\operatorname{cov}(X, Y \mid P)}{\sqrt{\operatorname{var}(X \mid P)}\,\sqrt{\operatorname{var}(Y \mid P)}}. \tag{2}$$

This person-specific correlation may be different for each person, due to individual differences in one or both person-specific variances and/or the person-specific covariance. Hence, it is well capable of capturing idiosyncrasies.

When ergodicity (and therefore stationarity; Hamilton, 1994) holds, the two slices are characterized by the exact same means, variances and covariances (and skewness and kurtosis); as a result, $\rho = \kappa_p$, that is, the two correlations will be exactly the same. However, when ergodicity is absent, means, variances, covariances, and thus correlations, may differ. This raises the question of how the cross-sectional correlation and the person-specific correlation are related in that case.
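The two slices of the databox can be made concrete with a small sketch. The toy data, the `databox[p][t]` layout, and the function names below are illustrative assumptions, not part of the original materials; the point is only that the same numbers can give a cross-sectional correlation and a person-specific correlation of opposite sign.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy databox: databox[p][t] = (x, y) for person p at occasion t.
# Within each person X goes up while Y goes down; across persons both go up.
databox = [
    [(1.0, 5.0), (2.0, 4.0), (3.0, 3.0)],
    [(4.0, 8.0), (5.0, 7.0), (6.0, 6.0)],
    [(7.0, 11.0), (8.0, 10.0), (9.0, 9.0)],
]

def cross_sectional_correlation(box, t):
    """Slice 1: many persons at one occasion t (cf. Eq. 1)."""
    xs = [person[t][0] for person in box]
    ys = [person[t][1] for person in box]
    return pearson(xs, ys)

def person_specific_correlation(box, p):
    """Slice 2: one person p across all occasions (cf. Eq. 2)."""
    xs = [obs[0] for obs in box[p]]
    ys = [obs[1] for obs in box[p]]
    return pearson(xs, ys)

print(cross_sectional_correlation(databox, 0))  # positive in this toy example
print(person_specific_correlation(databox, 0))  # negative in this toy example
```

The toy values are deliberately extreme; with real data both slices would of course be estimated from many more persons and occasions.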

Decomposing the cross-sectional correlation from a multilevel perspective

To gain more insight into the cross-sectional correlation and how it relates to the person-specific correlation, it is helpful to take a multilevel perspective, in which the score on variable X for person p at time t can be represented by the person-specific mean $E[X \mid P = p]$ and a temporal, occasion-specific within-person deviation from that person-specific mean, denoted as $x_{W,pt}$, that is,

$$x_{pt} = E[X \mid P = p] + x_{W,pt}. \tag{3}$$

In the multilevel context, the first term on the right-hand side is referred to as the BP component as it only varies between persons, while the second term is referred to as the WP component as it varies within a person over time. When there is a second variable Y, a similar decomposition can be applied to that as well.

The person-specific correlation presented in the previous section is the correlation between the WP components $x_{W,pt}$ and $y_{W,pt}$ of a particular person across time. In contrast, the cross-sectional correlation is based on both the BP and the WP components of X and Y, and can be expressed as a weighted sum of the correlation between the BP components and a correlation based on the WP components. To see this, we begin by rewriting the cross-sectional covariance and variances, which are included in Eq. (1), and then we use these to get an alternative expression for the cross-sectional correlation.

The cross-sectional covariance of X and Y as the sum of two components

By using the law of total covariance (see Footnote 2), we can write

$$\operatorname{cov}(X, Y) = \operatorname{cov}(E[X \mid P], E[Y \mid P]) + E[\operatorname{cov}(X, Y \mid P)] = \sigma_{BX,BY} + \sigma_{WX,WY}, \tag{4}$$

where: a) $\sigma_{BX,BY} = \operatorname{cov}(E[X \mid P], E[Y \mid P])$ is the covariance between the person-specific means on the two variables (i.e., based on the BP components); and b) $\sigma_{WX,WY} = E[\operatorname{cov}(X, Y \mid P)]$ is the average person-specific covariance between the variables (i.e., based on the WP components).

The cross-sectional variance of X as a sum of two components

Using the law of total variance (see Footnote 3), we can express the cross-sectional variance of X as

$$\operatorname{var}(X) = \operatorname{var}(E[X \mid P]) + E[\operatorname{var}(X \mid P)] = \sigma_{BX}^2 + \sigma_{WX}^2, \tag{5}$$

where: a) $\sigma_{BX}^2 = \operatorname{var}(E[X \mid P])$ is the variance of the person-specific means on X (i.e., based on the BP components); and b) $\sigma_{WX}^2 = E[\operatorname{var}(X \mid P)]$ is the average person-specific variance based on the individual deviations from the person-specific means (i.e., based on the WP components). The cross-sectional variance of variable Y can be rewritten in the same way, that is, $\operatorname{var}(Y) = \sigma_{BY}^2 + \sigma_{WY}^2$.
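The two decompositions can be checked numerically. The sketch below is only an illustration under stated assumptions: the `scores` data are invented, every person has the same number of occasions, and the "population" is taken to be the pooled set of all person-occasion scores, so that population moments (dividing by n) are used throughout.

```python
def mean(values):
    return sum(values) / len(values)

def pcov(xs, ys):
    """Population covariance (divides by n, not n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: scores[p] = (x values over time, y values over time).
scores = [
    ([1.0, 2.0, 3.0], [2.0, 1.0, 3.0]),
    ([4.0, 6.0, 8.0], [5.0, 9.0, 7.0]),
    ([0.0, 1.0, 5.0], [3.0, 3.0, 0.0]),
]

all_x = [x for xs, _ in scores for x in xs]
all_y = [y for _, ys in scores for y in ys]
mean_x = [mean(xs) for xs, _ in scores]   # person-specific means of X
mean_y = [mean(ys) for _, ys in scores]   # person-specific means of Y

# Covariance: between-person part plus average person-specific covariance (Eq. 4).
between_cov = pcov(mean_x, mean_y)
within_cov = mean([pcov(xs, ys) for xs, ys in scores])
total_cov = pcov(all_x, all_y)

# Variance of X: between-person part plus average person-specific variance (Eq. 5).
between_var_x = pcov(mean_x, mean_x)
within_var_x = mean([pcov(xs, xs) for xs, _ in scores])
total_var_x = pcov(all_x, all_x)

print(total_cov, between_cov + within_cov)
print(total_var_x, between_var_x + within_var_x)
```

With unequal numbers of occasions per person, the simple unweighted averages above would need to be replaced by weighted ones for the identity to hold in the pooled data.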

Rewriting the cross-sectional correlation of X and Y

We can now rewrite the cross-sectional correlation as the weighted sum of two other correlations, one concerning the BP components, and another concerning the WP components. To this end, we start by plugging the expression for the covariance obtained in Eq. (4) into the expression for the cross-sectional correlation, to get

$$\rho = \frac{\sigma_{BX,BY} + \sigma_{WX,WY}}{\sigma_X \sigma_Y} = \frac{\sigma_{BX,BY}}{\sigma_X \sigma_Y} + \frac{\sigma_{WX,WY}}{\sigma_X \sigma_Y}. \tag{6}$$

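We first focus on the first term on the right-hand side of Eq. (6), and multiply it by $\sigma_{BX}\sigma_{BY} / (\sigma_{BX}\sigma_{BY})$ (where $\sigma_{BX}$ and $\sigma_{BY}$ are the square roots of the variances of the person-specific means of X and Y; see the first term on the right-hand side of Eq. (5)). This gives us

$$\frac{\sigma_{BX,BY}}{\sigma_X \sigma_Y} = \frac{\sigma_{BX}\sigma_{BY}}{\sigma_X \sigma_Y} \times \frac{\sigma_{BX,BY}}{\sigma_{BX}\sigma_{BY}} = \frac{\sigma_{BX}}{\sigma_X} \times \frac{\sigma_{BY}}{\sigma_Y} \times \rho_B.$$

Similarly, we can rewrite the second term on the right-hand side of Eq. (6) by multiplying it by $\sigma_{WX}\sigma_{WY} / (\sigma_{WX}\sigma_{WY})$ (where $\sigma_{WX}$ and $\sigma_{WY}$ are the square roots of the average person-specific variances on X and Y; see the second term on the right-hand side of Eq. (5)). This gives us

$$\frac{\sigma_{WX,WY}}{\sigma_X \sigma_Y} = \frac{\sigma_{WX}\sigma_{WY}}{\sigma_X \sigma_Y} \times \frac{\sigma_{WX,WY}}{\sigma_{WX}\sigma_{WY}} = \frac{\sigma_{WX}}{\sigma_X} \times \frac{\sigma_{WY}}{\sigma_Y} \times \rho_W.$$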

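Substituting these expressions back into Eq. (6), we get

$$\rho = \frac{\sigma_{BX}}{\sigma_X} \times \frac{\sigma_{BY}}{\sigma_Y} \times \rho_B + \frac{\sigma_{WX}}{\sigma_X} \times \frac{\sigma_{WY}}{\sigma_Y} \times \rho_W = \eta_{BX}\eta_{BY}\rho_B + \eta_{WX}\eta_{WY}\rho_W, \tag{7}$$

where $\eta_{BX} = \sigma_{BX}/\sigma_X$, $\eta_{BY} = \sigma_{BY}/\sigma_Y$, $\eta_{WX} = \sigma_{WX}/\sigma_X$, and $\eta_{WY} = \sigma_{WY}/\sigma_Y$. Note that $\eta_{BX}^2 = \sigma_{BX}^2/\sigma_X^2$ and $\eta_{BY}^2 = \sigma_{BY}^2/\sigma_Y^2$ represent the intraclass correlations for X and Y, that is, the proportions of total variance that can be attributed to stable differences between individuals in X and Y. Similarly, $\eta_{WX}^2 = \sigma_{WX}^2/\sigma_X^2 = 1 - \sigma_{BX}^2/\sigma_X^2$ and $\eta_{WY}^2 = \sigma_{WY}^2/\sigma_Y^2 = 1 - \sigma_{BY}^2/\sigma_Y^2$ represent the proportions of variance in X and Y that are not accounted for by stable differences between persons.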

Hence, from Eq. (7) it is clear that the cross-sectional correlation $\rho$ can be viewed as a weighted sum of the BP correlation $\rho_B$ (i.e., the correlation between the person-specific means), and the WP correlation $\rho_W$ (i.e., the correlation based on individuals’ temporal within-person deviations from their person-specific means). The weights in this sum depend on the intraclass correlations of X and Y (see Footnote 4).
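As a minimal sketch, Eq. (7) can be written as a small function of the two intraclass correlations and the two correlations. The function name and the input values are illustrative assumptions; the weights follow from $\eta_{BX}\eta_{BY} = \sqrt{\eta_{BX}^2 \eta_{BY}^2}$ and $\eta_{WX}\eta_{WY} = \sqrt{(1 - \eta_{BX}^2)(1 - \eta_{BY}^2)}$.

```python
import math

def cross_sectional_rho(icc_x, icc_y, rho_b, rho_w):
    """Eq. (7): rho = eta_BX * eta_BY * rho_B + eta_WX * eta_WY * rho_W.

    icc_x and icc_y are the intraclass correlations (eta_BX^2 and eta_BY^2).
    """
    weight_b = math.sqrt(icc_x * icc_y)                   # eta_BX * eta_BY
    weight_w = math.sqrt((1.0 - icc_x) * (1.0 - icc_y))   # eta_WX * eta_WY
    return weight_b * rho_b + weight_w * rho_w

# Hypothetical inputs: the same rho_B and rho_W, weighted in two different ways.
similar = cross_sectional_rho(0.4, 0.5, rho_b=0.5, rho_w=0.32)
different = cross_sectional_rho(0.1, 0.9, rho_b=0.5, rho_w=0.32)

print(round(similar, 3))    # lies between rho_W and rho_B
print(round(different, 3))  # lies below both rho_W and rho_B
```

Only the weights change between the two calls, which is enough to move the result from inside to outside the interval spanned by the two correlations.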

Equation (7) is known in the organizational literature as within and between analysis (WABA), presented by Dansereau et al. (1984), who credit Robinson (1950) for its origin. More recently, Hsu et al. (2022) have presented the same expression in discussing the limitations of the cross-sectional correlation when the goal is to find the relation between enduring (trait-like) differences between individuals (i.e., $\rho_B$); they reference classical test theory literature on the effect of correlated errors as the source of this expression (Saccenti et al., 2020; Williams, 1974; Zimmerman & Williams, 1977).

Two curiosities regarding the cross-sectional correlation

While at first sight the expression in Eq. (7) of the cross-sectional correlation as a weighted sum seems straightforward, there are two curiosities associated with it that are of particular interest to us here: 1) the WP correlation $\rho_W$ is not necessarily the same as the average person-specific correlation $\kappa$; and 2) the weights in Eq. (7) do not necessarily add up to 1, which may have unexpected consequences. Both aspects are elaborated on below. For reference, the various correlations (i.e., mathematical expressions and substantive descriptions) are presented in Table 1.

Table 1. Five correlations that can be used to describe the way X and Y are related.

The discrepancy between ρW and the average person-specific correlation κ

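When considering the expression for the cross-sectional correlation in Eq. (7), it is tempting to assume that the WP correlation $\rho_W$ represents the average person-specific correlation $\kappa$ (that is, the average across individuals of the person-specific correlation $\kappa_p$ presented in Eq. (2)). However, $\rho_W$ is based on the average person-specific covariance and the average person-specific variances, and can be expressed as

$$\rho_W = \frac{\sigma_{WX,WY}}{\sigma_{WX}\sigma_{WY}} = \frac{E[\operatorname{cov}(X, Y \mid P)]}{\sqrt{E[\operatorname{var}(X \mid P)]}\,\sqrt{E[\operatorname{var}(Y \mid P)]}}.$$

In contrast, the average person-specific correlation is obtained by taking the expectation of the person-specific correlation, that is,

$$\kappa = E\left[\frac{\operatorname{cov}(X, Y \mid P)}{\sqrt{\operatorname{var}(X \mid P)}\,\sqrt{\operatorname{var}(Y \mid P)}}\right].$$

In the Appendix it is shown that the latter can be rewritten as

$$\begin{aligned}
\kappa ={}& E[\operatorname{cov}(X, Y \mid P)]\; E\left[\frac{1}{\sqrt{\operatorname{var}(X \mid P)}}\right] E\left[\frac{1}{\sqrt{\operatorname{var}(Y \mid P)}}\right] \\
&+ E[\operatorname{cov}(X, Y \mid P)]\; \operatorname{cov}\left(\frac{1}{\sqrt{\operatorname{var}(X \mid P)}}, \frac{1}{\sqrt{\operatorname{var}(Y \mid P)}}\right) \\
&+ \operatorname{cov}\left(\operatorname{cov}(X, Y \mid P), \frac{1}{\sqrt{\operatorname{var}(X \mid P)}\,\sqrt{\operatorname{var}(Y \mid P)}}\right),
\end{aligned}$$

and the difference between $\rho_W$ and $\kappa$ is further elaborated on there. The results (for the case where $E[\operatorname{cov}(X, Y \mid P)]$ is positive) can be summarized as: a) the first term will in general be larger than $\rho_W$ (there are some particular situations where the two are exactly the same); b) the second term is probably positive; and c) the third term is probably negative or zero. The third term may thus counter the difference that the first and second terms create; yet, it seems reasonable to state that in general we should expect $\rho_W < \kappa$ when $E[\operatorname{cov}(X, Y \mid P)] > 0$ (and, when $E[\operatorname{cov}(X, Y \mid P)] < 0$, we should expect $\kappa < \rho_W$).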

To see some of this in practice, we consider a scenario in which we have two subpopulations. All individuals in the first subpopulation are characterized by the same person-specific covariance $\operatorname{cov}(X, Y \mid P) = 1$, and person-specific variances $\operatorname{var}(X \mid P) = \operatorname{var}(Y \mid P) = 2.5$; hence, all these individuals have $\kappa_p = 1/2.5 = 0.4$. In the second subpopulation, again all individuals have the same person-specific covariance $\operatorname{cov}(X, Y \mid P) = 1$. Moreover, all persons in the second subpopulation have the same person-specific variances for X and Y, that is, $\operatorname{var}(X \mid P) = \operatorname{var}(Y \mid P)$. For the latter, we consider different values, ranging from a very large value (i.e., $\operatorname{var}(X \mid P) = \operatorname{var}(Y \mid P) = 20$), so that the person-specific correlation is about 0, to the smallest possible number given the covariance value (i.e., $\operatorname{var}(X \mid P) = \operatorname{var}(Y \mid P) = 1$), so that the person-specific correlation is 1. The two correlations that characterize the members of the two subpopulations are plotted in Figure 2, against $Q = 1/\left(\sqrt{\operatorname{var}(X \mid P)}\,\sqrt{\operatorname{var}(Y \mid P)}\right)$ for the second subpopulation (which, because the covariance equals 1, is identical to the correlation in the second subpopulation): The horizontal solid line represents the first subpopulation, whereas the increasing solid line represents the second subpopulation. In addition, the average person-specific correlation $\kappa$ (represented by the straight dashed line) is shown; it falls exactly in between the two values that represent the $\kappa_p$’s of the individuals in the two subpopulations, because each subpopulation makes up exactly half of the total population. Finally, Figure 2 also contains the WP correlation $\rho_W$ (represented by the curved dotted line), based on dividing the average person-specific covariance (here 1) by the square roots of the average variances. We can see that $\rho_W$ is identical to the average person-specific correlation $\kappa$ when the two subpopulations are identical (i.e., when there is ergodicity); for all other cases we get $\rho_W < \kappa$.

Figure 2. Illustration of how the average person-specific correlation $\kappa$ (dashed line) falls in between the person-specific correlations that characterize the two subpopulations (i.e., $\kappa_p$, represented by the two solid lines), and how the within-person correlation $\rho_W$ from the expression for the cross-sectional correlation (dotted line) deviates from this, depending on the person-specific variances. $Q = 1/\left(\sqrt{\operatorname{var}(X \mid P)}\,\sqrt{\operatorname{var}(Y \mid P)}\right)$ in the second subpopulation is varied by varying the person-specific variances of X and Y, while all else (i.e., person-specific covariance in second subpopulation, and person-specific variances and covariance in first subpopulation) is held constant.

Clearly, the current set-up with only two subpopulations to create heterogeneity in the population is quite unrealistic; it seems much more reasonable to expect continuous individual differences in the person-specific variances and covariance of X and Y. Furthermore, while here we had $\operatorname{var}(X \mid P) = \operatorname{var}(Y \mid P)$ for all persons, in reality it is more likely that the person-specific variances are not (exactly) the same, even though they may be positively related (as individuals who vary more on X may also tend to vary more on Y). However, the current illustration shows that even with the exact same covariance for everyone, and only two possible values for the person-specific variances, the difference between the average person-specific correlation $\kappa$ and the correlation of within-person deviations $\rho_W$ can be quite substantial. This implies that even if the cross-sectional correlation is the same as the within-person correlation, $\rho = \rho_W$ (because there are no stable between-person differences, for instance), this still does not imply it will be an adequate reflection of the average person-specific correlation $\kappa$.
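The two-subpopulation scenario can be reproduced with a few lines of code. The sketch below uses the values given in the text (covariance 1 everywhere, variances of 2.5 in the first subpopulation, two equally sized subpopulations); the function name and the tuple layout are illustrative assumptions.

```python
import math

def kappa_and_rho_w(subpops):
    """subpops: list of (proportion, cov_xy, var_x, var_y) per subpopulation.

    Returns the average person-specific correlation kappa and the
    within-person correlation rho_W from Eq. (7).
    """
    # kappa: average of the ratios (correlation first, then average).
    kappa = sum(w * c / math.sqrt(vx * vy) for w, c, vx, vy in subpops)
    # rho_W: ratio of the averages (average moments first, then correlation).
    mean_cov = sum(w * c for w, c, _, _ in subpops)
    mean_var_x = sum(w * vx for w, _, vx, _ in subpops)
    mean_var_y = sum(w * vy for w, _, _, vy in subpops)
    rho_w = mean_cov / math.sqrt(mean_var_x * mean_var_y)
    return kappa, rho_w

results = {}
for var2 in (1.0, 2.5, 5.0, 20.0):  # variance in the second subpopulation
    first = (0.5, 1.0, 2.5, 2.5)
    second = (0.5, 1.0, var2, var2)
    results[var2] = kappa_and_rho_w([first, second])
    kappa, rho_w = results[var2]
    print(f"var2={var2:5.2f}  kappa={kappa:.3f}  rho_W={rho_w:.3f}")
```

The two quantities coincide only for `var2 = 2.5`, the case in which both subpopulations have identical parameters; in the other three cases `rho_W` is the smaller of the two.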

The cross-sectional correlation as a weighted sum (not a weighted average)

Another misconception that may arise when considering the expression in Eq. (7) is the assumption that the cross-sectional correlation will always fall somewhere in between the BP and the WP correlation. For instance, in Schmitz (2000) a formula is presented in which the weights equal $\eta^2$ and $1 - \eta^2$; this corresponds to the specific case where $\eta_{BX}^2 = \eta_{BY}^2$, meaning that the two variables have the exact same intraclass correlations. In that case, the two weights add up to 1, and $\rho$ will fall in between $\rho_W$ and $\rho_B$. However, when the intraclass correlations differ from each other, the two weights do not add up to 1, as shown in Figure 3. The diagonal from bottom left to top right shows the scenarios in which the two intraclass correlations are identical and the weights add up to 1; in all other scenarios the sum of the weights is less than one.
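One way to see why the sum of the weights can never exceed 1 is a short argument based on the Cauchy-Schwarz inequality. This is offered as a supplementary sketch rather than as the derivation used in the paper; it relies only on the fact, noted with Eq. (7), that the squared weights for each variable add up to 1.

```latex
% Because \eta_{BX}^2 + \eta_{WX}^2 = 1 and \eta_{BY}^2 + \eta_{WY}^2 = 1,
% the vectors (\eta_{BX}, \eta_{WX}) and (\eta_{BY}, \eta_{WY}) have length 1.
% The sum of the weights is their inner product, so by Cauchy-Schwarz:
\[
  \eta_{BX}\eta_{BY} + \eta_{WX}\eta_{WY}
  \;\le\;
  \sqrt{\eta_{BX}^2 + \eta_{WX}^2}\,\sqrt{\eta_{BY}^2 + \eta_{WY}^2}
  \;=\; 1 ,
\]
% with equality only when the two (non-negative) unit vectors coincide,
% that is, when \eta_{BX}^2 = \eta_{BY}^2.
```

For instance, with intraclass correlations of 0.1 and 0.9 the sum of the weights is $\sqrt{0.09} + \sqrt{0.09} = 0.6$.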

Figure 3. Heatmap showing the sum of the two weights (i.e., $\eta_{BX}\eta_{BY} + \eta_{WX}\eta_{WY}$) as a function of the intraclass correlation (i.e., proportion of BP variance) in X (i.e., $\eta_{BX}^2$) and Y (i.e., $\eta_{BY}^2$). Lighter colors indicate higher values.

To explore the possible implications of this, consider the concrete examples in Figure 4. Each panel is based on a specific combination of a BP correlation $\rho_B$ that is represented as the dotted (blue) line, and a WP correlation $\rho_W$ that is represented as the dashed (red) line. The solid (purple) lines represent the cross-sectional correlation $\rho$ for different combinations of intraclass correlations $\eta_{BX}^2$ and $\eta_{BY}^2$. On the x-axis is the intraclass correlation for variable X, while the different shades of the solid lines represent different intraclass correlations for Y (where the darkest line represents $\eta_{BY}^2 = 0.1$, and the lightest line represents $\eta_{BY}^2 = 0.9$).

Figure 4. Solid (purple) lines represent the cross-sectional correlation plotted against the intraclass correlation of X (i.e., $\eta_{BX}^2$). The darkest line is for an intraclass correlation of Y (i.e., $\eta_{BY}^2$) of 0.1, while the lightest line represents the scenario in which it is 0.9. The dashed (red) line represents the WP correlation $\rho_W$; the dotted (blue) line represents the BP correlation $\rho_B$.

It can be seen that the solid lines representing the cross-sectional correlation occasionally drop below both the dashed and dotted lines, which implies that in these cases the cross-sectional correlation $\rho$ no longer lies between the BP correlation $\rho_B$ and the WP correlation $\rho_W$. This is more likely to occur when: a) the intraclass correlations for X and Y are very different, such that the sum of the weights deviates (a lot) from one (i.e., lighter lines on the left, and darker lines on the right in every panel); b) the BP correlation and WP correlation are very similar (e.g., compare the first and second panel of Figure 4); and c) the BP correlation and WP correlation are stronger (e.g., compare the first and third panel of Figure 4). The most curious case is formed by the scenario where the BP correlation and the WP correlation are identical (see the fourth panel of Figure 4): When these two correlations are the same, the cross-sectional correlation equals their common value multiplied by the sum of the weights, so that it will be closer to zero than the two ($\rho < \rho_W = \rho_B$), unless the intraclass correlations for X and Y are exactly the same.

Conclusion

Despite the simplicity of the expression for the cross-sectional correlation, its relation with the average person-specific correlation is quite complicated. Specifically, the first curiosity presented here is that even though the WP correlation is based on dividing the average person-specific covariance (i.e., $\sigma_{WX,WY} = E[\operatorname{cov}(X, Y \mid P)]$) by the square roots of the average person-specific variances (i.e., $\sigma_{WX} = \sqrt{E[\operatorname{var}(X \mid P)]}$ and $\sigma_{WY} = \sqrt{E[\operatorname{var}(Y \mid P)]}$), the resulting correlation is not (necessarily) identical to the average person-specific correlation $\kappa$ (see the Appendix for details). The second curiosity presented here is that the cross-sectional correlation may not fall in between the WP and BP correlations, but that it may lie closer to zero than the smallest of the two; this is especially likely to occur when the WP and BP correlations are identical.

Illustrations

To obtain more insight into the two curiosities and how they may combine, seven specific examples are considered and compared here. More details about the set-up, as well as annotated R code that can be used to simulate data under various scenarios, can be found at the website with supporting materials (see Footnote 5).

As before, heterogeneity in the within-person part is created by having two subpopulations with their own parameters, while within each subpopulation all individuals have the same person-specific parameters (i.e., person-specific variances $\operatorname{var}(X \mid P)$ and $\operatorname{var}(Y \mid P)$, and a person-specific covariance $\operatorname{cov}(X, Y \mid P)$). Furthermore, for simplicity $\operatorname{var}(X \mid P) = \operatorname{var}(Y \mid P)$ within each subpopulation. Based on this, we can compute: a) the person-specific correlation $\kappa_p$ that characterizes all the members of a subpopulation (i.e., we will have two values of $\kappa_p$); b) the average person-specific correlation $\kappa$ (which is simply the average of the two $\kappa_p$’s, as both subpopulations make up exactly half of the population); c) the average person-specific variances $\sigma_{WX}^2 = E[\operatorname{var}(X \mid P)]$ and $\sigma_{WY}^2 = E[\operatorname{var}(Y \mid P)]$, and the average person-specific covariance $\sigma_{WX,WY} = E[\operatorname{cov}(X, Y \mid P)]$ (again, simply the average of the values that define the two subpopulations); and d) based on these, the WP correlation $\rho_W = \sigma_{WX,WY}/(\sigma_{WX}\sigma_{WY})$. Subsequently, given specific intraclass correlations $\eta_{BX}^2$ and $\eta_{BY}^2$, and a BP correlation $\rho_B$, we can compute: e) the between-person variances $\sigma_{BX}^2$ and $\sigma_{BY}^2$ (using the average within-person variances computed above in $\sigma_{BX}^2 = (\sigma_{WX}^2 \eta_{BX}^2)/(1 - \eta_{BX}^2)$ and $\sigma_{BY}^2 = (\sigma_{WY}^2 \eta_{BY}^2)/(1 - \eta_{BY}^2)$; for a derivation of these expressions, see the supporting website); f) the between-person covariance (i.e., $\sigma_{BX,BY} = \sigma_{BX}\sigma_{BY}\rho_B$); and g) the cross-sectional correlation $\rho$ (using the WP correlation, BP correlation, and the intraclass correlations).
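Steps a) through g) can be sketched as follows. This is an illustrative Python translation, not the R code from the supporting website; the input values are those described for Example 1 below (person-specific variances of 1.25 or 5, covariance 1, intraclass correlations of 0.4 and 0.5, and a BP correlation of 0.5), and the assignment of 0.4 to X and 0.5 to Y is an assumption.

```python
import math

# Two equally sized subpopulations: (cov_xy, var_x, var_y) per person.
subpops = [(1.0, 1.25, 1.25), (1.0, 5.0, 5.0)]

# a) person-specific correlation in each subpopulation
kappa_p = [c / math.sqrt(vx * vy) for c, vx, vy in subpops]
# b) average person-specific correlation
kappa = sum(kappa_p) / len(kappa_p)
# c) average person-specific variances and covariance
s2_wx = sum(vx for _, vx, _ in subpops) / len(subpops)
s2_wy = sum(vy for _, _, vy in subpops) / len(subpops)
s_wxwy = sum(c for c, _, _ in subpops) / len(subpops)
# d) WP correlation
rho_w = s_wxwy / math.sqrt(s2_wx * s2_wy)

# Chosen intraclass correlations and BP correlation (assumed assignment to X and Y).
icc_x, icc_y, rho_b = 0.4, 0.5, 0.5
# e) between-person variances implied by the intraclass correlations
s2_bx = s2_wx * icc_x / (1.0 - icc_x)
s2_by = s2_wy * icc_y / (1.0 - icc_y)
# f) between-person covariance
s_bxby = math.sqrt(s2_bx * s2_by) * rho_b
# g) cross-sectional correlation from total covariance and total variances
rho = (s_bxby + s_wxwy) / math.sqrt((s2_bx + s2_wx) * (s2_by + s2_wy))

# Cross-check: the same value via the weighted sum in Eq. (7).
rho_from_eq7 = (math.sqrt(icc_x * icc_y) * rho_b
                + math.sqrt((1.0 - icc_x) * (1.0 - icc_y)) * rho_w)

print(kappa_p, kappa, rho_w, rho, rho_from_eq7)
```

Computing step g) in two ways is a convenient sanity check: the covariance route and the weighted-sum route should agree up to rounding error.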

While all of these computations are based on analytical results, we can also simulate data once we know the person-specific variances and covariance, and the between-person variances and covariance. The R code provided through the supporting website allows the user to decide on the number of time points and persons (i.e., what part of Cattell’s databox we want to observe). As part of the illustration, data for 1,000 persons at 1,000 occasions were generated for each example presented below; to verify the analytical results, only the data of the first time point were analyzed to get $\hat{\rho}$, an estimate of the cross-sectional correlation. The parameter choices, analytically derived parameters, and the estimated cross-sectional correlation for all seven scenarios are presented in Table 2.
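A simulation of this kind can be sketched in a few lines. The code below is not the R code from the supporting website but a simplified Python stand-in under several assumptions: person-specific means and within-person deviations are drawn from normal distributions, only a single occasion per person is generated (since only the first time point is analyzed), the number of persons is set to 20,000 to reduce sampling error, and the parameter values are those described for Example 1.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def simulate_cross_section(n_persons, icc_x, icc_y, rho_b, subpops, seed=1):
    """Draw one occasion per person.

    subpops: list of (cov, var) with var(X|P) = var(Y|P) = var in each
    equally sized subpopulation. Normality is an assumption of this sketch.
    """
    rng = random.Random(seed)
    mean_var = sum(v for _, v in subpops) / len(subpops)
    sd_bx = math.sqrt(mean_var * icc_x / (1.0 - icc_x))
    sd_by = math.sqrt(mean_var * icc_y / (1.0 - icc_y))
    xs, ys = [], []
    for p in range(n_persons):
        cov, var = subpops[p % len(subpops)]  # alternate between subpopulations
        kappa_p = cov / var
        # Person-specific means (BP components), correlated rho_b.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        mu_x = sd_bx * z1
        mu_y = sd_by * (rho_b * z1 + math.sqrt(1.0 - rho_b ** 2) * z2)
        # Within-person deviations (WP components), correlated kappa_p.
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        dev_x = math.sqrt(var) * e1
        dev_y = math.sqrt(var) * (kappa_p * e1 + math.sqrt(1.0 - kappa_p ** 2) * e2)
        xs.append(mu_x + dev_x)
        ys.append(mu_y + dev_y)
    return xs, ys

xs, ys = simulate_cross_section(20000, 0.4, 0.5, 0.5, [(1.0, 1.25), (1.0, 5.0)])
rho_hat = pearson(xs, ys)
print(round(rho_hat, 3))  # should be close to the analytical value of about 0.40
```

Because the estimate is subject to sampling error, it will only approximate the analytical cross-sectional correlation; increasing `n_persons` narrows the gap.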

Table 2. Seven examples illustrating the relation between various correlations.

Example 1.

The first example is used to show that, even when the average person-specific correlation and the BP correlation are identical (i.e., $\kappa = \rho_B$), this does not imply that the cross-sectional correlation will be identical to them. Due to individual differences in the person-specific variances (either 1.25 or 5), while the person-specific covariance is fixed across individuals (here 1), we have $\rho_W < \kappa$. The intraclass correlations are not very different (0.4 versus 0.5), and the cross-sectional correlation falls in between the WP and BP correlation. Hence, we have $\rho_W < \rho < \rho_B = \kappa$.

Example 2.

The second example is similar to the first, but the intraclass correlations now differ much more (0.1 versus 0.9). Hence, the values of $\kappa$, $\rho_W$ and $\rho_B$ have not changed, but the cross-sectional correlation $\rho$ is different from before, due to the different weighting of $\rho_W$ and $\rho_B$. Specifically, the cross-sectional correlation no longer falls in between these two, but instead we have $\rho < \rho_W < \rho_B = \kappa$.

Example 3.

The purpose of the third example is to show that the difference between $\kappa$ and $\rho_W$ is related to individual differences in the person-specific variances. In this case, we have fixed person-specific variances, but there are individual differences in the person-specific covariance, so that there are still individual differences in the person-specific correlation $\kappa_p$. Yet, the lack of differences in the person-specific variances ensures that now we have $\kappa = \rho_W$. The BP correlation $\rho_B$ was set to the same value. However, because the intraclass correlations differ from each other (0.2 versus 0.8), the cross-sectional correlation is still smaller than all of them, so we have $\rho < \rho_W = \rho_B = \kappa$.

Example 4.

In the fourth example, we go back to having individual differences in the person-specific variances, so we would again expect the average person-specific correlation and the WP correlation to differ from each other. However, the particular case here is such that all individuals are characterized by the exact same person-specific correlation (because the person-specific covariance varies with the person-specific variances), that is, we have $\kappa = \kappa_p$. This results in $\rho_W = \kappa = \kappa_p$. We set $\rho_B$ to be identical to these. However, because there are different intraclass correlations (0.2 versus 0.8), the cross-sectional correlation lies closer to zero, that is, $\rho < \rho_W = \rho_B = \kappa = \kappa_p$.

Example 5.

The fifth example is based on a scenario in which there are no stable between-person differences, that is, there are no differences across people in the person-specific means. This implies that the intraclass correlations for X and Y are both zero. For the within part, we use the same parameter values as in Examples 1 and 2, hence we have $\rho_W < \kappa$ again. The BP correlation is not defined, as there cannot be a correlation when there are no differences at this level. As a result, the cross-sectional correlation is identical to the WP correlation. This can be summarized as $\rho = \rho_W < \kappa$.

Example 6.

The sixth example is included to underscore the difference between having no between-person differences in means (i.e., Example 5), versus having a correlation of zero between the person-specific means. The within part is the same as for Examples 1, 2 and 5, and the intraclass correlations are the same as in Example 1. With $\rho_B = 0$, we get a cross-sectional correlation that falls in between the WP correlation and the BP correlation. Hence, we now have $\rho_B < \rho < \rho_W < \kappa$.

Example 7.

The final example shows that even when the WP, BP, and cross-sectional correlations are identical (i.e., $\rho = \rho_W = \rho_B$), this still does not imply that this correlation represents the average person-specific correlation. The within part in the example is the same as in Example 6 (and thus 1, 2, and 5), resulting in $\kappa = 0.5$ and $\rho_W = 0.32$. With $\rho_B = \rho_W$ and identical intraclass correlations (here: 0.4), we have $\rho = \rho_B = \rho_W = 0.32$, and thus $\rho = \rho_B = \rho_W < \kappa$.

Conclusion

Although the scenarios used here may be considered quite unrealistic (e.g., because the heterogeneity across individuals in the WP part results from having two subpopulations that themselves are homogeneous, or because the intraclass correlations of X and Y are very different from each other in some examples), these numerical examples serve the specific purpose of illustrating how the two curiosities presented before may combine. The examples show that the relation between the cross-sectional correlation ρ and the average person-specific correlation κ is quite remote, and we should not try to infer one from the other. Determining how much these correlations tend to differ in practice is not possible based on the results presented here. That requires researchers to obtain data from both the person dimension and the time point dimension of Cattell’s databox, compute the person-specific correlation per person and its average, and compare this to the (average) cross-sectional correlation (see Fisher et al., Citation2018).

Discussion

In this paper the relation between the cross-sectional correlation and the average person-specific correlation was investigated. It is clear that when ergodicity holds (meaning that all individuals have the same means, variances and covariance for X and Y), the cross-sectional correlation and the average person-specific correlation will be identical. In that case, we have: a) no individual differences in the person-specific correlation, and this is the same as the WP correlation, that is, κp=κ=ρW; and b) ρB will not exist and $\eta^2_{BX}=\eta^2_{BY}=0$ (as there are no stable between-person differences), so that Eq. (7) reduces to ρ=ρW. However, when there are mean differences between individuals, reflecting individual differences in central tendency over time, the cross-sectional correlation is a function of the correlation between these person-specific means (the BP correlation ρB), and the correlation between the momentary deviations from these person-specific means (the WP correlation ρW). Two curiosities related to this expression of the cross-sectional correlation were discussed and illustrated here.

First, it was shown that while the WP correlation is based on dividing the average person-specific covariance by the square roots of the average person-specific variances, this is not necessarily equal to the average person-specific correlation. This is an important finding, because it is the average person-specific correlation, rather than the WP correlation, that is considered most meaningful from a substantive point of view. For instance, Fisher et al. (Citation2018) considered the average person-specific correlation when they investigated the similarities and differences between cross-sectional and person-specific results in empirical data.

A second curiosity is that the cross-sectional correlation does not necessarily fall between the BP correlation (based on the person-specific means) and the WP correlation (regardless of whether or not the latter equals the average person-specific correlation). Whether or not this is the case depends on the combination of various factors. What may be the most surprising result here is that when the BP correlation is identical to the WP correlation, this actually implies that the cross-sectional correlation will fall closer to zero than these correlations (i.e., ρ<ρW=ρB in case of positive correlations), unless the intraclass correlations of X and Y are identical ($\eta^2_{BX}=\eta^2_{BY}$). This implies that even when all individuals are characterized by the same person-specific correlation, so that κp=κ=ρW, and the BP correlation is equal to this correlation as well, such that we could say this correlation reflects a general law, we will still get a different cross-sectional correlation unless the intraclass correlations are exactly equal to each other.
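The reason the weights cannot sum to more than one can be made explicit. The short derivation below assumes that Eq. (7) has the usual multilevel form, with between weight $w_B=\sqrt{\eta^2_{BX}\eta^2_{BY}}$ and within weight $w_W=\sqrt{(1-\eta^2_{BX})(1-\eta^2_{BY})}$; the vectors u and v are introduced here purely for the argument and do not appear in the paper.

```latex
\begin{aligned}
\mathbf{u} &= \bigl(\eta_{BX},\ \sqrt{1-\eta^2_{BX}}\bigr)^{\top}, \qquad
\mathbf{v} = \bigl(\eta_{BY},\ \sqrt{1-\eta^2_{BY}}\bigr)^{\top}, \qquad
\lVert\mathbf{u}\rVert = \lVert\mathbf{v}\rVert = 1, \\
w_B + w_W &= \eta_{BX}\eta_{BY} + \sqrt{1-\eta^2_{BX}}\,\sqrt{1-\eta^2_{BY}}
           = \mathbf{u}^{\top}\mathbf{v}
           \le \lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert = 1 .
\end{aligned}
```

By the Cauchy-Schwarz inequality, equality holds only when u = v, that is, when the two intraclass correlations are equal. With ρB=ρW=r, the cross-sectional correlation is r(w_B + w_W), which is therefore closer to zero than r whenever the intraclass correlations differ.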

Clearly, if we have data from both the person and the occasion dimensions from Cattell’s databox, it is possible to estimate all of these correlations. However, when confronted with cross-sectional data, only ρ can be estimated, and the other correlations as well as the intraclass correlations remain unknown. In this context, Brandt and Morgan (Citation2022) state: “[…] the multilevel space is unobserved in a cross-sectional design. It is the dark matter of cross-sectional analysis, because it exists, cannot be observed from collected data, but nonetheless impacts what we observe” (p. 3). From the expression in Eq. (7) it can be seen that the cross-sectional correlation depends on four unknowns, that is, ρB, ρW, $\eta^2_{BX}$ and $\eta^2_{BY}$; moreover, the relation between ρW and the average person-specific correlation κ is even more complex (see the Appendix), and depends on individual differences in the person-specific variances, on how these variances are related to each other, and on how they relate to the person-specific covariance. As a result, the same cross-sectional correlation may arise from wildly different constellations of these various parameters, and it is not possible to back-engineer which of these scenarios gave rise to the observed cross-section.
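To make the identification problem concrete, the sketch below evaluates the cross-sectional correlation for four quite different parameter constellations. It assumes Eq. (7) has the usual multilevel form (a weighted sum with weights built from the intraclass correlations); the function name and all scenario values are hypothetical and chosen only for illustration.

```python
import math


def cross_sectional_rho(rho_b, rho_w, icc_x, icc_y):
    """Weighted sum of the BP and WP correlations (assumed form of Eq. 7)."""
    between_weight = math.sqrt(icc_x * icc_y)
    within_weight = math.sqrt((1 - icc_x) * (1 - icc_y))
    return rho_b * between_weight + rho_w * within_weight


# Hypothetical constellations: (rho_B, rho_W, ICC of X, ICC of Y).
scenarios = {
    "equal correlations, equal ICCs": (0.30, 0.30, 0.5, 0.5),
    "between-person relation only": (0.75, 0.00, 0.4, 0.4),
    "opposite signs at the two levels": (-0.30, 0.70, 0.4, 0.4),
    "equal correlations, unequal ICCs": (0.375, 0.375, 0.2, 0.8),
}

results = {name: cross_sectional_rho(*params)
           for name, params in scenarios.items()}

for name, rho in results.items():
    print(f"{name}: rho = {rho:.3f}")
```

All four constellations yield a cross-sectional correlation of 0.30 under the assumed formula, even though they describe very different within-person and between-person relations.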

When taking this one step further, we can think of the observed cross-sectional data as being part of various latent research designs, each of which may have a different “multilevel space”. That is, the cross-section could have been a wave in a longitudinal study that had a time span of, for instance, a few weeks up to multiple decades. These different time spans are associated with different within-between decompositions: If we have a time span of a month, the BP component represents the person’s mean of that month; in contrast, if we have measures that cover two years, the BP component represents the person’s mean over a 2-year period. These two person-specific means of person P = p do not have to be the same. This also has major consequences for the WP component, which is defined as the deviation between the measurement and the person-specific mean (see Eq. (3)). That is, when the month mean and the 2-year mean differ, a particular observation will have different temporal deviations from these two means, and it is even possible that one deviation is positive, whereas the other is negative. These issues render the within-between decomposition of cross-sectional data not only purely hypothetical, but also fundamentally unidentified.

While there are infinitely many combinations of BP, WP, and average person-specific correlations that may have given rise to a particular observed cross-sectional correlation, and it is impossible to determine which of these possible combinations represents the underlying truth, this does not imply we should consider the current treatise a mere intellectual exercise. It is a well-established fact, based on decades of empirical research, that most (if not all) psychological measures contain both stable and transient aspects (including measurement error); in such cases the cross-sectional correlation is thus based on a mix of a BP and a WP correlation. Moreover, the recent surge of intensive longitudinal studies has shown that there are important individual differences in variability and in the strength of relations between variables, such that there may be a discrepancy between the WP correlation and the average person-specific correlation. Hence, the problem of how (not) to interpret the cross-sectional correlation is actually omnipresent in psychological research. Clearly, the cross-sectional correlation is a summary measure of how individual differences on X are related to individual differences on Y at a particular point in time; any other interpretation should be considered highly speculative in the absence of further evidence, and is perhaps best postponed until more knowledge is obtained about the various correlations in empirical practice (cf. Fisher et al., Citation2018).

In sum, while Peter Molenaar’s work on ergodicity already established that the cross-sectional correlation typically does not represent each and every individual’s person-specific correlation, the current paper added to this the analytical results that expose the exact relation between these two correlations. The results presented here show that it is impossible to determine to what extent the cross-sectional correlation deviates from the average person-specific correlation without additional information. Moreover, since standardized results from techniques like regression analysis, structural equation modeling, network analysis, and factor analysis are based on the correlation structure, the current concern is likely to be relevant for these practices as well (Brandt & Morgan, Citation2022).

Article information

Conflict of interest disclosures: The author signed a form for disclosure of potential conflicts of interest. The author did not report any financial or other conflicts of interest in relation to the work described.

Ethical principles: The author affirms having followed professional ethical guidelines in preparing this work. These guidelines include obtaining informed consent from human participants, maintaining ethical treatment and respect for the rights of human or animal participants, and ensuring the privacy of participants and their data, such as ensuring that individual participants cannot be identified in reported results or from publicly available original or archival data.

Funding: This work was supported by Grant ERC-2019-COG-865468 from the European Research Council.

Role of the funders/sponsors: None of the funders or sponsors of this research had any role in the design and conduct of the study; collection, management, analysis, and interpretation of data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.

Acknowledgments: The author would like to thank Dave Hessen for his comments on a prior version of this manuscript. The ideas and opinions expressed herein are those of the author alone, and endorsement by the author’s institutions or the European Research Council is not intended and should not be inferred.

Open Scholarship

This article has earned the Center for Open Science badges for Open Data and Open Materials through Open Practices Disclosure. The data and materials are openly accessible at https://ellenhamaker.github.io/cross-sectional-correlation/. To obtain the author's disclosure form, please contact the Editor.

Notes

1 While many of these techniques are based on the covariances, their standardized versions—and thus the standardized results from these analyses—are based on the correlations.

4 Noémi Schuurman developed an interactive app that allows the user to specify different intraclass correlations and BP and WP correlations to see what cross-sectional correlation results from this. See: https://noemikschuurman.shinyapps.io/withinbetweenapp/.

5 Supporting materials can be found at: https://ellenhamaker.github.io/cross-sectional-correlation/.

References

  • Brandt, M. J., & Morgan, G. S. (2022). Between-person methods provide limited insight about within-person belief systems. Journal of Personality and Social Psychology, 123(3), 621–635. https://doi.org/10.1037/pspp0000404
  • Cattell, R. B. (1952). The three basic factor-analytical research designs: Their interrelations and derivatives. Psychological Bulletin, 49(5), 499–520. https://doi.org/10.1037/h0054245
  • Cattell, R. B., Cattell, A. K. S., & Rhymer, R. D. (1947). P-technique demonstrated in determining psycho-physiological source traits in a normal individual. Psychometrika, 12(4), 267–288. https://doi.org/10.1007/BF02288941
  • Dansereau, F., Alutto, J. A., & Yammarino, F. J. (1984). Theory testing in organizational behavior: The varient approach. Prentice-Hall.
  • Epstein, S. (1980). The stability of behavior: 2. Implications for psychological research. American Psychologist, 35(9), 790–806. https://doi.org/10.1037/0003-066X.35.9.790
  • Fisher, A. J., Medaglia, J. D., & Jeronimus, B. F. (2018). Lack of group-to-individual generalizability is a threat to human subjects research. Proceedings of the National Academy of Sciences, 115(27), E6106–E6115. https://doi.org/10.1073/pnas.1711978115
  • Grice, J. W. (2004). Bridging the idiographic-nomothetic divide in ratings of self and others. Journal of Personality, 72(2), 203–241. https://doi.org/10.1111/j.0022-3506.2004.00261.x
  • Hamaker, E. L. (2012). Why researchers should think “within-person”: A paradigmatic rationale. In M. R. Mehl & T. S. Conner (Eds.), Handbook of research methods for studying daily life (pp. 43–61). Guilford Publications.
  • Hamaker, E. L., Dolan, C. V., & Molenaar, P. C. M. (2005). Statistical modeling of the individual: Rationale and application of multivariate time series analysis. Multivariate Behavioral Research, 40(2), 207–233. https://doi.org/10.1207/s15327906mbr4002_3
  • Hamaker, E. L., & Ryan, O. (2019). A squared standard error is not a measure of individual differences. Proceedings of the National Academy of Sciences, 116(14), 6544–6545. https://doi.org/10.1073/pnas.1818033116
  • Hamilton, J. D. (1994). Time series analysis. Princeton University Press.
  • Hsu, S., Poldrack, R., Ram, N., & Wagner, A. D. (2022). Observed correlations from cross-sectional individual differences research reflect both between-person and within-person correlations.
  • Lamiell, J. T. (1998). ‘Nomothetic’ and ‘idiographic’: Contrasting Windelband’s understanding with contemporary usage. Theory & Psychology, 8(1), 23–38. https://doi.org/10.1177/0959354398081002
  • McCrae, R. R., & John, O. P. (1992). An introduction to the Five-Factor model and its applications. Journal of Personality, 60(2), 175–215. https://doi.org/10.1111/j.1467-6494.1992.tb00970.x
  • Molenaar, P. C. M. (2004). A manifesto on psychology as idiographic science: Bringing the person back into scientific psychology - this time forever. Measurement: Interdisciplinary Research and Perspectives, 2, 201–218.
  • Robinson, W. (1950). Ecological correlations and the behavior of individuals. American Sociological Review, 15(3), 351–357. https://doi.org/10.2307/2087176
  • Saccenti, E., Hendriks, M. H. W. B., & Smilde, A. K. (2020). Corruption of the Pearson correlation coefficient by measurement error and its estimation, bias, and correction under different error models. Scientific Reports, 10(1), 438. https://doi.org/10.1038/s41598-019-57247-4
  • Schmitz, B. (2000). Auf der Suche nach dem verlorenen Individuum: Vier Theoreme zur Aggregation von Prozessen [In search of the lost individual: Four theorems regarding the aggregation of processes]. Psychologische Rundschau, 51(2), 83–92. https://doi.org/10.1026//0033-3042.51.2.83
  • van Borkulo, C., Boschloo, L., Borsboom, D., Penninx, B. W. J. H., Waldorp, L. J., & Schoevers, R. A. (2015). Association of symptom network structure and the course of depression. JAMA Psychiatry, 72(12), 1219–1226.
  • Williams, R. H. (1974). The effect of correlated errors of measurement on correlations among tests: A correlation for Spearman’s correction for attenuation. The Journal of Experimental Education, 43(2), 63–65. https://doi.org/10.1080/00220973.1974.10806321
  • Zimmerman, D. W., & Williams, R. H. (1977). The theory of test validity and correlated errors of measurement. Journal of Mathematical Psychology, 16(2), 135–152. https://doi.org/10.1016/0022-2496(77)90063-3

Appendix

To see how the within-person correlation ρW and the average person-specific correlation κ differ from each other, we start with expressing the latter as

$$\kappa = E\left[\frac{\mathrm{cov}(X,Y|P)}{\sqrt{\mathrm{var}(X|P)}\,\sqrt{\mathrm{var}(Y|P)}}\right] = E\left[\mathrm{cov}(X,Y|P)\,\frac{1}{\sqrt{\mathrm{var}(X|P)}}\,\frac{1}{\sqrt{\mathrm{var}(Y|P)}}\right].$$

Hence, it is the expectation of the product of three random variables, that is $E[ABC]$, where: $A=\mathrm{cov}(X,Y|P)$ is the person-specific covariance between X and Y, $B=1/\sqrt{\mathrm{var}(X|P)}$ is the inverse of the person-specific standard deviation of X, and $C=1/\sqrt{\mathrm{var}(Y|P)}$ is the inverse of the person-specific standard deviation of Y.

To rewrite the expectation of the product of three random variables, we use $E[ABC]=E[AD]$ (i.e., $D=BC$), and then make use of the fact that $\mathrm{cov}(A,D)=E[AD]-E[A]E[D]$. This implies we can write $E[AD]=E[A]E[D]+\mathrm{cov}(A,D)$. Plugging $BC$ back in for $D$, and using the same trick again for rewriting $E[BC]$, we get

$$E[ABC]=E[A]E[BC]+\mathrm{cov}(A,BC)=E[A]\bigl(E[B]E[C]+\mathrm{cov}(B,C)\bigr)+\mathrm{cov}(A,BC)=E[A]E[B]E[C]+E[A]\,\mathrm{cov}(B,C)+\mathrm{cov}(A,BC).$$

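Filling in the expressions for A, B, and C, we get the following expression for the average person-specific correlation:

$$\kappa = E[\mathrm{cov}(X,Y|P)]\,E\left[\frac{1}{\sqrt{\mathrm{var}(X|P)}}\right]E\left[\frac{1}{\sqrt{\mathrm{var}(Y|P)}}\right] + E[\mathrm{cov}(X,Y|P)]\,\mathrm{cov}\left(\frac{1}{\sqrt{\mathrm{var}(X|P)}},\frac{1}{\sqrt{\mathrm{var}(Y|P)}}\right) + \mathrm{cov}\left(\mathrm{cov}(X,Y|P),\frac{1}{\sqrt{\mathrm{var}(X|P)}\,\sqrt{\mathrm{var}(Y|P)}}\right). \tag{A1}$$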

Compare this to the expression of the within-person correlation, which we can write as

$$\rho_W = \frac{E[\mathrm{cov}(X,Y|P)]}{\sqrt{E[\mathrm{var}(X|P)]}\,\sqrt{E[\mathrm{var}(Y|P)]}} = E[\mathrm{cov}(X,Y|P)]\,\frac{1}{\sqrt{E[\mathrm{var}(X|P)]}}\,\frac{1}{\sqrt{E[\mathrm{var}(Y|P)]}}. \tag{A2}$$

It is clear that the average person-specific correlation is the more complicated expression. Specifically, the second and third term in Eq. (A1) depend on the way individual differences in person-specific variances and covariance are related to each other; in contrast, the expression of the within-person correlation shows there is no such information included in ρW.
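The difference between Eq. (A1) and Eq. (A2) can be checked numerically. The sketch below uses two equally sized hypothetical subpopulations (person-specific variances of 1 versus 4, and the same person-specific covariance of 0.8 in both); these values are illustrative assumptions and are not taken from the paper. It computes κ directly, computes the three terms of Eq. (A1) separately, and computes ρW from Eq. (A2).

```python
import math

# Hypothetical person-specific moments: (var_x, var_y, cov_xy),
# one tuple per equally likely subpopulation.
persons = [(1.0, 1.0, 0.8), (4.0, 4.0, 0.8)]


def mean(values):
    return sum(values) / len(values)


def cov(u, v):
    """Population covariance across persons: E[UV] - E[U]E[V]."""
    return mean([a * b for a, b in zip(u, v)]) - mean(u) * mean(v)


A = [c for _, _, c in persons]                    # person-specific covariance
B = [1 / math.sqrt(vx) for vx, _, _ in persons]   # inverse SD of X
C = [1 / math.sqrt(vy) for _, vy, _ in persons]   # inverse SD of Y
BC = [b * c for b, c in zip(B, C)]

# Average person-specific correlation, computed directly as E[ABC].
kappa = mean([a * b * c for a, b, c in zip(A, B, C)])

# The three terms on the right-hand side of Eq. (A1).
term1 = mean(A) * mean(B) * mean(C)
term2 = mean(A) * cov(B, C)
term3 = cov(A, BC)

# Within-person correlation, Eq. (A2).
rho_w = mean(A) / math.sqrt(mean([vx for vx, _, _ in persons])
                            * mean([vy for _, vy, _ in persons]))

print("kappa:", round(kappa, 4))
print("terms:", round(term1, 4), round(term2, 4), round(term3, 4))
print("rho_W:", round(rho_w, 4))
```

With these assumed values κ is 0.5 and ρW is 0.32. The first term of Eq. (A1) already exceeds ρW, the second term is positive because the inverse standard deviations covary positively, and the third term is zero because the covariance is constant across persons.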

When considering the various terms in Eq. (A1) more carefully, we can argue the following. Starting with the second term, we see that it contains the covariance between the inverse square roots of the person-specific variances on X and Y. If this covariance is zero, this term drops out. However, it seems reasonable to assume that the covariance is not zero: Individuals who vary more than others on X are also likely to vary more on Y than others. Hence, we may expect the covariance to be positive. In that case, even when the person-specific covariance cov(X,Y|P) is the same for all persons (i.e., it is a constant), this second term will be non-zero. Assuming cov(X,Y|P)>0, this term would be positive.

When considering the third term, we see it is the covariance between the person-specific covariance and the product of the inverse square roots of the person-specific variances. It is more difficult to convincingly reason in any direction for this term, but suppose that individuals who vary more on X and Y tend to also have larger covariances between X and Y; in that case, we would find that this third term becomes negative (as it would imply a negative covariance between the covariance and the product of the inverses of the square roots of the variances).

There are various scenarios in which the second and third term will be zero. Ergodicity—meaning all individuals would have the same person-specific variances and covariances—is one of them. Another, somewhat less restrictive scenario is when all individuals have the same person-specific covariance (so the third term drops out), and the person-specific variances are unrelated (so the second term becomes zero). Note that in this scenario the average person-specific correlation κ and the within-person correlation ρW will still not be the same.

The latter has to do with an important difference between ρW and the first term in Eq. (A1). Although both contain the same average person-specific covariance E[cov(X,Y|P)], in the expression for ρW we have $\{E[\mathrm{var}(X|P)]\}^{-1/2}$ and $\{E[\mathrm{var}(Y|P)]\}^{-1/2}$ (i.e., first the expectation is taken, then the inverse and square root), whereas in the expression for κ we have $E[\{\mathrm{var}(X|P)\}^{-1/2}]$ and $E[\{\mathrm{var}(Y|P)\}^{-1/2}]$ (i.e., square root and inverse are taken before taking the expectation). Jensen’s inequality states that for a convex function f, $f(E[Z]) \le E[f(Z)]$. Since $f(z)=z^{-1/2}$ is convex for $z>0$, we know that $\{E[\mathrm{var}(X|P)]\}^{-1/2}$ is smaller than or equal to $E[\{\mathrm{var}(X|P)\}^{-1/2}]$, and likewise for Y. From this it follows that, if E[cov(X,Y|P)]>0, ρW is smaller than or equal to the first term on the right-hand side of Eq. (A1).

To summarize the relation between κ and ρW in case of a positive average person-specific covariance (i.e., E[cov(X,Y|P)]>0), we can state that: a) the first term in the expression of κ will be larger than ρW; b) the second term in κ will most likely be positive; and c) the third term is probably more likely to be negative or zero. Taken together this suggests that we may expect κ>ρW>0, although this is not necessarily always the case. When the average person-specific covariance is negative (i.e., E[cov(X,Y|P)]<0), we have: a) the first term in the expression of κ will be more negative than ρW; b) the second term in κ will most likely be negative; and c) the third term is probably more likely to be positive or zero. Hence, in that case we most likely have κ<ρW<0.

There are three specific scenarios in which κ=ρW. When ergodicity holds, the second and third term in Eq. (A1) drop out, and the first term becomes identical to ρW; here we have ρW=κ=κp for all persons. A slightly less restrictive scenario is when all individuals have the same person-specific variances; again, this would imply that the second and third term of Eq. (A1) drop out, and the first term equals ρW, so that κ=ρW; yet, individuals could still have different person-specific correlations κp, due to individual differences in the person-specific covariance cov(X,Y|P). A third scenario, which was encountered as one of the examples in the current paper (see Example 4), is when the person-specific correlations are the same, that is, κp=κ for all p, even though the person-specific variances and covariances differ; in this case the second and third term do not drop out, but together they exactly offset the difference between the first term in Eq. (A1) and the WP correlation ρW.