
Trusting: Alone and together

Received 03 Mar 2023, Accepted 30 Jan 2024, Published online: 23 May 2024

ABSTRACT

We study the problem of an agent continuously faced with the decision of placing or not placing trust in an institution. The agent makes use of Bayesian learning in order to estimate the institution’s true trustworthiness, and makes the decision to place trust based on myopic rationality. Using elements from random walk theory, we explicitly derive the probability that such an agent ceases placing trust at some point in the relationship, as well as the expected time spent placing trust conditioned on their discontinuation thereof. We then continue by modeling two truster agents, each in their own relationship to the institution. We consider two natural models of communication between them. In the first (“observable rewards”) agents disclose their experiences with the institution to one another, while in the second (“observable actions”) agents merely witness the actions of their neighbor, i.e., placing or not placing trust. Under the same assumptions as in the single agent case, we describe the evolution of the beliefs of agents under these two different communication models. Both the probability of ceasing to place trust and the expected time in the system elude explicit expressions, despite there being only two agents. We therefore conduct a simulation study in order to compare the effect of the different kinds of communication on the trust dynamics. We find that a pair of agents in both communication models has a greater chance of learning the true trustworthiness of an institution than a single agent. Communication between agents promotes the formation of long-term trust with a trustworthy institution, as well as the timely exit from a trust relationship with an untrustworthy institution. Contrary to what one might expect, we find that having less information (observing each other’s actions instead of experiences) can sometimes be beneficial to the agents.

1. Introduction

In this paper we are interested in the process by which individuals decide to place their trust in an institution. The importance of trust and distrust in governments around the world was highlighted by the COVID-19 pandemic, during which the distrust in government held by a group resulted in fewer vaccinations among individuals of that group (Bajos et al., 2022). Similarly, distrust in science is strongly correlated with vaccine hesitancy (Lazarus et al., 2022). The prevalence of fake news during the COVID-19 infodemic (World Health Organization, 2020) points toward peer-to-peer communication as a culprit for widespread distrust. Although these results were gathered in relation to the COVID-19 pandemic and vaccine hesitancy, one expects similar dynamics to apply to a whole host of different social phenomena. While the personal experience of an individual with the institution is important, it seems that the interactions with any individual’s peers play a crucial role as well. We therefore study the dynamics of individual-to-institution trust in the context of peer-to-peer relationships between the individuals placing the trust.

A trust situation is a situation in which individuals are required to take a “risk” in order to observe how trustworthy the other party is (Buskens, 2002, Ch. 1). In some cases this risk concerns the placement of resources at the disposal of the other. Relationships to institutions are characterized by trust: the truster in such a relationship typically builds an intuition of how trustworthy the institution is. For example, consider the relationship between an individual and a social media company. The individual has placed their data (resources) in the hands of the company, and in return can post content and has access to the posts of others. The level of trustworthiness the individual observes is determined by their daily experience using the social network, manifested in the (lack of) news that the social media company has been selling their information to third-party advertising companies. As an alternative example, consider the decision of whether or not to get vaccinated for a virus. For those individuals who believe scientific consensus to have their best interest at heart, this decision may not seem like something requiring trust at all. For other individuals, who are skeptical of the institution of science and/or healthcare governance, it may seem like a risk to their health and possibly their personal information to get vaccinated at a government vaccine station.

This type of example has motivated us to phrase the problem of deciding to place trust as a learning problem. The idea that agents learn about the trustworthiness of their interaction partner is also supported by sociological theory. The chapter by Buskens and Raub (2002) provides a good overview of trust as a learning problem, as well as its own experimental evidence thereof. Furthermore, individuals are rarely isolated when faced with the decision of whether or not to place trust. Instead, human actions are embedded in a network which is likely to play a role in the decision making behavior of its individuals. We therefore consider the effect of peer-to-peer communication on the dynamics of the trust problem, which is also supported in the empirical literature (cf. Ozdemir et al., 2020; Pechmann & Knight, 2002). In particular, Wang and Yu (2017) find evidence that this influence acts both when experiences are communicated and when agents simply observe one another’s behavior. We restrict our study to two agents, as this showcases the essence of the underlying mechanism in the agent-to-agent communication. This allows us to construct the model with exact Bayesian learning,Footnote1 under basic assumptions of myopicFootnote2 rationality, instead of exogenously imposing model assumptions on the effect of a signal sent from one agent to the other. This approach leads to delicate intricacies already in the two agent case, stemming from agents interpreting their neighbor’s actions whilst knowing that these actions were influenced by the actions they themselves have taken thus far. As a result of this mutual influence, in the two agent setup the evaluation of key quantities in our model eludes simple and explicit expressions. In fact, evaluating them requires many numerical integrations, which has led us to use a cluster of a large computing facility in our numerical experiments.

1.1. Related literature

Our investigation connects to two streams of literature: that of social network learning and that of social network trust. We proceed by providing an admittedly non-exhaustive, brief account of the related literature. The work relevant to our investigation spans various research disciplines, each with their own approach and set of “reasonable assumptions.” The fields in question contain, but are not limited to, economics, theoretical sociology, socio-physics and machine learning. Beyond modeling of these dynamics, the empirical investigations of network influence on trust even extend to the field of marketing.

1.1.1. Social network learning

Social network learning concerns agents who try to optimize an unknown objective function by choosing an action, and who learn via private signals and/or signals from other agents. Under various modeling decisions, the interest is in the speed at which a group of agents learns the best course of action, as well as whether or not, under assumptions of rationality, the group may take a sub-optimal action, often as a result of social influence. The modeling decisions relate to the degree of rationality in learning, the communication between the agents, and whether or not the private signals of the agents are conditional on the actions they take. Table 1 summarizes the modeling decisions of Bala and Goyal (1998), Harel et al. (2021), Huang et al. (2021) and Molavi et al. (2018), all of which consider a population of agents in the social network learning framework. The second-to-last column, headed “Communication,” refers to what the agents share with one another. This could be their private learning signal per round, their belief distribution, or simply the actions they take. The last column, headed “Signals,” refers to whether the reception of private signals by the agents is independent of or conditional on the actions the agents take.

Table 1. Modelling decisions in selected social network learning literature.

The agents in the paper of Bala and Goyal (1998) are myopic, and in addition they do not infer anything about the outcomes of the neighbors of their own neighbors based on the actions taken by their immediate neighbors. The authors find that in connected networks the agents’ behavior converges asymptotically. Molavi et al. (2018), Harel et al. (2021) and Huang et al. (2021) all consider models in which the private signals received by the agents are independent of the actions they take. Within that context, Molavi et al. (2018) show how imperfect recall in the context of Bayesian learning relates to the seminal model by DeGroot (1974). In models of a fully connected population of agents, Harel et al. (2021) and Huang et al. (2021) relate the learning rate of the model with communication restricted to actions to a reference model where communication between agents is open to signals and beliefs.

Departing from the network setting, Correa et al. (2020) and Fu and Le Riche (2021) consider a model in which sequentially arriving agents from a population choose to buy a product or not, based on a Bayesian belief on the product’s quality that has been built on the observations of the preceding agents. Correa et al. (2020) compute the probability of incomplete learning, which occurs when the agents falsely believe the product’s quality to be low. An important difference between Correa et al. (2020) and our setup is that the agents in Correa et al. (2020) know that the quality of the product comes from a set of two known values. This means that in Correa et al. (2020) the agents are only tasked with identifying which of the two values the quality takes, whereas in our paper the relevant parameter (in the sequel referred to as the trustworthiness) may take any value in the unit interval. Fu and Le Riche (2021) incorporate a similar learning problem into an endogenous growth market model in which the agents compare a new product of unknown quality to an old one of known quality, leading to the outcome that there are attainable equilibria in which the product’s true quality remains unknown.

There also exists a body of literature focused on conditions which ensure social learning. In these papers agents typically receive only one signal and take sequential actions which are observed by some or all of the remaining agents. In the case of Banerjee and Fudenberg (2004), the observation structure takes the form of a representative sample while the population is a continuum. Acemoglu et al. (2011) impose a network structure to determine which actions are observed by agents. For an overview of models in which agents are represented by a fully connected graph, the interested reader is referred to Bikhchandani et al. (1998).

To the best of our knowledge, the following relevant setup has not been studied in the social network learning literature: a model in which agents act myopically rationally, observe their neighbors’ actions or signals, and have private signals which are conditional on the actions taken. This kind of model also relates to the social network trust literature, considering that trust relationships require a dependency between signals and actions.

1.1.2. Social network trust

Social network trust typically concerns the behavior of pairs of truster and trustee agents playing an iterated trust game. The interest here lies in the likelihood of trust breaking down and describing the rational strategies of the truster and the trustee. Particularly relevant to our work are studies in which there is a population of trustworthy trustee agents who always honor trust, and opportunistic trustee agents who play strategically. In such models, it is a natural extension to have agents learn about the proportion of trustworthy trustee agents.

Important examples of this literature stream have been presented in Bower et al. (1996), Buskens (2003) and Frey et al. (2015). Bower et al. (1996) describe equilibrium strategies for a sequence of 2-round trust games in which a new pair of truster and trustee agents is matched to play two rounds of the trust game. In addition to learning about the true proportion of trustworthy trustees, there is learning from the first round to the second round within a match-up between truster and trustee agent. Buskens (2003) extends this analysis by considering a pair of truster agents that are both in a relationship with the same trustee. The main finding is that trust placement and honoring (in equilibrium strategies) increase with the probability of sharing information between truster agents, but only if both agents are sharing information with a high probability. These findings have been corroborated by experimental work by Buskens et al. (2010). Frey et al. (2015) extend the theoretical work further by letting the link between truster agents be bought at a cost. The authors determine which game parameters include and exclude investments in such a connection for equilibrium strategies.

In contrast to this stream of literature, we consider a trustee agent such as an institution whose behavior is not modeled strategically, and who interacts with more than one truster agent. Following, e.g., Buskens (2003), Buskens et al. (2010) and Frey et al. (2015), one trustee interacting with two trusters is a natural setting to consider.

1.1.3. Relevance and perspective

In this section we elaborate on the relevance of the two different streams of literature to the model we present. Thematically we are based in the social network trust literature, while methodologically our approach bears resemblance to those used in social network learning. We first present the ideas in both streams, and subsequently describe how our work fits into this landscape.

The social network trust literature offers detailed descriptions of the interactions between strategic trusters and one of two types of trustees, e.g., “friendly” trustees who always honor trust and “strategic” trustees who try to “fool” the truster in order to be able to abuse trust at some point (cf. Bower et al., 1996; Buskens, 2003; Frey et al., 2015). The proportion of “friendly” trustees may be known (as in Buskens, 2003; Frey et al., 2015) or unknown (as in Bower et al., 1996).

The trusters are learning about what type of trustee they are interacting with, and only sometimes also learning about the proportion of “friendly” trustees. As soon as a trustee abuses trust once in the peer-to-peer setting, they reveal that they are not to be trusted. A prevalent example in this literature is that of people buying a second-hand car (cf. Buskens, 2002, Ch. 6) from a loose acquaintance or via an online peer-to-peer service, or that of hiring an informal house sitter without a contract, both situations in which the truster and the trustee are peers in some way.

In the social network learning literature, there is an environment which creates an ordering on the actions the agents may take and emits a signal. The agents take actions and receive a signal. They use the signal to update their belief about the environment with the goal of taking the best action. It is social learning in the sense that the agents communicate with one another about their actions and/or signals. The modelers in turn are interested in the probability of learning the best action as a group (cf. Bala & Goyal, 1998; Correa et al., 2020; Fu & Le Riche, 2021) and the speed of convergence to this best action under different forms of communication (cf. Harel et al., 2021; Huang et al., 2021). The social network learning results are applicable to situations in which agents are consistently updating their beliefs and trying to take optimal actions. One can think of prediction in markets (and setting the corresponding price to maximize profit), adoption of opinions (trying to fit in with the group), and the dissemination of information (keeping up to date with the latest information).

We are motivated by the question of trust in institutions and the influence of peer-to-peer communication thereupon. Thus the social network trust literature paints the thematic backdrop for our work. We are interested in the event that trust is lost, primarily in the “asymmetric” case of trusters interacting with institutions. In this setting it is less natural to incorporate the strategic behavior of the trustee: they are not actively taking part in the interactions, but rather “passively” providing a service. Instead of modeling trustee behavior as strategic, we let them simply draw an action from a distribution.

We investigated the social network learning literature because of the technical similarity between our model and the models found therein. Our goal was to look for work that handles our setting (possibly under different nomenclature). In the process we observe that our model also fills a gap in the social network learning literature. Furthermore, we are encouraged in our decision to compare the effects of different forms of communication on outcomes. In the social network trust literature, the agents are assumed to always communicate their experiences or their belief distribution fully. One can argue, though, that communication of experiences does not necessarily provide the right perspective: typically actions are readily observable, while the internal belief or personal experience may not always be shared. This has motivated us to compare the two.

The resulting modeling framework naturally applies to the setting we are interested in: individuals trusting institutions in the context of peer-to-peer influence.

1.2. Contribution

Our model aims to fill the above-mentioned gaps in the literature. In order to model the effect of peer-to-peer communication on the dynamics of a trust problem between an individual and an institution, we draw from both streams of literature. Learning signals are dependent on the actions taken, following the social network trust literature. We follow the social network learning literature by implementing myopically rational decisions and a Bayesian learning procedure. We study two of the communication models between agents seen in both streams of related literature. In the first (“observable rewards”) agents disclose their experiences with the institution to one another, while in the second (“observable actions”) agents merely witness the actions of their neighbor, i.e., placing or not placing trust. In both models of communication, we describe rational usage of the information in updating beliefs. The extent of the rationality plays a role as a benchmark rather than a description of actual human behavior. The computations involved become complicated quickly, but provide a useful indication of perfectly rational information usage in terms of belief updating.

We impose no assumptions on the motivations of the trustee, who simply acts honorably with some probability $\vartheta \in (0,1)$, thus generalizing the setting where this probability takes one of two values, as in Correa et al. (2020). We are interested in a setup in which truster agents interact with institutions whose behavior cannot be modeled strategically at the level of individual interactions, unlike Bower et al. (1996), Buskens (2003), Frey et al. (2015) and Kolb and Madsen (2022). We also consider a more general version of the communication seen in Buskens (2003) and Frey et al. (2015), in which agents have access to their neighbors’ actions and the outcomes thereof.

We describe the information usage and myopic decision making without depending on signals independent of the actions taken, as seen in Harel et al. (2021), Huang et al. (2021) and Molavi et al. (2018). The asymmetry of the actions, natural to the trust problem, means that incomplete learning is possible. Thus we pay attention to the probability of convergence to the truth, rather than only the rate thereof.

In our work we analyze the dynamics of the single agent case using techniques from the field of random walk theory. Such analytic techniques, however, do not extend to the two agent models, so we analyze these relying on Monte Carlo simulation. As the numerical integrations required for agents to interpret each other’s actions take a prohibitive amount of computation time on personal computing machines, we have to use the Lisa cluster of the computing facility SURFsara.

We observe that two agents with communication between them tend to make the “correct” decision sooner. In other words: typically, sample paths in which the relationship helps the agents outweigh the sample paths in which “bad luck” for one agent implies “bad luck” for both due to the communication between them.

Our experiments reveal that the observable rewards model is not always “better” than the observable actions model. Which mode is most helpful to the agents depends on whether one is interested in making the correct decision in the long run or in being sure to end a relationship with an untrustworthy institution as quickly as possible. Moreover, we identify a parameter setting in which the probability of quitting is lower in the observable actions model than in the observable rewards model. This means that, contrary to what one might expect, having less information available might be beneficial to the agents, and there is no monotone ordering between the two models.

1.3. Organisation of paper

In §2 we describe basic elements of the model along with relevant interpretations. Thereafter, in §3 the model is described further and analyzed in the case of a single agent (with proofs being provided in Appendix A). Here we pay special attention to subcases which allow for analytic results regarding the probabilities of such a trust relationship ceasing. In §4 we formulate the two models for two agents in the trust relationship with the same institution: observable rewards and observable actions. In §5 we discuss the experimental setup used to investigate the two agent model and in §6 we discuss the results of this experimentation. We conclude this paper with a discussion on the implications of the results and their relevance to the present literature in §7.

2. Model for a single agent

In this section we present the model for a single agent, which forms the core of the two agent models discussed in later sections. We consider the situation in which an agent has repeated opportunities to place trust in an institution. The institution’s behavior is modeled by a single parameter $\vartheta \in [0,1]$, the true trustworthiness, defined as the probability with which trust is honored (so that its complement $1-\vartheta$ is the probability that trust is abused). If trust is not placed, then the institution has no action to take. This behavior can be interpreted as the efficacy of the institution in honoring trust, implicitly assuming that this is what they are attempting in each round. Note that we acknowledge the high level of abstraction taken in regard to the institution, and that interest is mainly in the agent’s behavior in such a situation. In each round $t \in \mathbb{N}$ the agent chooses an action from the action set $A = \{0, 1\}$, in which $A_t = 1$ indicates that the agent places trust in round $t$ while $A_t = 0$ indicates that the agent quits the trust relationship. We define the random variable $X_t$, $t \in \mathbb{N}$, indicating whether the institution’s action in round $t$ is (would be) that of honoring or abusing any trust that may or may not have been placed:

(1) $X_t = \begin{cases} 1, & \text{with probability } \vartheta, \\ 0, & \text{with probability } 1 - \vartheta. \end{cases}$

The random variables $\{X_t : t \in \mathbb{N}\}$ are i.i.d., in particular independent of the agent’s actions. Importantly, the agent only observes $X_t$ in rounds when $A_t = 1$. If trust is honored the agent gains utility $r$ (reward), while if trust is abused the utility gain is $-c$ (cost). We assume that $c$ and $r$ are positive integers; note that if $r, c \in \mathbb{Q}$ we simply multiply both by the product of their denominators to get integers. The utility for the agent placing trust in the institution is $rX_t\mathbf{1}\{A_t = 1\} - c(1 - X_t)\mathbf{1}\{A_t = 1\}$, so that the expected utility of placing trust is $r\vartheta - c(1-\vartheta)$. As is widely adopted in the learning literature (see, e.g., Sebenius & Geanakoplos, 1983; Parikh & Krasucki, 1990; Bala & Goyal, 1998; Keppo et al., 2008; Harel et al., 2021), we let the agent act with myopic rationality. This means that in every round $t$ they only consider the expected utility of the immediate action, and they do not take into account the possible utility of actions taken in rounds $t+1, t+2, \ldots$, or the utility of information gained by taking action $A_t = 1$. The agent places trust if they believe the utility to have a nonnegative expected value. Furthermore, the agent starts with a Beta distributed prior belief $P_0$ with parameters $\alpha, \beta \in \mathbb{N}$, such that they initially believe the probability density of $\vartheta$ is given by

(2) $P_0(\theta) = B(\theta; \alpha, \beta) := \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{\int_0^1 y^{\alpha-1}(1-y)^{\beta-1}\,dy}, \quad \theta \in [0,1].$

The initial estimate of the expectation of $\theta$ is $\hat\vartheta_0 = \mathbb{E}[B(\alpha,\beta)] = \alpha/(\alpha+\beta)$. As more information becomes available (trust is placed and subsequently honored or abused during rounds $t > 0$) the agent updates this belief distribution in a Bayesian fashion. Let

(3) $\hat S_t = \sum_{s=1}^{t} X_s\,\mathbf{1}\{A_s = 1\},$

be the number of times that the agent observes that trust was honored until time t. Similarly let

(4) $\hat F_t = \sum_{s=1}^{t} (1 - X_s)\,\mathbf{1}\{A_s = 1\},$

be the number of times that the agent observes that trust was abused until time t. The belief distribution held by the agent at the end of round t is then found by applying Bayes’ rule, so as to obtain

(5) $P_t(\theta) = \frac{\theta^{\hat S_t}(1-\theta)^{\hat F_t}\,P_0(\theta)}{\int_0^1 y^{\hat S_t}(1-y)^{\hat F_t}\,P_0(y)\,dy}, \quad \theta \in [0,1].$

Denoting the estimated trustworthiness at time t by

(6) $\hat\vartheta_t = \mathbb{E}_{\theta \sim P_t(\theta)}[\theta] = \frac{\alpha + \hat S_t}{\alpha + \beta + \hat S_t + \hat F_t}, \quad \forall t \in \mathbb{N},$

then for $t = 1, 2, \ldots$ we have:

(7) $A_t = \begin{cases} 1, & \text{if } r\hat\vartheta_n - c(1-\hat\vartheta_n) \ge 0 \ \forall n \in \{0, 1, \ldots, t-1\}, \\ 0, & \text{otherwise}. \end{cases}$

2.1. Quantities of interest

We are interested in the agent’s quitting, which happens when they stop placing trust: $\{A_t = 0\}$. The random variable $\tau$ denotes the number of rounds in which trust was placed until the first “do not place trust” action:

(8) $\tau := \inf\{t \in \mathbb{N} \cup \{\infty\} : r\hat\vartheta_t - c(1-\hat\vartheta_t) < 0\}.$

By the definition of τ in (8) and At in (7), we have

(9) $A_t = 1 \ \forall t \le \tau \quad \text{and} \quad A_t = 0 \ \forall t > \tau.$

We thus note that $\hat\vartheta_t = \hat\vartheta_\tau$ for all $t > \tau$, which arises quite naturally from the model dynamics, considering that once the agent has taken the action to not place trust, they also do not get to observe the outcome of the action and thus do not adjust their estimate. This links the agent’s actions to the estimate and vice versa. An institution can have a true trustworthiness $\vartheta$ such that $r\vartheta - c(1-\vartheta) > 0$, implying that an agent aware of the true value of $\vartheta$ would place trust forever. It is also possible in such cases that the agent’s estimate $\hat\vartheta_t$ continues to adhere to the condition in (7), which means that they never lose trust. In such cases quitting is not a given, and a particularly interesting probability to study is that of the event $\{\tau < \infty\}$. We denote this by

(10) $p_{\text{quit}} := \mathbb{P}(\tau < \infty).$

Furthermore, we are interested in the expected time spent in the system before quitting conditioned on quitting:

(11) $q := \mathbb{E}[\tau \mid \tau < \infty].$

We now recapitulate the dynamics with reference to the graphical representation in Figure 1(a). The two elements determining the agent’s belief distribution at the end of round $t$ are their prior $P_0$ and the history of their interaction contained in the random variables $\hat S_t$ and $\hat F_t$. Together these result in a trustworthiness estimate $\hat\vartheta_t$ of the institution, by which the agent makes the choice to either place trust or quit the trust relationship in round $t+1$. If the agent places trust (thus continuing the process) there is a response from the institution of either honoring the trust (with probability $\vartheta$) or abusing the trust (with probability $1-\vartheta$). Note that all of the model’s randomness stems from the response of the institution.

Figure 1. (a) The single agent learner model illustrated conceptually and (b) two example paths of the model dynamics in the random walk interpretation. $B(\vartheta)$ denotes the Bernoulli distribution with parameter $\vartheta$.
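To make these dynamics concrete, the following minimal Python sketch simulates the single-agent model of (1)–(8). It is our own illustration (the function name and the truncation parameter t_max, a numerical stand-in for $\tau = \infty$, are assumptions), not the authors’ code.

```python
import random

def simulate_single_agent(theta, alpha, beta, c, r, t_max=10_000, rng=random):
    # Beta(alpha, beta) prior and myopic decisions, Eqs. (1)-(8).
    # Returns the number of rounds trust was placed before quitting,
    # or None if still trusting at t_max (stand-in for tau = infinity).
    s = f = 0  # honored / abused counts, S_t and F_t of Eqs. (3)-(4)
    for t in range(1, t_max + 1):
        est = (alpha + s) / (alpha + beta + s + f)  # posterior mean, Eq. (6)
        if r * est - c * (1 - est) < 0:             # myopic rule, Eq. (7)
            return t - 1                            # quitting time tau, Eq. (8)
        if rng.random() < theta:                    # institution honors trust
            s += 1
        else:                                       # institution abuses trust
            f += 1
    return None

# crude Monte Carlo estimate of p_quit of Eq. (10)
runs = [simulate_single_agent(0.8, alpha=2, beta=2, c=1, r=1) for _ in range(10_000)]
print(sum(tau is not None for tau in runs) / len(runs))
```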

3. Trusting alone

In this section we analyze the dynamics of the single agent model described in §2. Our interest lies in the probability of quitting pquit and the expected time to quitting q (conditioned on this occurrence). We consider the dynamics that unfold between a single agent who periodically places (or doesn’t place) trust in the institution. We start this section in §3.1 with a re-interpretation of the model in terms of a random walk with an absorbing barrier and (potentially asymmetric) step sizes as well as (potentially asymmetric) step probabilities. Subsequently in §3.2 we analyze this random walk model for a number of special cases, focusing on the evaluation of the metrics pquit and q.

3.1. Model re-interpretation

We interpret the model described in §2 as a 1-dimensional random walk with step sizes $-r$ and $+c$, taken with probabilities $\vartheta \in (0,1)$ and $1-\vartheta$ respectively. For an agent holding an initial Beta belief distribution with parameters $\alpha$ and $\beta$, the decision criterion becomes

(12) $r\,\frac{\alpha + \hat S_t}{\alpha + \beta + \hat S_t + \hat F_t} - c\left(1 - \frac{\alpha + \hat S_t}{\alpha + \beta + \hat S_t + \hat F_t}\right) \ge 0.$

By rearrangement we find that in order to place trust the agent needs

(13) $c\hat F_t - r\hat S_t \le r\alpha - c\beta,$

which we interpret as a random walk

(14) $Z_t := c\hat F_t - r\hat S_t, \quad \text{for } t = 0, 1, \ldots,$

starting at Z0=0, and having an absorbing barrier at the “critical” position

(15) $u_{\text{crit}}(\alpha, \beta, c, r) := r\alpha - c\beta + 1.$

Note that different values of $\alpha$ and $\beta$ correspond to the same behavior in terms of $p_{\text{quit}}$ and $q$ as long as the value of $u_{\text{crit}}$ is the same. The additional unit in (15) is due to the decision criterion to place trust including equality (the expected reward must be zero or more), while the formulation of an absorbing barrier describes the first position for which trust is not placed. Note that all the influence of the prior is contained in $u_{\text{crit}}$. This also allows us to reformulate the probability of quitting as
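To make the rearrangement explicit, multiplying (12) through by the (positive) denominator $\alpha + \beta + \hat S_t + \hat F_t$ gives the following elementary derivation (our own addition, using only the definitions above):

$$r(\alpha + \hat S_t) - c(\beta + \hat F_t) \ge 0 \iff c\hat F_t - r\hat S_t \le r\alpha - c\beta,$$

and since $Z_t$ moves on the integers (recall $c, r \in \mathbb{N}$), the first position at which trust is no longer placed is $r\alpha - c\beta + 1$, which is exactly $u_{\text{crit}}$ in (15).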

(16) $p_{\text{quit}} \equiv p_{\text{quit}}(\alpha, \beta, c, r, \vartheta) = \mathbb{P}(\exists t : Z_t \ge u_{\text{crit}}),$

the probability of the random walk reaching the absorbing barrier. For ease of reference we refer to $c/(c+r)$ as $\theta_{\text{crit}}$, because an equivalent condition to $Z_t \ge u_{\text{crit}}$ is that $\hat\vartheta_t < \theta_{\text{crit}}$. For illustrative purposes we provide two sample paths that such a random walk might take in Figure 1(b).

Example 1

(Random walk interpretation). Consider the case with an absorbing barrier at $u_{\text{crit}} = 5$, arising from parameter values $c = 2$, $r = 1$, $\alpha = 8$ and $\beta = 2$. We interpret this as having a critical trustworthiness $\theta_{\text{crit}} = 2/3 \approx 0.67$. The first such path (depicted as solid arrows) hits the absorbing barrier at $t = 9$, while the second path (depicted as dotted arrows) does not hit the absorbing barrier in the time steps shown, and so has $\tau > 10$. The values of $\hat S_t$ and $Z_t$ are shown in Table 2, along with the respective estimated trustworthiness $\hat\vartheta_t$.

Table 2. Sample paths and agent beliefs as random walks.

Regarding the probability of quitting pquit, we consider first the case where ϑ<c/(c+r). Note that in this case the actual expected utility of placing trust is negative and that the process Zt has a drift toward the absorbing barrier, meaning that this will be reached eventually with probability 1.

Lemma 1

(Guaranteed quitting). If $\vartheta < c/(c+r)$, then $p_{\text{quit}} = 1$, i.e., $\tau < \infty$ almost surely.

The proof (given in Appendix A) relies heavily on the law of large numbers. The rest of our investigation in this section takes place in the (more interesting) case where $\vartheta > \theta_{\text{crit}}$. This case is particularly relevant, as it represents situations in which the optimal action for the agent would be to place trust indefinitely, yet they do not necessarily do so. The random walk is investigated for general absorbing barriers $u \ge 0$ in order to find recurrence relationships between $p_{\text{quit}}$ for different values of $u$. We extract the probability of quitting the trust relationship, and the time at which this occurs, by using the appropriate $u$ for the parameter values in question.

Considering the model as described thus far, an agent either continues to place trust indefinitely and learns the true value of the trustworthiness, or quits at some time $t < \infty$.

Lemma 2

(Converge or quit). A single agent partaking in the trust relationship described, with $\vartheta > \theta_{\text{crit}}$, either (A) quits at some time $\tau < \infty$, i.e.,

(17) $A_t = 1 \ \forall t < \tau, \quad \text{and} \quad A_t = 0 \ \forall t \ge \tau,$

or (B) continues to place trust indefinitely and has their estimate converge to the true trustworthiness, i.e.,

(18) $A_t = 1 \ \forall t = 1, 2, \ldots, \quad \text{and} \quad \lim_{t \to \infty} \hat\vartheta_t = \vartheta.$

This entails that

(19) $\mathbb{P}\big(\hat\vartheta_t \not\to \vartheta \ \text{and} \ \tau = \infty\big) = 0.$

The proof follows from the definition of the estimated trustworthiness value and the law of large numbers, and is found in Appendix A.

3.2. Results

In this subsection we study the probability of the agent quitting the trust relationship. We achieve this by finding the absorption probability of the random walk at some level $u$. We define $\pi(u)$ as the probability of hitting an absorbing barrier at $u \ge 0$, in terms of the distance from the starting point $Z_0 = 0$ to this absorbing barrier:

(20) $\pi(u) \equiv \pi(u, c, r, \vartheta) := \mathbb{P}(\exists t : Z_t \ge u), \quad \forall u \in \mathbb{N}_0.$

Here we suppress the dependence on $c$, $r$ and $\vartheta$ for ease of reading. Note that $u$ can equal 0, corresponding to a scenario in which absorption is certain. The analysis of $\pi(u)$ is divided into three cases with respect to $r$ and $c$. Two of the cases we characterize analytically, while we present a numerical approximation for the third. Using $u = u_{\text{crit}}$ gives the probability of quitting, i.e.,

(21) $p_{\text{quit}}(\alpha, \beta, c, r, \vartheta) = \pi(u_{\text{crit}}, c, r, \vartheta).$

In §§3.2.1–3.2.2 we provide the above-mentioned analysis of two cases: 1) $r \in \mathbb{N}$ and $c = 1$, and 2) $r = 1$ and $c \in \mathbb{N}$. An approximation scheme for the case $r, c \in \mathbb{N}$ is presented in Appendix B. The split is a result of the fact that it is not possible to find a closed-form result for the general case. The techniques used in the two cases presented also differ substantially, and should therefore be viewed separately.

3.2.1. Case $c = 1$ and $r \in \mathbb{N}$

In this case the random walk exhibits a useful memorylessness property. To see this, observe that the walk can only go up levels one at a time, while on its way down it can skip levels. We therefore first define the probability of the random walk climbing one level at some $t \in \mathbb{N}$:

(22) $\rho := \mathbb{P}\big(\exists t \in \mathbb{N} : Z_t \ge 1\big).$

As a consequence of the strong Markov property, we note that the dynamics after climbing one level are independent of the history by which this was done. This means that the probability of going up $u$ levels is simply the probability of $u$ times sequentially going up one level, so that $\pi(u) = \rho^u$. A formal expression of the probability of quitting, including a characterization of $\rho$, is presented in the following lemma, which is proved in Appendix A.

Lemma 3

(Quitting probability when $c = 1$ and $r \in \mathbb{N}$). Suppose $\vartheta \ge 1/(1+r)$. The probability of the corresponding random walk with parameter values $c = 1$, $r \in \mathbb{N}$ reaching an absorbing barrier at $u$ satisfies

(23) $\pi(u, 1, r, \vartheta) = \rho(\vartheta)^u, \quad \forall u \in \mathbb{N}_0,$

in which $\rho(\vartheta) = \pi(1, 1, r, \vartheta)$ is the unique solution in the range $[0, 1)$ to the equation

(24) $\rho(\vartheta) = (1 - \vartheta) + \vartheta\,\rho(\vartheta)^{r+1}.$

To find the probability of quitting we use $u_{\text{crit}} = r\alpha - \beta + 1$ in

(25) $p_{\text{quit}}(\alpha, \beta, 1, r, \vartheta) = \pi(u_{\text{crit}}, 1, r, \vartheta).$

In Figure 2 we plot numerical results of the quitting probability $p_{\text{quit}}(\alpha, \beta, c, r, \vartheta) = \pi(u_{\text{crit}}, c, r, \vartheta)$ for the subcases $r = 1, 2, 3$. The Beta prior belief distribution is given by shape parameters $\alpha = \beta = 2$. The lines are theoretical results obtained by Lemma 3, and the dots (with confidence intervals) correspond to simulated results, which corroborate the analytical results. The shape of the quitting probability is explicable by noting that a lower trustworthiness $\vartheta$ consistently leads to a higher quitting probability. The results of Lemmas 1 and 3 are both illustrated: the probability of quitting is unity where $\vartheta < \theta_{\text{crit}}$, and follows (25) thereafter.

Figure 2. The probability of a single agent quitting $p_{\text{quit}}(\alpha, \beta, c, r, \vartheta)$ plotted against $\vartheta$ for different values of $r$, while $\alpha = \beta = 2$ and $c = 1$. Analytical results are plotted as lines, while simulated results (4 000 iterations) are shown as points with 95% confidence intervals.
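The fixed point of (24) is easy to obtain numerically. The sketch below (our own illustration, not the authors’ simulation code) iterates $x \mapsto (1-\vartheta) + \vartheta x^{r+1}$ from 0, which converges monotonically to the smallest root, and then applies Lemma 3 with $u_{\text{crit}} = r\alpha - \beta + 1$:

```python
def rho(theta, r, tol=1e-12):
    # Smallest root in [0, 1] of Eq. (24): rho = (1 - theta) + theta * rho**(r+1).
    # For theta below the critical value 1/(1+r) the iteration converges to 1,
    # consistent with guaranteed quitting (Lemma 1).
    x = 0.0
    while True:
        x_new = (1 - theta) + theta * x ** (r + 1)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new

def p_quit(alpha, beta, r, theta):
    # Eq. (25): pi(u_crit) = rho**u_crit with u_crit = r*alpha - beta + 1 (c = 1).
    u_crit = max(r * alpha - beta + 1, 0)  # pi(u) = 1 for u <= 0
    return rho(theta, r) ** u_crit

print(p_quit(alpha=2, beta=2, r=2, theta=0.5))  # one of the settings of Figure 2
```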

We now turn our interest to the expected time an agent spends in the system. Note that the time of quitting τ in terms of the random walk notation is such that

(26) $\tau(u) \equiv \tau(u, c, r, \vartheta) = \inf\{t \in \mathbb{N}_0 : Z_t \ge u\}.$

Then we are interested in the expectation of $\tau(u)$ conditioned on the event $\{\tau(u) < \infty\}$, which we now know occurs with probability $\pi(u)$. Specifically we investigate the quantity

(27) $\mathbb{E}[\tau(u) \mid \tau(u) < \infty] = \frac{\mathbb{E}\big[\tau(u)\,\mathbf{1}\{\tau(u) < \infty\}\big]}{\pi(u)}, \quad \text{for } u \in \mathbb{N}_0.$

We use techniques from probability generating functions in order to describe the expectation of the distribution at hand. As such we inspect an object related to the time to reach the first level, $\tau(1)$:

(28) $\varphi(z) := \mathbb{E}\big[z^{\tau(1)}\,\mathbf{1}\{\tau(1) < \infty\}\big], \quad z \in [-1, 1],$

and note that

(29) $\mathbb{E}\big[z^{\tau(u)}\,\mathbf{1}\{\tau(u) < \infty\}\big] = \varphi(z)^u, \quad \forall u \in \mathbb{N}_0.$

This (again) follows from the fact that all levels $\{1, \ldots, u-1\}$ need to be passed first in order to reach $u$. Thus, the expected time to reach the $u$th level is simply $u$ times the expected time to reach the first level.

Lemma 4

(Expected time to quitting when $c = 1$ and $r \in \mathbb{N}$). If $r \in \mathbb{N}$, $c = 1$ and $\vartheta \ge 1/(r+1)$, then for all $u \in \mathbb{N}$, $\tau(u)$ satisfies

(30) $\mathbb{E}[\tau(u) \mid \tau(u) < \infty] = \frac{u\,\varphi'(1)}{\rho}, \quad \forall u \in \mathbb{N},$

where $\varphi'(z)$ is the derivative from below of the function $\varphi(z)$, which, for any given $|z| \le 1$, is the unique solution to

(31) $\varphi(z) = z\big((1 - \vartheta) + \vartheta\,\varphi(z)^{r+1}\big),$

and where $\rho$ is the unique solution to Eq. (24). In particular,

(32) $\mathbb{E}[\tau(u) \mid \tau(u) < \infty] = \frac{u}{1 - \vartheta(r+1)\rho^{r}}, \quad \forall u \in \mathbb{N}.$

To find the expected quitting time (conditional on quitting), we use $u_{\text{crit}} = r\alpha - \beta + 1$ in

(33) $q(\alpha, \beta, 1, r, \vartheta) = \mathbb{E}\big[\tau(u_{\text{crit}}, 1, r, \vartheta) \mid \tau(u_{\text{crit}}, 1, r, \vartheta) < \infty\big].$

To find the expected time in the system, one starts by solving (24) with the appropriate value of $r$ for the unique solution $\rho \in [0, 1)$. This in turn is substituted into (32) along with $u = u_{\text{crit}}$, resulting in the desired expression for the conditional expectation of $\tau(u_{\text{crit}}, 1, r, \vartheta)$. For details, see the proof in Appendix A.
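Under our reading of (32), $q$ can be evaluated in a few lines; the following self-contained sketch (our own illustration) combines the fixed-point solution of (24) with (32)–(33):

```python
def q_expected_time(alpha, beta, r, theta, tol=1e-12):
    # Conditional expected time to quit for c = 1, r in N, per Eqs. (32)-(33):
    # E[tau(u) | tau(u) < inf] = u / (1 - theta*(r+1)*rho**r), u = r*alpha - beta + 1.
    x = 0.0
    while True:  # fixed-point iteration for rho, Eq. (24)
        x_new = (1 - theta) + theta * x ** (r + 1)
        if abs(x_new - x) < tol:
            break
        x = x_new
    u_crit = r * alpha - beta + 1
    return u_crit / (1 - theta * (r + 1) * x ** r)

# for r = 1 we have rho = (1 - theta)/theta, so this reduces to u_crit / (2*theta - 1)
print(q_expected_time(alpha=2, beta=2, r=1, theta=0.8))  # 2 / 0.6 = 3.33...
```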

We plot numerical as well as simulated results of the expected time in the system conditioned on the agent’s eventual quitting, $q(\alpha, \beta, 1, r, \vartheta)$, in Figure 3 for the subcases $r = 1, 2, 3$ on a log-linear axis. The Beta prior belief distribution is characterized by shape parameters $\alpha = \beta = 2$. We notice that the expected time is increasing toward an asymptote positioned at $\vartheta = 1/(1+r) = \theta_{\text{crit}}$. The position of this asymptote is due to a property of the model: quitting is still guaranteed for $\vartheta = \theta_{\text{crit}}$, but the time it takes has an infinite expectation (cf. the symmetric Bernoulli walk). This initial increase makes sense, as the probability of making a detour away from quitting increases with $\vartheta$, though the walk cannot escape forever in this region. The subsequent decrease for $\vartheta > \theta_{\text{crit}}$ results from the fact that, as $\vartheta$ increases, if quitting takes place then it does so earlier. This is because more time in the relationship exposes the agent to more unbiased information, which is likely to indicate that the institution is trustworthy.

Figure 3. Expected time in the system conditioned on the agent quitting at some $t < \infty$, plotted against $\vartheta$ for different values of $r$, while $\alpha = \beta = 2$ and $c = 1$, on a log-linear axis. Analytical results are plotted as lines, while simulated results are shown as points with 95% confidence intervals.

3.2.2. Case $r = 1$ and $c \in \mathbb{N}$

In this case we notice that the walk $Z_t$ loses its memorylessness property: steps upward now have size $c \in \mathbb{N}$, and so it is possible that levels are skipped along the way. This means that a relationship of the type $\pi(u) = \rho^u$ no longer holds. We thus see that one has no guarantee of which levels are reached on the way to quitting, and therefore one cannot use the approach that we relied upon in the $c = 1$ case. We nevertheless obtain the following result. In this case the appropriate value of $u$ is given by $u_{\text{crit}} = \alpha - c\beta + 1$.

Lemma 5

(Quitting probability when $r = 1$ and $c \in \mathbb{N}$). If $r = 1$, $c \in \mathbb{N}$ and $\vartheta \ge c/(c+1)$, then the probability of quitting is given by

(34) $p_{\text{quit}} = \pi(u_{\text{crit}}, c, 1, \vartheta),$

where π(u,c,1,ϑ) satisfies

(35) $\pi(u) = \frac{\xi^{(u)}(0)}{u!}, \quad \forall u \in \mathbb{N};$

here $\xi^{(u)}(0)$ is the $u$th derivative of the function $\xi(w)$ at $w = 0$, where $\xi(w)$ is defined by

(36) $\xi(w) = \frac{\big[(1-\vartheta)(c+1) - 1\big]w^2 + \big[2 - (1-\vartheta)(c+1)\big]w - (1-\vartheta)w^{c+1} - \vartheta}{(1-\vartheta)w^{c+2} - (1-\vartheta)w^{c+1} - w^2 + (1+\vartheta)w - \vartheta}, \quad \forall w \in (-1, 1).$

The proof (in Appendix A) uses a translation between the maximum of our random walk on the one hand, and the minimum of a random walk with $c = 1$ and $r \in \mathbb{N}$ on the other hand.

We observe that $\xi(w)$ is of the form $P(w)/Q(w)$, where $P(w)$ and $Q(w)$ are polynomials of degree $c+1$ and $c+2$ respectively. We evaluate $\pi(u)$ by differentiating both sides of $Q(w)\xi(w) = P(w)$ $u$ times and substituting $w = 0$. Here $\xi(w)$ can be interpreted as a probability generating function with coefficients $a_u = \pi(u)$, which allows calculation of the ruin probabilities by differentiating $u$ times, setting $w = 0$ and dividing by $u!$, a standard procedure from the theory of probability generating functions.
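The differentiation in (35) can be mechanized symbolically. The following sympy sketch (our own, based on our reconstruction of (36)) evaluates $\pi(u)$ exactly, and for $c = 1$ reproduces the classical ruin probability $((1-\vartheta)/\vartheta)^u$:

```python
import sympy as sp

def pi_u(u, c, theta):
    # pi(u) = xi^(u)(0) / u!, Eq. (35), with xi(w) = P(w)/Q(w) as in Eq. (36).
    w = sp.Symbol('w')
    t = theta  # pass an exact sympy Rational for exact arithmetic
    P = (((1 - t) * (c + 1) - 1) * w**2
         + (2 - (1 - t) * (c + 1)) * w
         - (1 - t) * w**(c + 1) - t)
    Q = ((1 - t) * w**(c + 2) - (1 - t) * w**(c + 1)
         - w**2 + (1 + t) * w - t)
    return sp.simplify(sp.diff(P / Q, w, u).subs(w, 0) / sp.factorial(u))

print(pi_u(2, 1, sp.Rational(3, 5)))  # ((2/5)/(3/5))**2 = 4/9
```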

The resulting quitting probabilities are plotted in Figure 4 for the first two values of $c$. As expected, quitting remains certain until $\vartheta > c/(c+1)$, the stability condition. The prior belief parameters were chosen in order to ensure that the agent does not quit in round 0. The effect of the prior belief parameters is that $u_{\text{crit}} = 3$ for $c = 1$ and $u_{\text{crit}} = 2$ for $c = 2$.

Figure 4. The probability of a single agent quitting plotted against $\vartheta$ for different values of $c$, while $\alpha = 3$, $\beta = 1$ and $r = 1$. Analytical results are plotted as lines, while simulated results (4 000 iterations) are shown as points with 95% confidence intervals.

As before, we are also interested in the expected time that an agent spends in the system placing trust, conditional on them quitting eventually. Similar hurdles arise here as a result of the potentially larger steps in the direction of the barrier, though in this analysis they are handled without much extra machinery. The following lemma expresses the expected time that an agent places trust until quitting, conditioned on their quitting eventually.

Lemma 6

(Expected time to quitting when $r = 1$ and $c \in \mathbb{N}$). Define

(37) $\Phi(w, z) := \dfrac{(1-\vartheta)z\,\dfrac{w - w^{c+1}}{1 - w} - \vartheta z\,\bar\varphi(z)}{1 - (\vartheta z)/w - w^{c}z(1-\vartheta)}, \quad w \in (-1, 1),$

and let $w(z)$, for any given $z \in (-1, 1)$, denote the unique solution for $w \in (-1, 1)$ in

(38) $w^{c+1}(1-\vartheta)z + \vartheta z = w,$

and

(39) $\bar\varphi(z) := \frac{(1-\vartheta)z\,w(z) + \vartheta z - w(z)}{\vartheta z\,\big(1 - w(z)\big)}.$

Suppose $\vartheta > c/(c+1)$ and that the utility parameters of the agent are given by $c \in \mathbb{N}$ and $r = 1$. The expected time for the corresponding random walk to hit an absorbing barrier at $u$ satisfies

(40) $\mathbb{E}[\tau(u) \mid \tau(u) < \infty] = \frac{\varphi'(1, u)}{\varphi(1, u)},$

where

(41) $\varphi(z, u) := \frac{1}{u!}\,\frac{\partial^u}{\partial w^u}\,\Phi(w, z)\bigg|_{w=0}.$

In order to fully express $\Phi$ in terms of only $w$, $\vartheta$ and $z$, one first uses the fact that a root of the denominator is necessarily also a root of the numerator (to keep $\Phi(w, z)$ bounded), and uses this to express $\varphi(z, 1)$ in terms of $z$. The proof, presented in Appendix A, makes use of the object $\varphi(z, u) := \mathbb{E}\big[z^{\tau(u)}\,\mathbf{1}\{\tau(u) < \infty\}\big]$, which is then used in the generating function

(42) $\Phi(w, z) := \sum_{u=1}^{\infty} w^u\,\varphi(z, u)$

for $w \in (-1, 1)$, from which we obtain the result.
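Since the symbolic route through (37)–(41) is laborious, a crude Monte Carlo cross-check can be useful. The sketch below (our own illustration) estimates $\mathbb{E}[\tau(u) \mid \tau(u) < \infty]$ directly from the walk with steps $+c$ and $-1$; the truncation at t_max stands in for $\tau = \infty$ and introduces a small bias:

```python
import random

def mc_conditional_time(u, c, theta, n_runs=100_000, t_max=100_000, rng=random):
    # Walk: +c with prob. (1 - theta), -1 with prob. theta (case r = 1).
    # Collect hitting times of level u; surviving walks count as tau = inf.
    hit_times = []
    for _ in range(n_runs):
        z = 0
        for t in range(1, t_max + 1):
            z += c if rng.random() < 1 - theta else -1
            if z >= u:
                hit_times.append(t)
                break
    return sum(hit_times) / len(hit_times)

print(mc_conditional_time(u=2, c=2, theta=0.8))  # compare with Lemma 6 / Figure 5
```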

In Figure 5 we plot numerical as well as simulated results of the expected time in the system conditioned on the agent’s eventual quitting, $q(\alpha, \beta, c, 1, \vartheta)$. We do so for the instances $c = 1, 2$, using the Beta prior belief distribution determined by shape parameters $\alpha = 3$ and $\beta = 1$, i.e., the same parameter choices as the ones used in Figure 4. As in the case with $c = 1$ and $r \in \mathbb{N}$, we notice that the expected time is increasing toward an asymptote, which is now positioned at $\vartheta = c/(c+1)$. After this critical $\vartheta$ there is a decrease, as quitting becomes concentrated on the early time steps. More time in the system implies that the agent is exposed to more unbiased experience with the institution, and therefore they have a greater chance of remaining in the trust relationship.

Figure 5. Expected time in the system conditioned on the agent quitting at some $t < \infty$, plotted against $\vartheta$ for different values of $c$, while $\alpha = 3$, $\beta = 1$ and $r = 1$. Analytical results are plotted as lines, while simulated results are shown as points with 95% confidence intervals.

4. Trusting together

In this section we extend the model from one truster agent to two truster agents. To this end, we enhance the model with a communication mechanism (between the truster agents, that is) that retains the assumptions of myopic rationality.

We consider agents who communicate as frequently with one another as they make decisions to place or not place trust. The nature of this interaction is considered in two forms. In the first, the agents fully share their own interaction histories with each other. In the second, the agents do not communicate about their outcomes at all, and only observe which action the other agent takes in each round.

4.1. Shared modelling aspects

In both mechanisms we have two agents, each with their own interaction with the institution. As such, we have decisions for each agent per round from the action set $A = \{0, 1\}^2$, in which $A_{i,t} = 1$ if agent $i \in \{1, 2\}$ places trust in round $t$. For each round $t$ and agent $i$,

(43) $X_{i,t} = \begin{cases} 1, & \text{with probability } \vartheta, \\ 0, & \text{with probability } 1 - \vartheta, \end{cases}$

where Xi,t=1 indicates that the trust that was placed in that round was subsequently honored by the institution. Similarly to the single learner case the agents use

(44) $\hat S_{i,t} = \sum_{n=0}^{t} X_{i,n}\,\mathbf{1}\{A_{i,n} = 1\},$

to keep track of how many times trust was honored to them in rounds when they placed trust. We also define the number of times trust was abused to each agent i=1,2 by

(45) $\hat F_{i,t} = \sum_{n=0}^{t} (1 - X_{i,n})\,\mathbf{1}\{A_{i,n} = 1\}.$

The agents are equipped with the same kind of Beta prior belief distribution as before. The prior belief distribution (with shape parameters $\alpha_i, \beta_i \in \mathbb{N}$) for agent $i$ is therefore

(46) $P_{i,0}(\theta) = B(\theta; \alpha_i, \beta_i) = \frac{\theta^{\alpha_i-1}(1-\theta)^{\beta_i-1}}{\int_0^1 y^{\alpha_i-1}(1-y)^{\beta_i-1}\,dy}, \quad \theta \in [0,1].$

For simplicity, from now on we consider only the case of homogeneous priors (i.e., α1=α2 and β1=β2). It should be noted that one can deal with non-homogeneous priors in the exact same way, where one should specify what each of the agents knows about their neighbor’s prior. One could assume that they know it exactly, or hold some belief distribution on it; evidently, the formulation becomes more tedious but is conceptually straightforward.

As we are only considering the homogeneous case, we drop the subscript on the prior belief distribution, calling it simply $P_0(\cdot)$. Let the information received by agent $i$ in round $t$ be denoted $I_{i,t}(\theta)$. The precise formulation of this information is an outcome of the choice of communication model; see (50) in §4.2 for the first mechanism, and (59) in §4.3 for the second. The resulting belief update rule becomes

(47) $P_{i,t}(\theta) = \frac{\theta^{\hat S_{i,t}}(1-\theta)^{\hat F_{i,t}}\,P_0(\theta)\,I_{i,t}(\theta)}{\int_0^1 y^{\hat S_{i,t}}(1-y)^{\hat F_{i,t}}\,P_0(y)\,I_{i,t}(y)\,dy}, \quad \theta \in [0,1].$

The difference between (47) and (5) lies in the presence of the information received by the agent. Subsequently the agents use the mean of their belief distribution as a point estimate for the true trustworthiness of the institution, and use this to make the decision whether or not to place trust in round $t+1$. If we define

(48) $\hat\vartheta_{i,t} = \int_0^1 \theta\,P_{i,t}(\theta)\,d\theta,$

i.e., the mean belief held by agent $i$ at the end of round $t$, then agent $i$ takes the action $A_{i,t+1}$ in round $t+1$, where

(49) $A_{i,t} = \begin{cases} 1, & \text{if } r\hat\vartheta_{i,n} - c(1 - \hat\vartheta_{i,n}) \ge 0 \ \forall n \in \{0, 1, \ldots, t-1\}, \\ 0, & \text{otherwise}. \end{cases}$

Note that the condition in (49) can be rewritten as $\hat\vartheta_{i,n} \ge \theta_{\text{crit}}$ for all $n \in \{0, 1, \ldots, t-1\}$, with $\theta_{\text{crit}} := c/(c+r)$. In the next two subsections we discuss the two different communication mechanisms in more detail.

4.2. Observable rewards (OR)

This case considers agents that communicate fully with one another about their experiences with the institution. The dynamics of this model are represented graphically in Figure 6. The extension from the single learner model (represented in Figure 1(a) for comparison) to this model consists of the information shared between agents, which entails the response of the institution to trust that was placed.

Figure 6. The observable rewards model of learning.

Each agent thus has the same information available to them, and under homogeneous prior beliefs they hold the same estimate of the true trustworthiness. In this case we define the information agent $i$ receives from their neighbor $j \ne i$ in round $t$ as

(50) $I_{i,t}(\theta) := \theta^{\hat S_{j,t}}(1-\theta)^{\hat F_{j,t}}, \quad \theta \in [0,1].$

We see that this information contains for each agent exactly the interaction history of the other agent; hence the line from agent $i$’s “Trusting” action to agent $j$’s “Belief” distribution in Figure 6.

As mentioned, the agents’ estimated values of the true trustworthiness are the same. Thus we can drop the subscript $i$ from $\hat\vartheta_{i,t}$ and note that it is given by

(51) $\hat\vartheta_t = \frac{\alpha + \sum_{i=1}^{2}\hat S_{i,t}}{\alpha + \beta + \sum_{i=1}^{2}\hat S_{i,t} + \sum_{i=1}^{2}\hat F_{i,t}}.$

Defining $Z_t = c\sum_{i=1}^{2}\hat F_{i,t} - r\sum_{i=1}^{2}\hat S_{i,t}$, similarly to the single-agent model we get a random walk that takes steps

(52) $Z_{t+1} = \begin{cases} Z_t + 2c, & \text{with probability } (1-\vartheta)^2, \\ Z_t + c - r, & \text{with probability } 2\vartheta(1-\vartheta), \\ Z_t - 2r, & \text{with probability } \vartheta^2, \end{cases}$

with an absorbing barrier at $u = r\alpha - c\beta + 1$.

In the case of heterogeneous priors the formulation of this model does not change dramatically. The fates of the agents would not be tied as before, and so there would simply be two random walks defined as in (52), one for each agent. Each random walk would have its own absorbing barrier, defined by $u_i = r\alpha_i - c\beta_i + 1$. Recall that $\alpha_i$, $\beta_i$, $r$ and $c$ are taken from $\mathbb{N}$, and so $u_i$ too is an integer for both $i$. If at some time $t_q$ one of the agents quits, then the remaining agent simply continues according to the dynamics of the single agent model, with the new transitions defined by

(53) $Z_{t+1} = \begin{cases} Z_t + c, & \text{with probability } 1 - \vartheta, \\ Z_t - r, & \text{with probability } \vartheta, \end{cases} \quad \text{for } t > t_q.$
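The homogeneous-prior OR dynamics reduce to the pooled walk (52), which is straightforward to simulate. A minimal sketch (our own illustration; parameter values follow the example of §4.4):

```python
import random

def simulate_or_model(theta, alpha, beta, c, r, t_max=10_000, rng=random):
    # Pooled walk of Eq. (52) with barrier u = r*alpha - c*beta + 1.
    # Returns the round in which both agents quit, or None if still
    # trusting at t_max (stand-in for never quitting).
    u_crit = r * alpha - c * beta + 1
    z = 0
    for t in range(1, t_max + 1):
        for _ in (1, 2):  # the institution responds to each agent
            z += -r if rng.random() < theta else c
        if z >= u_crit:
            return t
    return None

quits = [simulate_or_model(0.8, alpha=5, beta=2, c=2, r=1) for _ in range(10_000)]
print(sum(t is not None for t in quits) / len(quits))  # Monte Carlo p_quit
```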

4.3. Observable actions (OA)

In this setting the agents do not communicate explicitly by sharing information about their interaction history. Instead, they observe the actions of their neighboring agent, and use this to infer something about the possible histories their neighboring agent may have experienced leading to such a result. This is where subtle intricacies arise, as rational agents need to keep in mind that their own actions are observed by their neighbor and thus will also affect the decisions made by their neighbor.

The extent of the communication between the two agents in this model is a binary signal, indicating whether or not trust was placed at the end of round $t$. This information is only incorporated into the agents’ belief distributions for round $t+1$. This model is depicted graphically in Figure 7. The crux of this model is the line from one agent’s “Action” decision to the other agent’s “Belief” distribution. The action of not placing trust implies that the agent exits the system forever. The quitting agent does not continue to update their belief based on their neighbor’s actions. The information sent by the quitting agent is used as the information received by the not-quitting agent for the remainder of the not-quitting agent’s tenure in the system.

Figure 7. The observable actions model of learning.

The nature of the communication necessitates separate definitions for the information sent with a “yes” signal, when the agent places trust, and for the information sent with a “no” signal, when the agent does not place trust. As such we define, for $i = 1, 2$, $j \ne i$ and $t \in \mathbb{N}$,

(54) $I_{i,t} := \begin{cases} Y_{i,t}, & \text{if } t < \tau_j, \\ N_{i,\tau_j}, & \text{if } t \ge \tau_j, \end{cases}$

where $\tau_j$ is the round in which agent $j$ quits, for $j = 1, 2$. We formally define $Y_{i,t}$ and $N_{i,t}$ in (59) and (61) respectively. By the rationality assumption, the agents use the binary information along with the knowledge that their own outbound information up until round $t-1$ was used rationally by the other agent. With this they infer a range of possible histories that may have led to the decision taken by their neighbor.

We define the belief distribution $R_t^i$, formed by taking only the report received by agent $i$ (the signal sent at time $t$ by agent $j$ to agent $i$) into account:

(55) $R_t^i(\theta) = \frac{P_0(\theta)\,I_{i,t}(\theta)}{\int_0^1 P_0(y)\,I_{i,t}(y)\,dy}, \quad \theta \in [0,1].$

We then define the auxiliary belief distribution resulting from a combination of $R_t^i$ and $\theta^x(1-\theta)^{t-x}$, where $x$ is one of the possible numbers of positive experiences, giving $D_t(x)$ the distribution:

(56) $D_t(x)(\theta) = \frac{\theta^x(1-\theta)^{t-x}\,R_t^i(\theta)}{\int_0^1 y^x(1-y)^{t-x}\,R_t^i(y)\,dy}, \quad \theta \in [0,1].$

This allows us to define the set of permitted positions of agent $i$, $J_t^i$, as

(57) $J_t^i = \big\{x \in \{0, 1, \ldots, t\} : \mathbb{E}_{\theta \sim D_t(x)}[\theta] \ge \theta_{\text{crit}}\big\}.$

We have that $D_t(x)$, for all $x \in J_t^i$, are the possible belief distributions held by agent $i$. These possible belief distributions are then summed in $I_{j,t}$, weighted according to $w_x(t-1)$, the number of ways in which these $x$ positive experiences may have been realized (i.e., the number of histories resulting in $x$ positive experiences by time $t$). We define $w_x(t)$ recursively:

(58) $w_x(t) = \begin{cases} w_x(t-1) + w_{x-1}(t-1), & \text{if } (x-1) \in J_{t-1}, \\ w_x(t-1), & \text{if } (x-1) \notin J_{t-1}, \\ 1, & \text{if } x = t, \\ 0, & \text{if } x > t \text{ or } x \notin J_{t-1}, \end{cases} \quad \text{for } x \in [0, t] \text{ and } t \in \mathbb{N},$

with initial condition $w_x(1) = 1$ for all $x \in J_1$. The information received by agent $i$ from agent $j$, given that agent $j$’s decision was to place trust in round $t$, is thus defined as

(59) $Y_{i,t}(\theta) := \sum_{x \in J_{t-1}^j} w_x(t-1)\,\theta^x(1-\theta)^{t-1-x}, \quad \theta \in [0,1].$

We can interpret $w_x(t)$ as the number of walks from $(0, 0)$ to $(x, t-x)$ that retain $x_\tau \in J_\tau$ for $\tau = 1, 2, \ldots, t$. Figure 8 illustrates the permitted walks for a “yes” signal, with the boundary between $x$ in and out of $J_t$ represented as the thick horizontal line.

Figure 8. An illustration of the weighting $w_x(t)$ used in the interpretation of the observed action.
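The combinatorial core of this construction, the permitted sets $J_t$ of (57) and the weights $w_x(t)$ of (58), can be computed exactly with integer and rational arithmetic. In the sketch below (our own illustration) we make the simplifying assumption that $R_t^i$ equals the prior (i.e., no inbound report has been incorporated yet), so that $D_t(x)$ is a Beta$(\alpha + x, \beta + t - x)$ distribution; with signal information, the Beta parameters below would be replaced by the polynomial kernel of (55)–(56):

```python
from fractions import Fraction

def beta_mean(a, b):
    # Mean of a Beta(a, b) distribution, computed exactly.
    return Fraction(a, a + b)

def permitted_sets_and_weights(alpha, beta, theta_crit, T):
    # J[t]: permitted success counts x after t placements, Eq. (57);
    # w[t][x]: number of admissible histories, per the recursion of Eq. (58).
    J = {t: {x for x in range(t + 1)
             if beta_mean(alpha + x, beta + t - x) >= theta_crit}
         for t in range(1, T + 1)}
    w = {1: {x: 1 for x in J[1]}}
    for t in range(2, T + 1):
        w[t] = {}
        for x in J[t]:
            if x == t:
                w[t][x] = 1  # the all-successes history
            else:  # add the step from x-1 only if x-1 was itself permitted
                w[t][x] = w[t - 1].get(x, 0) + (
                    w[t - 1].get(x - 1, 0) if (x - 1) in J[t - 1] else 0)
    return J, w

# Eq. (59): a "yes" at round t carries sum over x in J[t-1] of
# w[t-1][x] * theta**x * (1 - theta)**(t - 1 - x).
J, w = permitted_sets_and_weights(5, 2, Fraction(2, 3), 4)
print(J[1], J[2], w[3])  # {1} {1, 2} {2: 2, 3: 1}
```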

In order to define the information sent with a “no” signal, we define the range of values which would result in “yes” signals up until round $t-1$ and a “no” in round $t$:

(60) $K_t^i = \big\{\max\{0, \inf J_{t-1}^i\}, \ldots, \inf J_t^i - 1\big\}.$

Subsequently, we have the information sent in a “no” signal:

(61) $N_{i,t}(\theta) := \sum_{x \in K_t} w_x(t)\,\theta^x(1-\theta)^{t-1-x}, \quad \theta \in [0,1],$

where $w_x(t)$ is defined as above. The number of permitted walks to a “no” signal at time $t$ is simply the number of permitted walks to a “yes” signal at time $t-1$ which are also one negative experience away from quitting. This is illustrated in Figure 8, with a horizontal line indicating the boundary between $x$ in and out of $J_t$, and the dotted line indicating the boundary between $x$ in and out of $K_t$. Note that the information sent by a “no” signal in subsequent rounds (after the first such signal) does not change, as we allude to in (54).

The agents finally construct an estimate by combining their own information with the signal information:

(62) $P_{i,t}(\theta) = \frac{\theta^{\hat S_{i,t}}(1-\theta)^{\hat F_{i,t}}\,P_0(\theta)\,I_{j,t}(\theta)}{\int_0^1 y^{\hat S_{i,t}}(1-y)^{\hat F_{i,t}}\,P_0(y)\,I_{j,t}(y)\,dy}, \quad \theta \in [0,1],$

and place trust in round $t+1$ if $\mathbb{E}_{\theta \sim P_{i,t}(\theta)}[\theta] \ge \theta_{\text{crit}}$.

4.4. Illustration of the differences between the models presented

In this subsection we present a set of example experiences for two players. These serve to demonstrate the workings of the dynamics in the OA model and the OR model.

We consider both dual agent models with parameters $c = 2$, $r = 1$ (and therefore $\theta_{\text{crit}} = 2/3$) and $u_{\text{crit}} = 2$ ($\alpha = 5$, $\beta = 2$). Whether or not these interaction outcomes are witnessed, let us presume the responses by the institution to the two agents in the first four rounds are given by $(X_{i,1}, X_{i,2}, X_{i,3}, X_{i,4}) = (0, 0, 1, 1)$ for $i = 1$, and $(X_{i,1}, X_{i,2}, X_{i,3}, X_{i,4}) = (1, 1, 1, 0)$ for $i = 2$. The resulting random walk interpretations within the single agent model as well as the observable rewards model are depicted in Figure 9.

Figure 9. The random walk interpretation of the OR model as well as the individual sample paths of the respective agents.

We observe that the first agent would have quit in the single agent model after the first interaction, but in the OR model they continue past this point. The OA model is not so easily represented visually and so requires more elucidation. We offer Table 3, summarizing the estimates over time in the single agent models for agents 1 and 2 respectively, as well as in the dual agent models. Note that agents 1 and 2 have the same estimate throughout in the OR model, due to the fact that they start with the same prior and are privy to the same information each round. The first agent's estimate stops changing after the first round as a result of their quitting, which is the case in the single agent model as well as in the OA dual agent model. This is because, by the time they take the decision to not place trust, they have not yet received a signal containing information from agent 2. The reactions of the institution $X_{i,t}$ for i=1 and t≥2 are thus not witnessed and not used to update agent 1's belief.

Table 3. Estimates $\hat\vartheta_t$ in the respective models.

In the first round of the OA model both agents behave exactly as they would in the single agent model, because their first action is non-informative. After this first round, the first agent quits and takes the "not place trust" action in round 2 and all subsequent rounds. This indicates to the second agent that trust was abused in agent 1's first round. In this case $w_x(1)=1$ (as part of the initial condition of $w_x$) and $I_{2,2}=\theta^0(1-\theta)^1$, while agent 2's personal $\hat S_{2,2}$ equals 2, which provides

(63) $P_{2,2}(\theta)=\dfrac{\theta^2\,\theta^{5-1}(1-\theta)^{2-1}(1-\theta)^1}{\int_0^1 y^2\,y^{5-1}(1-y)^{2-1}(1-y)^1\,dy}.$

The mean of this belief distribution is used as the point estimate; it is

(64) $\hat\vartheta_2=\int_0^1\theta\,\dfrac{\theta^6(1-\theta)^2}{\int_0^1 y^6(1-y)^2\,dy}\,d\theta=0.7.$
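The belief in (63) is proportional to $\theta^6(1-\theta)^2$, i.e., a Beta(7,3) density, so the point estimate in (64) is simply its mean; a one-line check (our illustration):

```python
from scipy.stats import beta

# The belief (63) is proportional to theta^6 (1 - theta)^2, i.e. Beta(7, 3);
# its mean 7 / (7 + 3) reproduces the point estimate 0.7 in (64).
print(beta(7, 3).mean())
```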

For illustrative purposes we also show the information sent out by agent 2 through their action in round 3, i.e., $I_{1,3}$. First they construct $R_2^2$, the belief formed by taking into account the report received by agent 2 by round 2:

(65) $R_2^2(\theta)=\dfrac{P_0(\theta)\,I_{2,2}(\theta)}{\int_0^1 P_0(y)\,I_{2,2}(y)\,dy}=\dfrac{\theta^4(1-\theta)^2}{\int_0^1 y^4(1-y)^2\,dy}.$

Subsequently the auxiliary distribution D2 is constructed:

(66) $D_2(x)=\dfrac{\theta^x(1-\theta)^{2-x}\,R_2^2(\theta)}{\int_0^1 y^x(1-y)^{2-x}\,R_2^2(y)\,dy}=\dfrac{\theta^{x+4}(1-\theta)^{4-x}}{\int_0^1 y^{x+4}(1-y)^{4-x}\,dy}.$

This distribution is then used to construct the set $J_{t=2}^{i=2}=\{2\}$, considering that x=2 is the only x for which $\int_0^1\theta\,D_2(x)\,d\theta\geq 2/3$. Considering that x=t, we know that $w_2(2)=1$, which means that $I_{1,3}=\theta^2$.

This example illustrates an information cascade, as the second agent would not quit in the first four rounds in any model except the OR model. The effect for the first agent, on the other hand, is that instead of quitting in round 1, as in the single agent model as well as in the OA model, they quit in round 2 of the OR model.

5. Setup of simulation experiments

In this section we describe the setup of our simulation experiments. We present the results of these experiments in §6 and discuss and interpret them in §7. The primary goal of the experiments is to assess the two communication mechanisms in the two agent case, in terms of the probabilities of quitting and the expected quitting times (conditional on quitting).

5.1. Choice of c and r

In order to get an idea of how the model parameters influence the probability (and timing) of agents quitting, we choose the following five combinations of (c,r): (1,1), (3,2), (2,3), (2,1), (1,2), so that c=r is the "base case" and c=2r and r=2c are the "extremal ends." There are two reasons for choosing this range of ratios. The first consideration is that if the ratio is pushed further in either direction, the simulation results become harder to obtain. The critical trustworthiness θcrit is shifted toward 1 with an increase of c, which means that only a small range of ϑ has a probability of quitting less than unity. Within this small range there is a sharp drop in quitting probabilities, because with very large ϑ the probability of quitting is low, as quitting requires an abuse of trust. In the other extreme, with an increase of r, θcrit decreases, so that increasingly many parameter settings have to run over a relatively long time interval in an already slow simulation. This is because of the large number of numerical integrations required in the observable actions model, used in constructing $J_t^i$, $D_t$ and $R_t^i$ for each i=1,2. The second reason is that it is a sufficiently extensive range of cost-to-reward ratios, especially in the context of trust problems (see Note 3).

5.2. Choice of α and β

Intuitively, the parameters α and β can be considered as a number of interactions with the institution prior to the dynamics we consider, which resulted in trust being honored α times and abused β times. This means that the greater α is compared to β, the more the prior belief distribution of the agents is skewed toward greater values of ϑ. Similarly, for a lower α compared to β, this prior belief distribution is skewed toward lower values of ϑ. The prior belief parameters α and β, in combination with the choice of c and r, determine the instance's $u_{\mathrm{crit}}=r\alpha-c\beta+1$. Borrowing from the analogy of the one-dimensional random walk of the single agent model, the value of $u_{\mathrm{crit}}$ represents a "distance" to quitting. If we would like consistency in the interpretation of $u_{\mathrm{crit}}$, we require a conversion: take $u^*=u_{\mathrm{crit}}/c$, which is the minimum time in which an agent can quit in the single agent model. There is an asymmetry in the model between the cases c>r and c<r. In case c>r, it is possible for an agent to experience an honoring of trust in the first round, yet still quit after the second, if the step size for an abuse of trust is such that $c>u_{\mathrm{crit}}+r$. In case r>c this scenario is not possible. We choose values of α and β such that, in combination with the values of r and c, we cover the range $u^*=1,2,3$. The choice to stop at $u^*=3$ is made to keep the probability of quitting high enough to facilitate efficient simulation.
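As a bookkeeping aid (a sketch, with the closed form for $u_{\mathrm{crit}}$ as reconstructed above), the derived quantities of one parameter setting can be computed as follows:

```python
def instance_summary(c, r, alpha, beta):
    """Derived quantities of one parameter setting: the critical
    trustworthiness, the 'distance' to quitting, and u* = u_crit / c."""
    theta_crit = c / (c + r)
    u_crit = r * alpha - c * beta + 1  # assumption: reconstructed closed form
    u_star = u_crit / c  # minimum number of rounds in which an agent can quit
    return theta_crit, u_crit, u_star

# The setting of §4.4 (c=2, r=1, alpha=5, beta=2) gives (2/3, 2, 1.0).
print(instance_summary(2, 1, 5, 2))
```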

5.3. Iterations and simulation length

In our simulation study, it is our goal to produce reliable estimates of the quantities under study. We run each of the models for 4000 truster agents in total, i.e., the single agent models are run 4000 times while the dual agent models are run 2000 times. We run each simulation for a maximum of 500 time steps, with the exception of simulations with ϑ∈{0.84,0.9}, in which case we run the model for a maximum of 200 time steps. These choices were made with considerations of confidence interval width in mind as well as execution feasibility. Considering that in the two agent models the dynamics are sped up (as there is more information available per time step), this should be sufficient time for both two agent models as well.

In the interactions with the highest ϑ the probability of quitting is very low and concentrated on the first couple of time steps. We can make an educated (pessimistic) estimate as to how many time steps are required in order to not miss any relevant "probability mass." We do this relying on Markov's inequality applied to the expected time to quitting in the single agent model. As an illustration, consider the parameter setting c=2, r=1, α=5, and β=2 (such that $u^*=1$) at trustworthiness ϑ=0.84. In the single agent model the expected time to quit is $E[\tau\mid\tau<\infty]\approx 3.17$. By Markov's inequality the probability of quitting at time τ≥200 is bounded as follows:

(67) $P(\tau\geq 200\mid\tau<\infty)\leq\dfrac{E[\tau\mid\tau<\infty]}{200}=0.01587.$

Hence, in the N experiments performed, the expected number of times we miss the quitting event is $L=N\,P(\tau\geq 200\mid\tau<\infty)\,P(\tau<\infty)$. Taking N=4000, we obtain the bound

(68) $L<4000\times 0.01587\times 0.2635=16.727.$

The single agent case is a conservative benchmark, as there is less communication than in the two agent models, so we anticipate agents in the two agent models to quit sooner. The extent to which this estimate is conservative is illustrated by the fact that, for the parameter settings described, the latest quitting occurred at round 45 in the single agent model, at round 40 in the dual agent OA model, and at round 17 in the dual agent OR model.
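For reference, a minimal Monte Carlo sketch of the single agent benchmark (our illustration; it assumes the agent quits in the first round at which the posterior mean drops below θcrit):

```python
import random

def single_agent_run(theta, alpha, beta, c, r, t_max=500, rng=random):
    """One run of the single agent model: trust is placed as long as the
    posterior mean (alpha + S) / (alpha + beta + t) stays at or above c / (c + r)."""
    theta_crit = c / (c + r)
    successes = 0
    for t in range(1, t_max + 1):
        successes += rng.random() < theta  # institution honors trust w.p. theta
        if (alpha + successes) / (alpha + beta + t) < theta_crit:
            return t  # the quitting round tau
    return None  # no quitting within t_max

def estimate(theta, alpha, beta, c, r, runs=4000):
    taus = [single_agent_run(theta, alpha, beta, c, r) for _ in range(runs)]
    quits = [t for t in taus if t is not None]
    p_quit = len(quits) / runs
    mean_tau = sum(quits) / len(quits) if quits else float("nan")
    return p_quit, mean_tau

print(estimate(0.84, 5, 2, 2, 1))  # roughly (0.26, 3.2), cf. the values in §5.3
```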

6. Results of experiments

In this section we present the results of the experiments described in §5. We plot the results for specific values of (c,r) and $u^*$ in Figures 10–15; the numerical output of all experiments is given, in tabulated format, in Appendix D. The plots cover the settings (c,r)=(1,1),(2,1),(1,2) and $u^*=1,3$, where we add the synthetic points $p_{\mathrm{quit}}(\alpha,\beta,c,r,\vartheta=\theta_{\mathrm{crit}})=1$ and $p_{\mathrm{quit}}(\alpha,\beta,c,r,\vartheta=1)=0$ to the curves shown in the figures, in order to show the dynamics beyond the capability of the simulation. To be able to see subtle differences in results, we show the probability of quitting predominantly in the region in which quitting is not guaranteed.

Figure 10. Simulation results for c=r=1 in the $u^*=1$ case.

Figure 11. Simulation results for c=r=1 in the $u^*=3$ case.

Figure 12. Simulation results for c=1 and r=2 in the $u^*=1$ case.

Figure 13. Simulation results for c=1 and r=2 in the $u^*=3$ case.

Figure 14. Simulation results for c=2 and r=1 in the $u^*=1$ case.

Figure 15. Simulation results for c=2 and r=1 in the $u^*=3$ case.

6.1. The probability of quitting

The probability of quitting in the regime ϑ<θcrit should remain unity: the same information, and more, is made available to the agents in the dual agent models as in the single agent model. The values of 0.999 in Tables S5a–S7a could have been remedied at the cost of more simulation effort (i.e., by working with more runs with more time steps).

In the (more interesting) regime ϑ>θcrit, for most parameter settings the probability of quitting increases from the OR model to the OA model and again to the single agent model. This trend is more pronounced for $u^*=1$ than for $u^*=2$ and $u^*=3$. There are two exceptions to this trend in Tables S6a and S7a: for ϑ=0.65,0.66 at the parameter settings c=3 and r=2, with $u^*=2,3$. However, we observed overlap in the original confidence intervals.

To determine whether such a difference bears significance, we conducted more simulations for the faster case with $u^*=2$. For the OR model, being computationally lighter, we conducted a total of 200 000 runs. For the slower OA model, we conducted 40 000 simulations in total. The resulting confidence intervals for the points ϑ=0.65,0.66 are depicted in Figure 16. We observe the, perhaps unexpected, result that the OA model produces a lower probability of quitting in this parameter setting. Hence, the OR model does not always outperform the OA model in terms of the probability of quitting in the regime ϑ>θcrit.

Figure 16. The results of extra simulation runs for the probability of quitting. In these runs c=3, r=2, and $u^*=2$.

Part of the explanation of the lower probability of quitting in the OA model relates to the timing of the communication, which occurs at the end of a round. In the OR model, as soon as an agent has their own reward, they also observe the reward of their neighbor; this amounts to two pieces of information per round. In the OA model, however, an agent only observes the action of their neighbor once they have already taken their own action for that round. That means that an agent who observes their neighbor not placing trust for the first time in round t can only use the information contained therein to inform their action in round t+1. In the meantime, as they have not quit in round t, they are privy to at least that round's outcome before they have to make another decision. We elaborate on this effect in Appendix C.

6.2. Expected time to quit

In general (in the regime ϑ<θcrit) the expected time to quit is lower in the dual agent models than in the single agent model. Furthermore, the OR model tends to have the lowest expected time to quit (i.e., the greatest effect), though with a fair amount of crossing with the OA model. The key exception to this trend occurs at c=r=1 with $u^*=1$, depicted in Figure 10. At this setting, the OA model performs better than both other models, which show a similar time to quitting. Furthermore, there are exceptions at c=2, r=1 and $u^*=1$, depicted in Figure 14, and at c=3, r=2 and $u^*=1$, shown in Table S5b.

In the regime ϑ<θcrit there is the trend (with exceptions) that quitting in the OR model occurs sooner than in both other models. In the OR model the agents are exposed to two outcomes per round of interaction and so receive twice as much unbiased information as in the single agent model. In this regime, the unbiased information thus received is likely to indicate that the institution is trustworthy, and so quitting becomes less likely with more time (i.e., quitting must occur quickly or not at all). This trend is most pronounced when $u^*=3$, as shown in the $u^*=3$ figures as well as in Table S7b.

7. Discussion

In this section we draw conclusions from the numerical experiments, discuss how these findings relate to those in the literature, and present directions for future research.

7.1. General conclusions

In all of the models a similar pattern in the expected time to quitting holds: there is an increase in the expected time to quit as ϑ increases toward θcrit and a decrease afterward. This specifically entails that a long average tenure of customers at an institution does not indicate that this institution is to be trusted; put differently, there is no way of knowing which side of the critical value it might be on with only this information. An institution can simply have a trustworthiness that is just high enough to keep customers placing their trust long enough to yield positive net utility from the relationship, yet not actually high enough to warrant an indefinite relationship. Observing an institution with many ongoing relationships and only a couple of very short concluded ones, on the other hand, might be a good indication that the institution is indeed trustworthy.

By comparing the probability of quitting for the same c and r at different $u^*$ (for example in Figures 10 and 11), we see that the difference between the three models is relatively small for $u^*=2,3$. This shows that starting optimistically allows agents to secure good chances of trusting a trustworthy institution in the long run.

We encountered the (perhaps) remarkable phenomenon that the effects of the different two agent communication mechanisms are not monotone. This is highlighted in case (a), with c=r=1 and $u^*=1$, depicted in Figure 10, and case (b), with c=3, r=2 and $u^*=2$, depicted in Figure 16. In case (a), the OR model outperforms the OA model in terms of the probability of trusting a trustworthy institution. At the same time, however, the OA model outperforms the OR model in making the decision to quit a trust relationship with an untrustworthy institution sooner. In case (b), we see the OA model outperforming the OR model in terms of the probability of trusting a trustworthy institution for the values ϑ=0.65,0.66. These results show that we cannot state that the OR model is always "better" than the OA model. In fact, this depends on the parameter setting as well as on which criterion one finds most important: making the correct decision in the long run, or being sure to end a relationship with an untrustworthy institution relatively quickly.

This surprising result is partially due to the "timing" of the underlying dynamics. Agents make their decisions to place trust at the same time, and so can only use the information from their neighbor's action one round later. We illustrate this in a two-round setting in Appendix C.

We summarize our insights from the numerical results:

  • Communication always helps: it increases the probability of never ceasing to trust a trustworthy institution, and it decreases the expected time until quitting a relationship with an untrustworthy institution, both compared to the single agent model.

  • The OA model can be better or worse than the OR model, depending on the performance measure of interest (probability of quitting, or expected time to quit). There are instances in which having less information is beneficial to the agents.

  • A good way to increase chances of trusting a trustworthy institution in the long run without a social network is to start with a more optimistic prior.

7.2. Reflection and context

In this paragraph we compare our model conceptually to those found in the relevant literature streams; in the subsequent paragraphs we compare the outcomes of our model to those in the literature. We presented a model of trust which includes both learning from interactions with the institution and learning from communication between agents. Thematically this work is related to the social network trust literature, where the focus is on the possible loss of trust. Methodologically this work is related to the social network learning literature: the agents in our model are learning about the environment (the trustworthiness of the trustee). Our model extends ideas in both of these streams in a natural way. The dependence structure between the actions taken by the agents and the signals they subsequently use to learn extends, in a realistic way, the work from the social network learning literature, which typically does not cover this complication. Finally, the observable actions model of communication adds realism to the relationships between agents, not present in the more common observable rewards model of communication in the social network trust literature.

The results of our investigation, akin to those of Buskens (Citation2003), show that trust increases in models with truster agents sharing information about the trustee. In our model, however, this effect extends to settings in which a trustee dishonoring trust in one round does not immediately indicate that they are not to be trusted. Furthermore, we extended the type of communication between the truster agents beyond complete information sharing. We find that the positive effect of agent communication, though stronger under complete information sharing, is also present in the model in which communication is limited to observing actions. Hence, even when individuals do not hold extensive discussions with their peers about the institution, simply observing their actions provides a significant benefit, which had previously been shown for more extensive communication in the social network trust literature (Buskens, Citation2003; Buskens et al., Citation2010; Frey et al., Citation2015).

In both models of communication that we consider, and for all parameter settings, we observe that rational use of social network information increases the chances of agents learning the true trustworthiness when it is rational to place trust. We also observe that, for both models of communication and in all parameter settings, the expected time to quitting is shorter, which is especially beneficial when it is rational to quit. Our model includes realistic assumptions in terms of agent communication and of signal dependence on actions, and shows that communication between agents is an aid to learning and trusting. The simplifying assumption of signals independent of actions cannot be imported into models of trust and learning: it is the nature of a trust problem that resources have to be placed at the liberty of the trustee in order to see how they respond. However, as in the social network learning literature (cf. Harel et al., Citation2021; Huang et al., Citation2021), our work shows that more communication leads to faster dynamics but, due to the dependence of signals on actions, sometimes results in convergence to the sub-optimal action (i.e., not placing trust).

7.3. Future work

We see from the developed model that communication between agents confers a benefit. There are, however, situations in which an agent who might otherwise have stayed on the course of placing trust "wrongfully" stops placing trust as a result of what they hear from their neighbor. It seems that the overall dynamics are not dominated by this effect. A natural question then becomes whether this beneficial effect persists in a model with more than two agents. The present model is limited to two agents partially as a result of the intense computational work involved: agents in the OA model perform numerous numerical integrations per round in order to identify which histories of experiences are plausible given the communication received from their neighbor. It is an open question whether there is a scalable approach to performing these computations. Alternatively, one could circumvent this by relaxing the agents' rationality when it comes to interpreting their neighbors' actions. This would allow investigation of a greater pool of agents who retain some sophistication in learning from their private signal, but comes at the cost of simplifying the model.

Another line of future work concerns asymmetric communication between two truster agents. One of the truster agents may be modeled as a news outlet which only sends information without receiving any in return. In the same spirit, one agent might be malicious, spreading misinformation about the institution. This asymmetry between agents may make technical results more attainable. For instance, one can condition on the private signal of the only-sending truster and observe the dynamics of the receiving truster; the dynamics of the purely sending truster agent conveniently follow the dynamics we have presented in the context of the single agent trust model. Modelling a malicious actor is also possible by deciding a priori what signal they will send, or by introducing a truthfulness parameter by which the truster agent communicates their actual experience with probability η and communicates that trust was abused (regardless of the truth) with probability 1−η. The honest agent in such a model may then need to learn not only about the trustworthiness of the institution but also about the reliability of the information they receive from their network.

Acknowledgments

This research was supported by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement no. 945045, and by the NWO Gravitation project NETWORKS under grant no. 024.002.003. This work was carried out on the Dutch national e-infrastructure with the support of SURF Cooperative.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The work was supported by the H2020 Marie Skłodowska-Curie Actions [945045]; Nederlandse Organisatie voor Wetenschappelijk Onderzoek [024.002.003].

Notes

1 Work in biological perception has shown that humans often behave in a Bayesian manner. For a discussion of the implications hereof and a list of results we refer to Knill and Pouget (Citation2004).

2 Not only is myopic decision making a common assumption in the literature (cf. Bala & Goyal, Citation1998; Harel et al., Citation2021; Keppo et al., Citation2008; Parikh & Krasucki, Citation1990; Sebenius & Geanakoplos, Citation1983), but there is also some experimental evidence that points to semi-myopic behavior in humans (Zhang & Yu, Citation2013a, Citation2013b).

3 Equivalent dynamics arise when framing the interaction as paying utility x to the institution. Thereafter an honorable action from the institution equates to receiving utility x+δ, while an abusive action means the x is lost. Now c=x and r=δ, and our range covers values from δ=2x to δ=x/2, which we consider trust problems. We consider δ>2x closer to an issue of gambling, and δ<x/2 closer to a transformation of utility from one form to another (without any risk, that is).

References

  • Acemoglu, D., Dahleh, M. A., Lobel, I., & Ozdaglar, A. (2011). Bayesian learning in social networks. The Review of Economic Studies, 78, 1201–1236. https://doi.org/10.1093/restud/rdr004
  • Bajos, N., Spire, A., Silberzan, L., Sireyjol, A., Jusot, F., Meyer, L., Franck, J. E., & Warszawski, J. (2022). When lack of trust in the government and in scientists reinforces social inequalities in vaccination against COVID-19. Frontiers in Public Health. https://doi.org/10.3389/fpubh.2022.908152
  • Bala, V., & Goyal, S. (1998). Learning from neighbours. The Review of Economic Studies, 65(3), 595–621. https://doi.org/10.1111/1467-937X.00059
  • Banerjee, A., & Fudenberg, D. (2004). Word-of-mouth learning. Games and Economic Behaviour, 46, 1–22. https://doi.org/10.1016/S0899-8256(03)00048-4
  • Bikhchandani, S., Hirschleifer, D., & Welch, I. (1998). Learning from the behavior of others: Conformity, fads, and informational cascades. Journal of Economic Perspectives, 12(3), 151–170.
  • Bower, A. G., Garber, S., & Watson, J. C. (1996). Learning about a population of agents and the evolution of trust and cooperation. International Journal of Industrial Organisation, 15, 165–190. https://doi.org/10.1016/0167-7187(95)00495-5
  • Buskens, V. (2002). Social networks and trust. Kluwer Academic Publishers.
  • Buskens, V. (2003). Trust in triads: Effects of exit, control and learning. Games and Economic Behavior, 42, 235–252. https://doi.org/10.1016/S0899-8256(02)00563-8
  • Buskens, V., & Raub, W. (2002). Embedded trust: Control and learning. In E. J. Lawler & S. R. Thye (Eds.), Group Cohesion, Trust and Solidarity (pp. 167–202). Elsevier.
  • Buskens, V., Raub, W., & van der Veer, J. (2010). Trust in triads: An experimental study. Social Networks, 32(4), 301–312. https://doi.org/10.1016/j.socnet.2010.05.001
  • Cong, T., & Sato, M. (1982). One-dimensional random walk with unequal step lengths restricted by an absorbing barrier. Discrete Mathematics, 40, 153–162. https://doi.org/10.1016/0012-365X(82)90116-9
  • Correa, J., Mari, M., & Xia, A. (2020). Dynamic pricing with Bayesian updates from online reviews. NeurIPS Workshop on Machine Learning for Economic Policy.
  • DeGroot, M. H. (1974). Reaching a consensus. Journal of the American Statistical Association, 69(345), 118–121. https://doi.org/10.1080/01621459.1974.10480137
  • Frey, V., Buskens, V., & Raub, W. (2015). Embedding trust: A game-theoretic model for investments in and returns on network embeddedness. Journal of Mathematical Sociology, 39(1), 39–72. https://doi.org/10.1080/0022250X.2014.897947
  • Fu, W., & Le Riche, A. (2021). Endogenous growth model with Bayesian learning and technology selection. Mathematical Social Sciences, 114, 58–71. https://doi.org/10.1016/j.mathsocsci.2021.10.003
  • Harel, M., Mossel, E., Strack, P., & Tamuz, O. (2021). Rational groupthink. The Quarterly Journal of Economics, 136(1), 621–668. https://doi.org/10.1093/qje/qjaa026
  • Huang, W., Strack, P., & Tamuz, O. (2021). Learning in repeated interactions on networks. arXiv:2112.14265v2.
  • Keppo, J., Smith, L., & Davydov, D. (2008). Optimal electoral timing: Exercise wisely and you may live longer. The Review of Economic Studies, 75, 597–628. https://doi.org/10.1111/j.1467-937X.2008.00493.x
  • Knill, D. C., & Pouget, A. (2004). The bayesian brain: The role of uncertainty in neural coding and computation. Trends in Neurosciences, 27(12), 712–719. https://doi.org/10.1016/j.tins.2004.10.007
  • Kolb, A., & Madsen, E. (2022). Under suspicion: Trust dynamics with secret undermining. The Review of Economic Studies, 90(2), rdac034. https://doi.org/10.1093/restud/rdac034
  • Kyprianou, A. (2006). Introductory Lectures on Fluctuations of Lévy Processes with Applications. Springer.
  • Kyprianou, A. (2010). The Wiener-Hopf decomposition. In R. Cont (Ed.), Encyclopedia of Quantitative Finance. John Wiley & Sons, Ltd.
  • Lazarus, J. V., Wyka, K., White, T. M., Picchio, C. A., Rabin, K., Ratzan, S. C., Leigh, J. P., Hu, J., & El-Mogandes, A. (2022). Revisiting COVID-19 vaccine hesitancy around the world using data from 23 countries in 2021. Nature Communications, 13(1), 3801. https://doi.org/10.1038/s41467-022-31441-x
  • Molavi, P., Tahbaz-Salehi, A., & Jadbabaie, A. (2018). A theory of non-Bayesian social learning. Econometrica, 86(2), 445–490. https://doi.org/10.3982/ECTA14613
  • Ozdemir, S., Zhang, S., Gupta, S., & Bebek, G. (2020). The effects of trust and peer influence on corporate brand–consumer relationship and consumer loyalty. Journal of Business Research, 117, 791–805. https://doi.org/10.1016/j.jbusres.2020.02.027
  • Parikh, R., & Krasucki, P. (1990). Communication, consensus and knowledge. Journal of Economic Theory, 52, 178–189. https://doi.org/10.1016/0022-0531(90)90073-S
  • Pechmann, C., & Knight, S. J. (2002). An experimental investigation of the joint effects of advertising and peers on adolescents’ beliefs and intentions about cigarette consumption. Journal of Consumer Research, 29(1), 5–19. https://doi.org/10.1086/339918
  • Sebenius, J. K., & Geanakoplos, J. (1983). Don’t bet on it: Contingent agreements with asymmetric information. Journal of the American Statistical Association, 78(382), 424–426.
  • Wang, Y., & Yu, C. (2017). Social interaction-based consumer decision-making model in social commerce: The role of word of mouth and observational learning. International Journal of Information Management, 37(3), 179–189. https://doi.org/10.1016/j.ijinfomgt.2015.11.005
  • World Health Organization. (2020). Munich security conference. https://www.who.int/dg/speeches/detail/munich-security-conference. Accessed: 2022-12-09.
  • Zhang, S., & Yu, A. J. (2013a). Cheap but clever: Human active learning in a bandit setting. Proceedings of the Annual Meeting of the Cognitive Science Society, Berlin, Germany (Vol. 35). Curran Associates, Inc.
  • Zhang, S., & Yu, A. J. (2013b). Forgetful bayes and myopic planning: Human learning and decision-making in a bandit setting. In C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Weinberger (Eds.), Advances in Neural Information Processing Systems (Vol. 26, p. 9). Curran Associates, Inc.

Appendix A: Proofs of results

A.1. Proof of Lemma 1 (Guaranteed quitting)

Proof. We prove this by contradiction. Suppose that $A_t=1$ for all $t\in\mathbb{N}$. Hence,

(69) $r\hat\vartheta_t-c(1-\hat\vartheta_t)\geq 0,\quad\forall t\in\mathbb{N}.$

The agent estimates the trustworthiness in round t by

(70) $\hat\vartheta_t=\dfrac{\alpha+\hat S_t}{\alpha+\beta+t},$

and we know by the law of large numbers that

(71) $\lim_{t\to\infty}\dfrac{S_t}{t}=\vartheta\quad\text{(almost surely)}.$

This also implies that

(72) $\lim_{t\to\infty}\hat\vartheta_t=\vartheta<c/(r+c)\quad\text{(almost surely)}.$

This means that there exists t for which

(73) $r\hat\vartheta_t-c(1-\hat\vartheta_t)<r\,\dfrac{c}{c+r}-c\left(1-\dfrac{c}{c+r}\right)$
(74) $=\dfrac{rc}{c+r}-\dfrac{c(c+r)}{c+r}+\dfrac{c^2}{c+r}=\dfrac{0}{c+r}=0.$

At this value of t, the expected utility of the interaction is negative, which implies that the agent takes action $A_t=0$, a contradiction. □

A.2. Proof of Lemma 2 (Converge or quit)

Proof. The agent makes use of a Bayesian belief update, starting from a beta distributed prior belief with shape parameters α and β. Given that $\hat S_t=\sum_{s=1}^t X_s\,\mathbb{1}_{\{A_s=1\}}$, where $X_s$ is a binary random variable taking the value 1 with probability ϑ and 0 with probability 1−ϑ, and ϑ is the value which the agent is trying to estimate, the belief distribution at time t equals

(75) $P_t(\theta)=\dfrac{\theta^{\hat S_t+\alpha-1}(1-\theta)^{(t-\hat S_t)+\beta-1}}{\int_0^1 y^{\hat S_t+\alpha-1}(1-y)^{(t-\hat S_t)+\beta-1}\,dy},\quad\theta\in[0,1].$

Defining τ as the first $t\in\mathbb{N}$ such that $\int_0^1\theta\,P_t(\theta)\,d\theta<c/(c+r)$ (possibly $\infty$), we have for all t<τ that the estimate of ϑ is given by

(76) $\hat\vartheta_t=\dfrac{\alpha+\hat S_t}{\alpha+\beta+t}.$

Then there are two possible cases: τ<∞ and τ=∞. In the first case (where τ<∞) the agent stops placing trust, and by the assumption on ϑ, namely that ϑ>c/(c+r)=θcrit, it is clear that the agent has not converged to the true value of ϑ: quitting requires $\hat\vartheta_\tau<\theta_{\mathrm{crit}}$, while by assumption we have chosen ϑ>θcrit. In the second case we have τ=∞. Here we define $S_t$, which drops the dependence on the actions of the agent:

(77) $S_t=\sum_{s=1}^t X_s,\quad\forall t\in\mathbb{N},$

and we note that when τ=∞ the relationship $\hat S_t=S_t$ holds, and therewith $\hat\vartheta_t=(S_t+\alpha)/(\alpha+\beta+t)$. First inspecting $S_t/t$, we know by the law of large numbers that

(78) $\lim_{t\to\infty}\dfrac{S_t}{t}=\vartheta\quad\text{(almost surely)}.$

Subsequently investigating $\hat\vartheta_t$ as $t\to\infty$, we see:

(79) $\lim_{t\to\infty}\hat\vartheta_t=\lim_{t\to\infty}\dfrac{S_t+\alpha}{t+\alpha+\beta}=\vartheta\quad\text{(almost surely)}.$

This shows that the agent's estimate has indeed converged to the true ϑ almost surely. □

A.3. Proof of Lemma 3 (Quitting probability when c=1 and $r\in\mathbb{N}$)

Proof. Recall the assumption that ϑ>c/(c+r) and the definitions $\pi(u)=P(\exists t:Z_t\geq u)$ and $\rho=P(\exists t:Z_t\geq 1)$. Observe that (A) by the strong Markov property the random walk after having hit level 1 is independent of what happened before, and (B) on the path to level u every level in the set $\{1,\ldots,u-1\}$ has been attained. We thus obtain the "memoryless property"

(80) $\pi(u)=\rho^u.$

Now consider the first step of the random walk starting at 0: it either hits the first level immediately, with probability 1−ϑ, or it drops to level −r, putting it at distance r+1 from level 1. Using (80) we find the identity

(81) $F(\rho):=(1-\vartheta)+\vartheta\rho^{r+1}=\rho=:G(\rho).$

We can prove that this equation has exactly one solution in the interval [0,1). Note that F(⋅) is convex. A trivial solution is of course ρ=1, which accounts for one of the at most two intersections between a convex function and a linear function. It thus remains to show that there is another intersection, and that it lies in [0,1). This is indeed the case because $F'(1)=\vartheta(r+1)$, which is larger than $G'(1)=1$ due to the condition ϑ>c/(c+r) after substituting c=1 and rearranging. Observing that $F(0)=1-\vartheta>0=G(0)$, this entails that there is another root located somewhere between ρ=0 and ρ=1. □
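Numerically, the root of (81) in [0,1) can be obtained by fixed-point iteration starting from 0, which increases monotonically to the smallest fixed point. A minimal sketch (our illustration, not from the paper):

```python
def quitting_prob_root(theta, r, tol=1e-12):
    """Smallest fixed point in [0, 1) of rho = (1 - theta) + theta * rho**(r + 1),
    cf. (81); the quitting probability is then pi(u) = rho**u by (80)."""
    rho, prev = 0.0, -1.0
    while rho - prev > tol:
        prev, rho = rho, (1 - theta) + theta * rho ** (r + 1)
    return rho

# Example: c = 1, r = 1 and theta = 0.75 give rho = 1/3, so pi(u) = (1/3)**u.
print(quitting_prob_root(0.75, 1))
```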

A.4. Proof of Lemma 4 (Expected time to quitting when c=1 and $r\in\mathbb{N}$)

Proof. Let $\tau(u)=\inf\{t:Z_t=u\}$ be the first passage time of the walk $Z_t$ to u, and recall that π(u) is the probability of this event occurring in finite time (i.e., $\pi(u)=P(\exists t:Z_t\geq u)$). We are interested in the expectation of τ(u) conditioned on its finiteness:

(82) $E[\tau(u)\mid\tau(u)<\infty]=\dfrac{E[\tau(u)\,\mathbb{1}_{\{\tau(u)<\infty\}}]}{\pi(u)}.$

Note the relationship

(83) $E[z^{\tau(u)}\,\mathbb{1}_{\{\tau(u)<\infty\}}]=\left(E[z^{\tau(1)}\,\mathbb{1}_{\{\tau(1)<\infty\}}]\right)^u,$

which holds by precisely the same "memoryless argumentation" as in the proof of Lemma 3. We define

(84) $\varphi(z):=E[z^{\tau(1)}\,\mathbb{1}_{\{\tau(1)<\infty\}}],$

which by the relationship (83) also allows for a self-referential expression

(85) $\varphi(z)=z\left((1-\vartheta)+\vartheta\,\varphi(z)^{r+1}\right).$

Finally note that we can recover the expectation of interest by taking the derivative and substituting z=1:

(86) $E[\tau(u)\mid\tau(u)<\infty]=\dfrac{u\,\varphi'(1)}{\pi(1)}.$

We now manipulate these expressions in order to arrive at the final statement: observing that

(87) $\varphi'(z)=(1-\vartheta)+\vartheta\,\varphi(z)^{r+1}+z\vartheta(r+1)\,\varphi(z)^r\,\varphi'(z),$

substituting z=1 and noting that φ(1)=π(1)=ρ, with ρ as defined in (22), we obtain

(88) $\varphi'(1)=\underbrace{(1-\vartheta)+\vartheta\rho^{r+1}}_{=\rho\ \text{by}\ (24)}+\vartheta(r+1)\rho^r\,\varphi'(1)\;\Longrightarrow\;\varphi'(1)=\dfrac{\rho}{1-(r+1)\vartheta\rho^r}=\dfrac{\rho}{1-(r+1)\left(1-(1-\vartheta)\rho^{-1}\right)}=\dfrac{\rho^2}{(1-\vartheta)(r+1)-r\rho}$ (multiplying by ρ/ρ).

Multiplying this by u/ρ gives the final result. □
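Combining (80), (86) and (88) as reconstructed here, the conditional expectation admits the closed form $E[\tau(u)\mid\tau(u)<\infty]=u\rho/\left((1-\vartheta)(r+1)-r\rho\right)$. A sketch building on the root finder above:

```python
def expected_quit_time(theta, r, u):
    """E[tau(u) | tau(u) < inf] = u * rho / ((1 - theta) * (r + 1) - r * rho),
    combining (80), (86) and (88); rho comes from quitting_prob_root above."""
    rho = quitting_prob_root(theta, r)
    return u * rho / ((1 - theta) * (r + 1) - r * rho)

print(expected_quit_time(0.75, 1, 1))  # 2.0: for theta = 0.75, r = 1, rho = 1/3
```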

A.5. Proof of Lemma 5 (Quitting probability when r=1 and $c\in\mathbb{N}$)

Proof. We introduce the time G, which follows a shifted geometric distribution: $P(G=t)=(1-f)^t f$ for some f∈(0,1). We investigate the behaviour of the walk, specifically the probability of reaching level u before the geometrically triggered time G:

(89) $p^{[f]}(u):=P(\exists n\leq G:Z_n\geq u),$

as well as the probability of reaching level 1 by time G:

(90) $\rho^{[f]}:=P(\exists n\leq G:Z_n\geq 1).$

Furthermore, we define the minimum of the walk until time n, $\underline{Z}_n:=\min\{Z_0,Z_1,\ldots,Z_n\}$, as well as the maximum attained until time n, $\bar{Z}_n:=\max\{Z_0,Z_1,\ldots,Z_n\}$. Due to the Wiener-Hopf factorisation (Kyprianou, Citation2006, Citation2010), $Z_G-\bar{Z}_G$ is independent of $\bar{Z}_G$, so that $E[w^{Z_G}]=E[w^{\bar{Z}_G}]\,E[w^{Z_G-\bar{Z}_G}]$. It is also true that $Z_G-\bar{Z}_G$ is distributed as $\underline{Z}_G$, as is seen by rotating the walk 180° and considering $Z_G$ as the starting point of the new, identically distributed process. Therefore,

(91) $E[w^{Z_G}]=E[w^{\bar{Z}_G}]\,E[w^{\underline{Z}_G}].$

Hence, if we manage to identify $E[w^{Z_G}]$ and $E[w^{\underline{Z}_G}]$, we have found $E[w^{\bar{Z}_G}]$. In pursuit of $E[w^{Z_G}]$, let x be the number of times the process jumps upward in the time interval $\{1,\ldots,G\}$. Hence,

(92) $E[w^{Z_G}]=E\left[\sum_{x=0}^G\binom{G}{x}(1-\vartheta)^x\vartheta^{G-x}\,w^{cx-(G-x)}\right]=E\left[\sum_{x=0}^G\binom{G}{x}\left((1-\vartheta)w^c\right)^x\left(\vartheta w^{-1}\right)^{G-x}\right]=E\left[\left((1-\vartheta)w^c+\vartheta w^{-1}\right)^G\right].$

Recalling that G is geometrically distributed, for $(1-f)\left((1-\vartheta)w^c+\vartheta w^{-1}\right)<1$,

(93) $E[w^{Z_G}]=\sum_{i=0}^\infty(1-f)^i f\left((1-\vartheta)w^c+\vartheta w^{-1}\right)^i=\dfrac{f}{1-(1-f)\left((1-\vartheta)w^c+\vartheta w^{-1}\right)}.$

In order to find $E[w^{\underline{Z}_G}]$ we note that the minimum $\underline{Z}_G$ is distributed as the negative of the maximum of the random walk $B_n$ that takes steps up of size one with probability ϑ and down of size $c\in\mathbb{N}$ with probability 1−ϑ. These two walks and their step descriptions are shown in Table 4.

Table 4. Two random walks.

The walk $B_n$ is simply a variable substitution of the walk analysed in §3.2.1, now with an added shifted-geometrically distributed time horizon G, and as such the analysis is remarkably similar. Define the probability of the walk B going up by at least u before the geometric trigger,

$p^{[f]}(u):=P(\exists n\leq G:B_n\geq u).$

Similarly define the probability of reaching u=1 before the geometric trigger by

$\rho^{[f]}:=P(\exists n\leq G:B_n\geq 1);$

then as before we have the memoryless property $p^{[f]}(u)=(\rho^{[f]})^u$. It is also apparent that $\rho^{[f]}$ solves

(94) $\rho^{[f]}=(1-f)\left(\vartheta+(1-\vartheta)(\rho^{[f]})^{c+1}\right).$

By defining $s^{[f]}(u):=P(\bar B_G=u)$ as the probability that the maximum of the random walk B by time G is exactly u, and noting that

(95) $s^{[f]}(u)=p^{[f]}(u)-p^{[f]}(u+1)=(\rho^{[f]})^u-(\rho^{[f]})^{u+1}=(\rho^{[f]})^u(1-\rho^{[f]}),$

we have an expression for this maximum. By the relationship between $B_G$ and $Z_G$, we also have an expression for $\underline{Z}_G$, with $\rho^{[f]}$ solving (94):

$P(\underline{Z}_G=-u)=(\rho^{[f]})^u(1-\rho^{[f]}).$

Hence,

(96) $E[w^{\underline{Z}_G}]=\sum_{u=0}^\infty(\rho^{[f]})^u(1-\rho^{[f]})\,w^{-u}=\dfrac{1-\rho^{[f]}}{1-\rho^{[f]}w^{-1}}.$

This, together with expression (93) and the Wiener-Hopf factorisation (91), yields

(97) $E[w^{\bar{Z}_G}]=\dfrac{f}{1-(1-f)\left((1-\vartheta)w^c+\vartheta w^{-1}\right)}\times\dfrac{1-\rho^{[f]}w^{-1}}{1-\rho^{[f]}}.$

Considering this as f↓0, it is first noted that $\rho^{[0]}=1$. Then, defining $\sigma(u):=P(\bar Z_\infty=u)$,

(98) $\zeta(w):=\sum_{u=0}^\infty\sigma(u)w^u=\lim_{f\downarrow 0}E[w^{\bar Z_G}]=\lim_{f\downarrow 0}\dfrac{1-\rho^{[f]}w^{-1}}{1-(1-f)\left((1-\vartheta)w^c+\vartheta w^{-1}\right)}\times\dfrac{f}{1-\rho^{[f]}}=\dfrac{1-w^{-1}}{1-\left((1-\vartheta)w^c+\vartheta w^{-1}\right)}\times\lim_{f\downarrow 0}\dfrac{f}{1-\rho^{[f]}}.$

Applying L'Hôpital's rule, we obtain

(99) $\zeta(w)=\dfrac{1-w^{-1}}{1-\left((1-\vartheta)w^c+\vartheta w^{-1}\right)}\times\dfrac{-1}{\gamma},\quad\text{with }\gamma:=\lim_{f\downarrow 0}\dfrac{d}{df}\rho^{[f]}.$

It is easy to verify, by differentiating both sides of (94) with respect to f and solving for γ, that $1/\gamma=(c+1)(1-\vartheta)-1$, which entails that

(100) $\zeta(w)=\dfrac{(1-w^{-1})\left((c+1)(1-\vartheta)-1\right)}{(1-\vartheta)w^c+\vartheta w^{-1}-1}=\dfrac{(w-1)\left(1-(1-\vartheta)(c+1)\right)}{w-\vartheta-(1-\vartheta)w^{c+1}}.$

But we are interested in π(u); as such, let $\xi(w)=\sum_{u=0}^\infty\pi(u)w^u$, which is expressed using σ(u) in the following way:

(101) $\xi(w)=\sum_{u=0}^\infty\sum_{v=u}^\infty\sigma(v)\,w^u=\sum_{v=0}^\infty\sum_{u=0}^v\sigma(v)\,w^u=\sum_{v=0}^\infty\sigma(v)\,\dfrac{1-w^{v+1}}{1-w}=\dfrac{\sum_{v=0}^\infty\sigma(v)-w\sum_{v=0}^\infty\sigma(v)w^v}{1-w},$

culminating in the useful expression

(102) $\xi(w)=\dfrac{1-w\,\zeta(w)}{1-w},$

by which we can evaluate π(u) via the relationship

(103) $\pi(u)=\dfrac{1}{u!}\dfrac{d^u}{dw^u}\xi(w)\Big|_{w=0}=\dfrac{\xi^{(u)}(0)}{u!}.$

Substituting the expression for ζ(w) gives the result. □
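As a symbolic sketch of the extraction (100)–(103) (our illustration, for the arbitrary values ϑ=3/4 and c=2):

```python
import sympy as sp

w = sp.symbols("w")
theta, c = sp.Rational(3, 4), 2  # illustrative values with theta > c / (c + 1)
# zeta(w) from (100) and xi(w) from (102):
zeta = (w - 1) * (1 - (1 - theta) * (c + 1)) / (w - theta - (1 - theta) * w ** (c + 1))
xi = (1 - w * zeta) / (1 - w)
u = 3
# pi(u) is the u-th Taylor coefficient of xi at w = 0, cf. (103):
pi_u = sp.series(xi, w, 0, u + 1).removeO().coeff(w, u)
print(sp.simplify(pi_u))
```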

Remark 7. Cong and Sato (Citation1982) use combinatorial arguments to find the probability of first hitting the absorbing barrier at a specific time t for the cases when either r or c is one,

(104) $p_t(u)=P\left(\inf\{n\in\mathbb{N}:Z_n\geq u\}=t\right).$

Summing over the first hitting times t could in turn provide the probabilities that we state here: $\sum_{t=1}^\infty p_t(u)=\pi(u)$. Notice that this approach is of an intrinsically approximative nature, as one has to somehow truncate the summation over t, while our approach is exact. We do not make use of their result, also because the underlying computations are involved; in addition, it does not yield the expected time to quitting.

A.6. Proof of Lemma 6 (Expected time to quitting when r=1 and $c\in\mathbb{N}$)

Proof. Define

(105) $\varphi(z,u):=E\left[z^{\tau(u)}\,\mathbb{1}_{\{\tau(u)<\infty\}}\right],\qquad\Phi(w,z):=\sum_{u=1}^\infty w^u\,\varphi(z,u),\quad w\in(-1,1),$

so that

(106) $E[\tau(u)\mid\tau(u)<\infty]=\dfrac{\varphi'(1,u)}{\varphi(1,u)},\qquad\varphi(z,u)=\dfrac{1}{u!}\dfrac{\partial^u}{\partial w^u}\Phi(w,z)\Big|_{w=0},$

where the prime denotes differentiation with respect to z. The dynamics of the process reveal the recursive relation

(107) $\varphi(z,u)=\begin{cases}(1-\vartheta)z\,\varphi(z,u-c)+\vartheta z\,\varphi(z,u+1)&\text{if }u>c,\\(1-\vartheta)z+\vartheta z\,\varphi(z,u+1)&\text{otherwise.}\end{cases}$

Substituting this into the definition of Φ(w,z), we obtain, after some minor rewriting,

(108) $\Phi(w,z)=\underbrace{\sum_{u=1}^c w^u(1-\vartheta)z}_{(a)}+\underbrace{\sum_{u=1}^\infty w^u\,\vartheta z\,\varphi(z,u+1)}_{(b)}+\underbrace{\sum_{u=c+1}^\infty w^u(1-\vartheta)z\,\varphi(z,u-c)}_{(c)}.$

Inspecting (a), (b), and (c) individually, we obtain

(109) $(a)=(1-\vartheta)z\sum_{u=1}^c w^u=(1-\vartheta)z\,\dfrac{w-w^{c+1}}{1-w}$

by applying the finite geometric series,

(110) $(b)=\dfrac{\vartheta z}{w}\sum_{u=1}^\infty w^{u+1}\varphi(z,u+1)=\dfrac{\vartheta z}{w}\,\Phi(w,z)-\vartheta z\,\varphi(z,1)$

relying on the definition of Φ(w,z), and

(111) $(c)=w^c(1-\vartheta)z\sum_{u=c+1}^\infty w^{u-c}\varphi(z,u-c)=w^c(1-\vartheta)z\,\Phi(w,z)$

shifting the summation index by c and again applying the definition of Φ(w,z). By combining and rearranging (a), (b), and (c), we can isolate Φ(w,z), so as to obtain

(112) $\Phi(w,z)=\dfrac{(1-\vartheta)z\,\dfrac{w-w^{c+1}}{1-w}-\vartheta z\,\varphi(z,1)}{1-(\vartheta z)/w-w^c z(1-\vartheta)}.$

What remains is to identify φ(z,1) for each z∈(−1,1). The key principle is that, because their ratio is a finite number, any root of the denominator in (112) must also be a root of the numerator. Indeed, supposing there is a unique $w^*(z)\in(-1,1)$ which yields a root of the denominator, this would imply that

(113) $\varphi(z,1)=\dfrac{1-\vartheta}{\vartheta}\cdot\dfrac{w^*(z)-w^*(z)^{c+1}}{1-w^*(z)}=\dfrac{(1-\vartheta)z\,w^*(z)+\vartheta z-w^*(z)}{\vartheta z\left(1-w^*(z)\right)}.$

This means that we are done if we can show that for any z∈(−1,1) the denominator in (112) has a unique root $w^*(z)$ in (−1,1). It is directly seen that this root satisfies

(114) $F(w):=w^{c+1}z(1-\vartheta)+\vartheta z=w=:G(w).$

Observe that for z=0 we obviously have the unique root w=0.

Subcase z>0. Observe that $F(0)=\vartheta z>0=G(0)$ and $F(1)=z<1=G(1)$, so that F(w)−G(w) changes sign an odd number of times for w∈(0,1). But because F(⋅) is convex and G(⋅) is linear, the function F(⋅)−G(⋅) changes sign at most twice. By combining the above, we conclude the existence of the unique root $w^*(z)\in(-1,1)$ (which actually lies between 0 and 1).

Subcase z<0. The case of c odd works analogously to the subcase z>0: $F(0)=\vartheta z<0=G(0)$ and $F(-1)=z>-1=G(-1)$. From the concavity of F(⋅), the existence of a unique root $w^*(z)\in(-1,1)$ follows (which lies between −1 and 0). In case c is even, we still have $F(0)=\vartheta z<0=G(0)$, but now it should be noted that $F(-1)=z(2\vartheta-1)$, which is, for z∈(−1,0) and ϑ∈(0,1), larger than $-1=G(-1)$ (equality would require ϑ=1 and z=−1). The existence of a unique root $w^*(z)\in(-1,1)$ now follows from the fact that F(w) is decreasing and G(w) is increasing for w∈(−1,0) (this root lies between −1 and 0). □
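For z∈(0,1) the root $w^*(z)$ lies in (0,1) and can be bracketed directly; a numerical sketch evaluating φ(z,1) via (113) (our illustration; the z<0 subcases require the bracket (−1,0)):

```python
from scipy.optimize import brentq

def phi_z1(z, theta, c):
    """phi(z, 1) via (113), locating the root w*(z) of (114) in (0, 1);
    valid for z in (0, 1), where F - G changes sign on the bracket."""
    f = lambda w: w ** (c + 1) * z * (1 - theta) + theta * z - w
    w_star = brentq(f, 1e-12, 1.0 - 1e-12)
    return (1 - theta) / theta * (w_star - w_star ** (c + 1)) / (1 - w_star)
```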

Appendix B: Single agent case $r,c\in\mathbb{N}$

In this case no exact formula can be obtained for the probability of quitting or the expected time to quitting conditioned thereon; recall that in the other two cases we exploited the fact that the walk, in either the upward or the downward direction, had step size 1. We briefly sketch a numerical scheme which allows accurate approximation of the probability of quitting. The relationship, for all u>c, is:

(115) $\pi(u)=\vartheta\,\pi(u+r)+(1-\vartheta)\,\pi(u-c).$

We supplement this with an approximation for large enough $u\geq u_0$ by setting $\hat\pi(u)=\eta\gamma^u$ for η≥0 and γ∈(0,1). In combination with (115) we know that γ satisfies $1=(1-\vartheta)\gamma^{-c}+\vartheta\gamma^{r}$. Using this approximation for π(u) for $u\geq u_0$, one has:

(116) $\hat\pi(u)=\begin{cases}\eta\gamma^u&\text{if }u\geq u_0,\\(1-\vartheta)\,\hat\pi(u-c)+\vartheta\,\hat\pi(u+r)&\text{if }c<u<u_0,\\(1-\vartheta)+\vartheta\,\hat\pi(u+r)&\text{if }0<u\leq c,\end{cases}$

which can be used to construct a system of equations that can be solved numerically. We propose approximating the probability of absorption (and consequently the termination of trust) by the solution of the system (116) at the appropriate $u=u_{\mathrm{crit}}$. The value of $u_0$ should be large enough to yield a good approximation, and small enough to keep the computation manageable.
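One possible discretisation (our construction, not the paper's): treat $\hat\pi(1),\ldots,\hat\pi(u_0-1)$ and η as unknowns, fold the tail values $\hat\pi(u)=\eta\gamma^u$ for $u\geq u_0$ into the η column, and solve the resulting square linear system:

```python
import numpy as np

def quit_prob_general(theta, c, r, u_crit, u0=200):
    """Approximate pi(u_crit) by solving the system (116)."""
    # Tail decay rate gamma in (0, 1): rewrite 1 = (1-theta)*g**(-c) + theta*g**r
    # as g**c = (1 - theta) + theta * g**(c + r) and iterate from 0 (monotone).
    g = 0.0
    for _ in range(10_000):
        g = ((1 - theta) + theta * g ** (c + r)) ** (1.0 / c)
    n = u0  # unknowns: pi-hat(1), ..., pi-hat(u0 - 1), eta
    A, b = np.zeros((n, n)), np.zeros(n)

    def col(u):   # column of pi-hat(u); tail values sit in the eta column
        return u - 1 if u < u0 else n - 1

    def coef(u):  # in the tail, pi-hat(u) = eta * g**u
        return 1.0 if u < u0 else g ** u

    for u in range(1, u0 + 1):
        A[u - 1, col(u)] += coef(u)                  # left-hand side pi-hat(u)
        A[u - 1, col(u + r)] -= theta * coef(u + r)  # honoring: distance u + r
        if u > c:
            A[u - 1, col(u - c)] -= (1 - theta) * coef(u - c)
        else:
            b[u - 1] = 1 - theta                     # absorbed by a single abuse
    return np.linalg.solve(A, b)[u_crit - 1]
```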

We sketch a similar approximation scheme for the expected time in the system. The crux of the scheme is to approximate the expected time to quitting at high enough u by the number of rounds it would take to move directly toward absorption without taking any steps away, namely $\lceil u/c\rceil$. In terms of φ(z,u) one arrives at the following system of equations:

(117) $\hat\varphi(z,u)=\begin{cases}(1-\vartheta)^{\lceil u/c\rceil}\,z^{\lceil u/c\rceil}&\text{if }u\geq u_0,\\(1-\vartheta)z\,\hat\varphi(z,u-c)+\vartheta z\,\hat\varphi(z,u+r)&\text{if }c<u<u_0,\\(1-\vartheta)z+\vartheta z\,\hat\varphi(z,u+r)&\text{if }0<u\leq c.\end{cases}$

This defines a system of equations similar to (116) which can be solved numerically. It provides an approximation to $\varphi(z,u_{\mathrm{crit}})$, which in turn can be used to obtain an approximation of $E[\tau(u_{\mathrm{crit}})\mid\tau(u_{\mathrm{crit}})<\infty]$ by taking the derivative with respect to z at z=1: $\hat\varphi'(1,u_{\mathrm{crit}})/\hat\varphi(1,u_{\mathrm{crit}})$. Notice the similarity to (40).

Appendix C: An illustration of how the OA model can beat the OR model

In this appendix we elaborate on the peculiar outcome that in the parameter setting with c=3, r=2, and $u^*=2$ the OA model yields a lower probability of quitting than the OR model at the trustworthiness settings ϑ=0.65,0.66.

We consider a pair of agents in precisely this setting (c=3, r=2, $u^*=2$ and ϑ=0.65). Furthermore, we presume that of the first four times trust is placed in the institution (by either agent), trust is honoured only once.

In the OR model, this means that at most two rounds of interaction can take place. Either the first round includes abuses of trust toward both agents, in which case they quit in round 1, or trust is honoured once in round 1 but abused twice in round 2, so that the agents do not place trust in round 3.

In the OA model, the agents involved have two rounds of interaction regardless of their first reward. The act of placing trust at the start of round 2 yields no information to the neighbour. At the start of round 3, however, the agent (labelled 1) who observed two abuses does not place trust. This signals to the neighbouring agent (labelled 2) that two abuses of trust have certainly occurred. At the point of placing trust in round 3 (before observing the outcome thereof), agent 2 has access to information which brings their estimate below the critical value. However, the agent has already placed their trust and so will see the outcome of that interaction. This outcome is of crucial importance: if the trust is honoured (which is more likely, as ϑ=0.65>0.5), agent 2 stays in the relationship, while if the trust is abused they quit.

Appendix D: Tables of simulated results

In the tables below we present the probability of quitting and the expected time to quitting under the three considered mechanisms.

Table 5. Probability of quitting and the expected time to quitting in the single agent model (Sol), the observable actions model (OA) and the observable rewards model (OR). Parameters α and β set such that $u^*=1$.

Table 6. Probability of quitting and the expected time to quitting in the single agent model (Sol), the observable actions model (OA) and the observable rewards model (OR). Parameters α and β set such that $u^*=2$.

Table 7. Probability of quitting and the expected time to quitting in the single agent model (Sol), the observable actions model (OA) and the observable rewards model (OR). Parameters α and β set such that $u^*=3$.