Research Article

Geographic proximity and homophily effects drive social interactions within VGI communities: an example of iNaturalist

Article: 2297948 | Received 04 Aug 2023, Accepted 17 Dec 2023, Published online: 26 Dec 2023

ABSTRACT

Social interactions among online community members contributing to volunteered geographic information (VGI) are a key element and often crucial to VGI project success. Existing studies on VGI lack investigations into the patterns of social interactions within VGI communities and the drivers that may have shaped these patterns. This study bridges this gap by analyzing social interaction patterns in the iNaturalist citizen science project and exploring potential driving factors through social network analysis. The relationships between potential drivers of social interactions in iNaturalist (i.e. geographic distance, species taxon composition similarity, and land cover composition similarity) and the establishment, intensity, and clustering of interactions were examined. The results revealed that geographic proximity, common interests in species taxon categories, and shared preferences in observation environments are major drivers of inter-contributor species identification interactions in iNaturalist. These findings are supported by theories that explain the social forces behind social link formation. Geographic distance represents physical proximity, whereas species taxon composition similarity and land cover similarity reflect homophily effects. iNaturalist and many other VGI communities are spatially embedded social networks. The discovered interaction drivers in iNaturalist generally align with those in spatial social networks and are expected to be generalizable to VGI communities beyond iNaturalist.

This article is part of the following collections:
Advances in Volunteered Geographic Information (VGI) and Citizen Sensing

1. Introduction

Recent decades have witnessed the rise of volunteered geographic information (VGI) (Goodchild Citation2007) to be a significant phenomenon in GIScience and beyond as it offers unprecedented opportunities for sensing the physical and social environments (Connors, Lei, and Kelly Citation2012; Elwood, Goodchild, and Sui Citation2012; Liu et al. Citation2015). Citizen volunteers are essential for VGI because they act as a community of human sensors on the ground to contribute spatially and temporally referenced observations of a variety of geographic features and phenomena (Mordechai Haklay and Weber Citation2008; Yan et al. Citation2020; G. Zhang et al. Citation2018). Typically, there are four components of a VGI record: ‘Who’ reports ‘what’ at ‘where’ and ‘when’ (G. Wang and Ye Citation2018; Zhang and Zhu Citation2018). The ‘who’ component represents the contributing volunteer, the subject who observes a target phenomenon (‘what’) at a chosen geographic location (‘where’) and a selected time point or period (‘when’). Volunteers are situated at the core of VGI because maintaining a critical mass of volunteers to contribute data is the key to sustaining any VGI project (Bégin, Devillers, and Roche Citation2018). Also, VGI data quality is impacted very much by the characteristics of VGI contributors (e.g. level of expertise) (Malik et al. Citation2015) and the spatiotemporal pattern of their data contribution behavior (G. Zhang Citation2020).

Social interaction among volunteers, whatever forms it takes to manifest itself in the context of a particular VGI community, is even more crucial to VGI and its data quality (Liberatore et al. Citation2018). In many cases, social interactions between contributors as collaborative processes are intrinsic to VGI data production. For example, participants in the iNaturalist citizen science project suggest species identifications or vote on existing ones for observations submitted by other contributors to reach consensus species identification (Unger et al. Citation2021). VGI also depends on a social network of contributors who offer a sufficient number of ‘eyes’ to converge on the truth of the geographic features or phenomena under observation (McGough, Kavak, and Mahabir Citation2022; Mordechai Haklay et al. Citation2010), as is evident in OpenStreetMap (OSM) where contributors co-edit spatial entities to improve the mapping accuracy (Sarkar and Anderson Citation2022). Furthermore, VGI projects may rely on a hierarchy of trusted individuals in the social network acting as gate-keepers to assure VGI data quality (Goodchild and Li Citation2012). For instance, a network of regional experts in eBird helps vet dubious birding records submitted by bird watchers (Kelling et al. Citation2013).

Social interaction is crucial also because critical information diffuses via social interactions within the VGI communities. For example, when social media VGI is harnessed and utilized to support disaster and emergency response (Haworth and Bruce Citation2015), time-sensitive information on those needing rescue or aid and those providing assistance relies on social interactions embedded in the social media network to spread (Landwehr and Carley Citation2014). Therefore, it is of paramount importance to understand the social interaction patterns and the pattern-shaping drivers underpinning VGI communities. Such knowledge would not only advance VGI research by deepening our understanding of how VGI and its data quality may be linked to social interactions among volunteer contributors (Sarkar and Anderson Citation2022), but also inform the design of effective strategies to engage and sustain participants in VGI communities (Sbrocchi et al. Citation2022).

Studies involving social media VGI may purposefully analyze the underlying social network to examine interactions among social media users, especially in disaster-related social media VGI applications (Feng, Huang, and Sester Citation2022; Wang and Ye Citation2018). In disaster situations such as floods and hurricanes, online social network platforms are crucial channels for exchanging timely information on official announcements, situational updates, damage assessments, aid needs, etc. among impacted communities, authorities, and news media. Social network analysis has been used to reveal social interaction characteristics and patterns in the network (Kim and Hastak Citation2018), which facilitates understanding of how affected community members use social media to call for help (Li et al. Citation2019) and how information spreads through the network (J. Xu and Qiang Citation2022). Such insights can help authorities, for example, design communication strategies tailored to the characteristics of the network to spread disaster information effectively (Cheong and Cheong Citation2011).

In general, social media networks mirror the patterns of interpersonal communication in the real world, as online and real-world interpersonal communication are subject to similar constraints (e.g. economic, cultural, social, and linguistic) (Stephens and Poorthuis Citation2015). Such constraints are largely shaped by geography in the real world, and as a result, the well-known geographic distance decay effect may be observed in social media networks (Han, Tsou, and Clarke Citation2018). For example, interactions on Twitter among nonprofits in the United States during the COVID-19 pandemic are more likely to occur within the same city or in adjacent states (Gong et al. Citation2022). However, closer geographical proximity does not always imply stronger social connections. For instance, the connections among Twitter accounts of U.S. state governors are shaped largely by political affiliation, not geographic proximity (Gong and Ye Citation2021); intra-urban social media check-in activities are found to exhibit higher predictability by social connections compared to using geographic distances (Zhu et al. Citation2020).

Nonetheless, social media VGI is just one of the many sources and forms of VGI (e.g. citizen science, crowdsourcing, participatory mapping, neogeography, and public participation GIS) (G. Zhang Citation2021). Existing studies on non-social media VGI have a limited focus on the social interactions among VGI community members, with a few exceptions. OSM is a predominant VGI project in which any OSM contributor can edit an existing spatial entity (e.g. house, road, or lake) in the database previously created (or edited) by other contributors by modifying the geometry and/or adding or changing tags to create a new version of the entity to improve mapping accuracy (Mordechai Haklay and Weber Citation2008). Co-editing is a major form of inter-contributor social interaction in the OSM mapping community. Mooney and Corcoran (Citation2014) analyzed interaction and co-editing patterns among OSM contributors and found that ‘senior mappers,’ who are frequent contributors, make a vast majority of contributions on their own and also co-edit entities created by lower-frequency contributors. Zhang et al. (Citation2021) adopted a variant of the PageRank algorithm to rank OSM contributor reputation based on the evaluation relationships embedded in the co-editing network. Sarkar and Anderson (Citation2022) investigated co-editing patterns among corporate editors (CEs; teams of mappers enlisted by corporations) and non-CEs in OSM and discovered that significant co-editing exists between CEs and non-CEs, although CEs tend to have more in-group co-editing activities.

Geographic citizen science is another major source of VGI (Muki Haklay Citation2021; G. Zhang Citation2021). However, few investigations have been conducted to reveal the patterns and/or drivers of social interactions within citizen science communities, although the importance of such social interactions is well acknowledged (Sullivan et al. Citation2009; Torres et al. Citation2022). Liberatore et al. (Citation2018) presented experiences of setting up and administering a social group on social media platforms to facilitate social interactions among participants of the New Zealand Garden Bird Survey citizen science project (e.g. posts, comments, likes, and shares) but did not perform an in-depth analysis of the patterns and drivers of the interactions. Sbrocchi et al. (Citation2022) conducted an analysis of a network of Australian citizen science practitioners where social interactions were established through co-participation in local or national meetings. They found that the interactions were influenced by disciplinary backgrounds, level of experience, gender, and age group, but not by geographic boundaries.

In summary, although social interactions among VGI contributors underpin various VGI communities and are often key to the success of VGI projects, current VGI research generally lacks investigation into social interactions within VGI communities. Among the few studies examining certain types of inter-contributor social interactions (i.e. co-editing activities in OSM, interactions on social network platforms, and interactions among citizen science practitioners), some have discovered interesting social interaction patterns within the respective VGI communities. Nevertheless, they mostly lack a closer examination of the spatial and/or non-spatial drivers of the interaction patterns.

To bridge this research gap, this study analyzes the patterns and drivers of social interactions in a very large VGI community, using the iNaturalist global-scale biodiversity citizen science project as an example. Social interactions in iNaturalist are reflected by inter-contributor species identification activities between ‘observers’ who submit species photos and ‘identifiers’ who subsequently identify the species in the photo. Each contributor can be associated with a geographic location (or region) where the contributor contributes data most actively. Therefore, the inter-contributor species identification interactions among iNaturalist contributors were modeled as a spatial social network (Expert et al. Citation2011; Tupikina et al. Citation2021). Social network analysis methods were then used to elucidate the structural characteristics and interaction patterns of the iNaturalist network.

This work also investigates the drivers behind the interaction patterns based on a theoretical framework concerning the key social forces driving the formation of social links in a social network (McCulloh, Armstrong, and Johnson Citation2013). According to the framework, social forces driving interactions in a social network include homophily, reciprocity, proximity, prestige, transitivity, etc. Among the driving forces, homophily represents the tendency that social actors who share common interests, beliefs, goals, race, gender, and culture are more likely to establish social interactions. Reciprocity indicates that social actors tend to form direct interactions with others who initiate interactions with them. Proximity refers to the organizational or physical distance between social actors. The closer the actors are to each other, the more likely they are to interact and form relationships. Prestige reflects the tendency of important actors to have high prestige and hold great influence over other members in the network, who in turn tend to interact more with high prestige actors. Transitivity measures the tendency of social connections to form through a common connection with a third party. This study, following the above theoretical framework, aims to examine the effects of the social forces in driving social interactions in the iNaturalist community, and identify the major spatial and/or non-spatial drivers of interactions in the iNaturalist spatial social network.

The rest of the article is organized as follows: Section 2 presents the datasets; Section 3 provides an overview and details of the data analysis methods; Sections 4 and 5 report the results and discussion, respectively; and Section 6 concludes the paper.

2. Data

iNaturalist species observation datasets were used in this study as examples of VGI data for examining social interactions in VGI communities. iNaturalist was initiated in 2008 and is now one of the world’s largest biodiversity-themed citizen science projects. It is a platform for biologists, naturalists, and citizen scientists to contribute, share, and identify species observations worldwide (Unger et al. Citation2021). As of March 2023, iNaturalist has accumulated over 128 million observations on more than 418,600 species with records contributed by over 2.5 million observers and over 291,800 identifiers (iNaturalist Citation2022b).

An iNaturalist species observation has a unique record id and essentially keeps track of ‘who’ (user id, observer login) observed ‘what’ species (species name and taxonomic information, if identified), ‘where’ (latitude and longitude of observation location) and ‘when’ (date and time of observation) and, optionally, identified by ‘whom’ (identifier login) and ‘when’ (date and time of identification). At the contributor’s discretion to protect geoprivacy, geographic coordinates can be marked private (i.e. invisible to others) or obscured (i.e. the true observation location is replaced with a location randomly selected from a surrounding 0.2 × 0.2 degree cell area) (iNaturalist Citation2022a). When submitting a species observation record with photos, one can optionally select an identification for the species from the list of potential matching species provided by iNaturalist. Other iNaturalist community members can review and vote on species identification or propose new identifications if they do not agree with existing ones or if an initial identification is missing.

A species observation record with an identification that the community has agreed upon and that meets additional quality control requirements is labeled as ‘research-grade’ (iNaturalist Citation2023). Information on the identifier (i.e. identifier login) and identification time associated with the first identification attempt that proposed the community-accepted identification is made available for ‘research-grade’ observations only. Although social interactions in iNaturalist can also take other forms (e.g. following, commenting, and voting), this study is primarily concerned with inter-contributor species identification activities that are captured by information embedded in ‘research-grade’ observations. ‘Research-grade’ observations therefore are crucial for reconstructing such interactions in iNaturalist.

‘Research-grade’ observations were downloaded from the Global Biodiversity Information Facility (GBIF) (Ueda Citation2022) (n = 54,913,726 for the full dataset as of 31 December 2022). Raw observations were also obtained from the iNaturalist website (iNaturalist Citation2022c) (n = 138,914,933 for the full dataset as of 31 December 2022). The raw dataset contains all ‘research-grade’ observations, as well as unidentified observations and observations that were identified but do not qualify as ‘research-grade’. The raw dataset is needed to derive contributor attributes, such as the approximate geographic location of a contributor and the taxon composition of species observed or identified by a contributor (see Section 3).

In addition, the land cover type at each observation location was extracted from the 500-meter resolution yearly (2001–2019) MODIS global land cover type dataset (MCD12Q1 Version 6) (Sulla-Menashe and Friedl Citation2018). This openly available dataset is maintained, updated, and published by the U.S. Geological Survey and the National Aeronautics and Space Administration. Observation locations prior to 2001 were assigned land cover types based on the 2001 land cover data, while locations after 2019 were assigned land cover types based on the 2019 land cover data. Land cover information was used to derive the land cover composition of a contributor’s observations and/or identifications (Section 3). All of the above data were loaded into a PostgreSQL/PostGIS spatial database for further query and analysis.
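The year-matching rule described above amounts to clamping the observation year to the range covered by the land cover product. A minimal sketch follows; the function name and default arguments are illustrative and not taken from the study's code.

```python
def land_cover_year(observation_year: int, first: int = 2001, last: int = 2019) -> int:
    """Return the land cover map year to use for a given observation year.

    Observations before `first` fall back to the `first` map and observations
    after `last` fall back to the `last` map; all others use their own year.
    """
    return min(max(observation_year, first), last)


# An observation from 1998 uses the 2001 map; one from 2022 uses the 2019 map.
years = [land_cover_year(y) for y in (1998, 2010, 2022)]
```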

This study focuses on species identification interactions in iNaturalist in the most recent year of 2022. Accordingly, all ‘research-grade’ observations identified in 2022 (n = 15,955,457) were used in this study, regardless of when such observations were made. Raw observations made in 2022 (n = 36,327,481) were also included, as they were needed, for example, to estimate the contributor’s approximate geographic location in the year (Section 3.2.2.1).

3. Methods

3.1. Methodological overview

This study aimed to reveal the patterns of inter-contributor species identification interactions within the iNaturalist community and explore the drivers that may have shaped such interaction patterns (intra-contributor species identification activities where contributors identify their own species observations were not considered in this study). iNaturalist contributors can play different roles in the community: ‘pure observers’ who had only submitted species observations, ‘pure identifiers’ who had only identified observations submitted by others, or ‘mixed observer-identifiers’ who had submitted observations themselves as well as identified others’ observations. The interactions among contributors are best modeled as a social network (McCulloh, Armstrong, and Johnson Citation2013) (Section 3.2), which can then be analyzed to uncover social interaction patterns in the network (Section 3.3).

As for examining what factors may be driving the interaction patterns, according to the theoretical framework (see Introduction), social actors with similar interests and preferences (homophily) and in closer organizational or physical distance (proximity) tend to get more associated with each other in a social network and even form communities or clusters in the network (Bedi and Sharma Citation2016; McCulloh, Armstrong, and Johnson Citation2013). In the context of the iNaturalist community, it was hypothesized that geographic proximity, shared interests in species taxon categories, or shared preferences in observation environments may facilitate species identification interactions among community members. Therefore, metrics reflecting potential interaction-influencing factors (i.e. geographic distance, species taxon composition similarity, and land cover composition similarity between contributors) were derived to characterize the contributors and interactions (Sections 3.2.2 and 3.2.3). This allowed an empirical examination of the influences of these factors on social link development, interaction strength, and community formation in the network to identify the major drivers of social interactions in the iNaturalist community. An overview of the data analysis workflow can be found in Figure 1, and details of the data analysis methods are presented in the following two sections.

Figure 1. An overview of the data analysis workflow.

3.2. Social network construction

3.2.1. The social network

The inter-contributor species identification interactions in iNaturalist, as reflected by information embedded in the ‘research-grade’ observations, are modeled as a social network using a directed graph (McCulloh, Armstrong, and Johnson Citation2013). In the graph, the nodes are iNaturalist contributors. A directed edge $e_{ij}$ from node $v_i$ (observer) to another node $v_j$ (identifier) exists if $v_j$ has identified any observations made by $v_i$. The strength of the edge is measured by weight $w_{ij}$, that is, the number of observations made by $v_i$ that are identified by $v_j$. There are no self-loops in the network (i.e. $i \neq j$ for any $e_{ij}$), as intra-contributor species identification activities were not considered in this study.

This network captures interactions involving the identification of at least one species observation between two contributors (i.e. $w_{ij} \geq 1$ for any $i$ and $j$ where $i \neq j$). It is a well-known phenomenon that in an online VGI community, the vast majority of members often make only a very small number of contributions because they tend to be the one-off type who make ‘ad hoc’ contributions (e.g. submitting only a few observations or identifications) and then leave and never return (Mordechai Haklay Citation2016; G. Zhang Citation2020). As a result, species identification interactions involving such contributors are not necessarily meaningful. To reduce such ‘noise’ in the subsequent network analysis, an edge weight threshold was applied to simplify the network. Only nodes connected through edges with weights of no less than the threshold were retained. After exploratory analyses, the weight threshold was set to five, as it appears to be able to differentiate consistent and meaningful species identification interactions among iNaturalist contributors from ad hoc ones.
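The construction and thresholding steps can be sketched with the standard library alone. This is an illustrative sketch, not the study's implementation (which used NetworkX and cuGraph); the record layout of one (observer, identifier) pair per identified observation, the function names, and the toy logins are assumptions.

```python
from collections import Counter


def build_edges(records, min_weight=5):
    """Build weighted directed edges from (observer, identifier) pairs.

    `records` holds one pair per identified observation (assumed layout).
    Self-identifications are dropped, and only edges whose weight is at
    least `min_weight` are kept.
    """
    weights = Counter(
        (observer, identifier)
        for observer, identifier in records
        if observer != identifier  # no self-loops
    )
    return {edge: w for edge, w in weights.items() if w >= min_weight}


def nodes_of(edges):
    """Nodes that remain connected after thresholding."""
    return {node for edge in edges for node in edge}


# Toy data: hypothetical logins, not real iNaturalist users.
records = (
    [("ann", "bob")] * 6    # kept: weight 6
    + [("ann", "cat")] * 2  # dropped: weight below threshold
    + [("bob", "bob")] * 9  # dropped: self-identification
    + [("cat", "bob")] * 5  # kept: weight equals threshold
)
edges = build_edges(records)
```

With these toy records, only the edges from `ann` to `bob` and from `cat` to `bob` survive the threshold of five.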

3.2.2. Node attributes

In addition to the basic information on an iNaturalist contributor (e.g. user ID and login), three attributes were derived to characterize each contributor (node) in the network: approximate geographic location, species taxon composition, and land cover composition. The location of an iNaturalist contributor is important for situating the social network in a spatial context, while species taxon composition and land cover composition are indicative of contributor preferences for the species to observe (e.g. birds vs. plants) and the observation environment in which observations are conducted (e.g. urban built-up lands vs. woody savannas).

3.2.2.1. Geographic location

The geometric median of all observation point locations of a contributor in a given time period (e.g. in 2022, according to the raw observation dataset) was adopted to represent the approximate geographic location of that contributor. The geometric median is the point location that minimizes the sum of the distances to the observation locations, providing a centrality measure that is robust to outlier points (PostGIS Citation2023). The median location provides an estimate of where the contributor was situated geographically at a coarse spatial granularity (e.g. city, county, or state); it is not intended to reveal the contributor’s exact location (e.g. home location). The geographic coordinates (latitude and longitude) of the geometric median point were attached as attributes to the corresponding node in the network.
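To illustrate why the geometric median is robust to outliers, the sketch below approximates it with the Weiszfeld iteration on planar coordinates. This is a stand-in for illustration only: the study cites PostGIS for this step, and the function name, tolerances, and the planar treatment of coordinates are assumptions made here.

```python
import math


def geometric_median(points, tol=1e-9, max_iter=1000, eps=1e-12):
    """Approximate the geometric median of 2D points (Weiszfeld iteration).

    Coordinates are treated as planar (an assumption of this sketch).
    """
    # Start from the centroid (arithmetic mean).
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(max_iter):
        wx = wy = wsum = 0.0
        for px, py in points:
            # Clamp the distance to avoid division by zero when the
            # estimate coincides with a data point.
            d = max(math.hypot(px - x, py - y), eps)
            wx += px / d
            wy += py / d
            wsum += 1.0 / d
        nx, ny = wx / wsum, wy / wsum
        if math.hypot(nx - x, ny - y) < tol:
            return nx, ny
        x, y = nx, ny
    return x, y


# Three observations at one spot and one far-away outlier: the mean is
# pulled to (25, 25), while the geometric median stays near (0, 0).
pts = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (100.0, 100.0)]
gx, gy = geometric_median(pts)
```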

3.2.2.2. Species taxon composition

Each contributor may be associated with two sets of species observations: observations made by the contributor (as an observer) and observations made by others but identified by the contributor (as an identifier). Accordingly, two species taxon compositions were derived for each contributor: observer species taxon composition (null for ‘pure identifiers’) and identifier species taxon composition (null for ‘pure observers’).

Given species observations made by the contributor (according to the raw observation dataset), the observer species taxon composition was measured by the relative frequency distribution of observations over species taxon categories. Similarly, identifier species taxon composition is represented as the relative frequency distribution of observations made by others but identified by the contributor (according to the ‘research-grade’ observation dataset) over the taxon categories. Species taxon categories were derived from taxon names in the ‘iconic_taxon_name’ field from the raw observation dataset (Table 1). Some taxon names (i.e. Archaea, Bacteria, Chromista, Protozoa, and Viruses) were excluded when computing taxon compositions because few observations were in these taxon categories. Observer and identifier species taxon compositions, each as a vector of relative frequencies, were attached as attributes to the corresponding node in the network.
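A composition vector of this kind is a normalized count over a fixed category order. The sketch below assumes a short, illustrative subset of taxon categories and a hypothetical helper name; the same function would apply to land cover types.

```python
from collections import Counter

# Illustrative subset of taxon categories, in a fixed order (assumption).
TAXA = ["Aves", "Insecta", "Plantae"]


def composition(labels, categories):
    """Relative frequency of `labels` over `categories`.

    Labels outside `categories` (e.g. excluded taxa) are ignored.
    Returns None when no label falls in any category, mirroring the
    'null' composition of contributors without relevant observations.
    """
    counts = Counter(label for label in labels if label in categories)
    total = sum(counts.values())
    if total == 0:
        return None
    return [counts[c] / total for c in categories]


# Two bird observations, one plant observation, one excluded taxon.
observer_composition = composition(["Aves", "Aves", "Plantae", "Bacteria"], TAXA)
```

Here the excluded `Bacteria` record is ignored, so the vector is two thirds birds, no insects, and one third plants.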

Table 1. Species taxon categories appearing in the ‘iconic_taxon_name’ field from the raw observations dataset.

3.2.2.3. Land cover composition

Analogously, two land cover compositions were derived for each contributor: observer land cover composition (null for ‘pure identifiers’) and identifier land cover composition (null for ‘pure observers’). Observer land cover composition is represented as the relative frequency distribution of observations (according to the raw observation dataset) over land cover types (Table 2). The identifier land cover composition is the relative frequency distribution, over the same land cover types, of observations made by others but identified by the contributor (according to the ‘research-grade’ observation dataset). Observer and identifier land cover compositions, each as a vector of relative frequencies, were attached as attributes to the corresponding node in the network.

Table 2. Land cover types used in computing land cover compositions.

3.2.3. Edge attributes

Based on the above node attributes, metrics can be derived to describe edge characteristics in addition to edge weight (i.e. the number of species identifications between an observer and an identifier), including geographic distance, species taxon composition similarity, and land cover composition similarity between the two contributors.

3.2.3.1. Geographic distance

Geographic distance was derived as a measure of the spatial proximity between contributors with species identification interactions. For edge $e_{ij}$ from node $v_i$ (observer) to node $v_j$ (identifier), the great circle distance $d$ between $v_i$ and $v_j$ was computed based on the geographic coordinates of the two interacting contributors (Equation 1), and set as an attribute for edge $e_{ij}$.

$$d(v_i, v_j) = 2R \arcsin\left(\sqrt{\sin^2\left(\frac{\mathrm{lat}_j - \mathrm{lat}_i}{2}\right) + \cos(\mathrm{lat}_i)\cos(\mathrm{lat}_j)\sin^2\left(\frac{\mathrm{lon}_j - \mathrm{lon}_i}{2}\right)}\right) \tag{1}$$

where $R$ is the equatorial radius of the Earth (6378 km), and $(\mathrm{lat}_i, \mathrm{lon}_i)$ and $(\mathrm{lat}_j, \mathrm{lon}_j)$ are the latitude and longitude of nodes $v_i$ and $v_j$, respectively.
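Equation 1 is the haversine form of the great circle distance. A minimal sketch is given below, assuming coordinates in decimal degrees; the function and constant names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6378.0  # equatorial radius, as stated in the text


def great_circle_km(lat_i, lon_i, lat_j, lon_j, radius=EARTH_RADIUS_KM):
    """Great circle (haversine) distance in km between two lat/lon points."""
    phi_i, phi_j = math.radians(lat_i), math.radians(lat_j)
    d_phi = phi_j - phi_i
    d_lam = math.radians(lon_j - lon_i)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi_i) * math.cos(phi_j) * math.sin(d_lam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * radius * math.asin(math.sqrt(min(1.0, h)))


# Two antipodal points on the equator are half a circumference apart.
half_circumference = great_circle_km(0.0, 0.0, 0.0, 180.0)
```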

3.2.3.2. Species taxon composition similarity

Species taxon composition similarity was computed to measure the extent to which interacting contributors shared preferences for the observed species. For edge $e_{ij}$ from $v_i$ (observer) to $v_j$ (identifier), the similarity between $v_i$’s observer species taxon composition (i.e. a vector of relative frequency values) and $v_j$’s identifier species taxon composition was computed as the cosine similarity between the two species taxon composition vectors (Equation 2):

$$\mathrm{Cosine\_similarity}(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{k=1}^{n} A_k B_k}{\sqrt{\sum_{k=1}^{n} A_k^2}\,\sqrt{\sum_{k=1}^{n} B_k^2}} \tag{2}$$

where $A$ and $B$ are two $n$-dimensional vectors and $A_k$ and $B_k$ denote the values of the $k$th dimension in the two vectors, respectively. The cosine similarity ranged from 0 to 1, with larger values indicating higher similarities. The computed species taxon composition similarity was set as an attribute of edge $e_{ij}$.
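Equation 2 can be sketched directly for two composition vectors. The function name is illustrative, and returning 0 for an all-zero vector is an assumption of this sketch rather than a rule stated in the text.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: undefined similarity is reported as 0
    return dot / (norm_a * norm_b)


# An observer who only records birds versus an identifier who splits
# identifications evenly between birds and plants.
sim = cosine_similarity([1.0, 0.0], [0.5, 0.5])
```

Because relative frequencies are non-negative, the result always lies between 0 and 1, consistent with the range stated above.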

3.2.3.3. Land cover composition similarity

Similarly, the similarity between the land cover compositions of interacting contributors was computed to measure the extent to which they shared preferences in the observation environment. For edge $e_{ij}$ from $v_i$ (observer) to $v_j$ (identifier), the similarity between $v_i$’s observer land cover composition and $v_j$’s identifier land cover composition was also computed as the cosine similarity between the two land cover composition vectors according to Equation 2. The computed land cover composition similarity was set as an attribute for edge $e_{ij}$.

3.3. Social network analysis

The network was implemented and analyzed using NetworkX (version 3.0) (Hagberg, Swart, and S Chult Citation2008) and RAPIDS cuGraph (version 22.12) (Fender, Rees, and Eaton Citation2022) Python libraries. cuGraph is well suited for analyzing large-scale networks (e.g. with millions of nodes and edges) because it can utilize parallel computing on graphics processing units (GPUs) to accelerate graph analytics algorithms.

3.3.1. Basic structural analysis

The basic structural characteristics of the network were examined using the social network analysis algorithms implemented in NetworkX and cuGraph. Specifically, the density, average clustering coefficient, and distribution of node degrees were computed, and a connected component analysis was conducted, to reveal the network-level clustering and connectedness characteristics of the network (Brandes and Erlebach Citation2005).
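Two of these measures are simple enough to sketch without a graph library. The sketch below computes directed-graph density and finds connected components with a union-find structure; ignoring edge direction when forming components is an assumption made here, and the function names are illustrative.

```python
def density(n_nodes, n_edges):
    """Density of a directed graph without self-loops: m / (n * (n - 1))."""
    return n_edges / (n_nodes * (n_nodes - 1))


def connected_components(edges):
    """Connected components with edge direction ignored (assumption).

    Uses union-find; returns node sets sorted from largest to smallest.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=len, reverse=True)


toy_edges = [("a", "b"), ("c", "b"), ("d", "e")]
components = connected_components(toy_edges)
```

As a check on the formula, 49,359 directed edges among 18,548 nodes give 49,359 / (18,548 × 18,547) ≈ 0.000143, which matches the density reported in Section 4.1.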

3.3.2. Impacts on edge strength

A correlation analysis was conducted between edge weight and edge attributes (i.e. geographic distance, species taxon composition similarity, and land cover composition similarity) to examine whether the attributes have any influence on edge strength. Considering that there might be tens of thousands of edges, the edges were first binned by a series of consecutive equal intervals on each attribute. The average weight and average attribute value of the edges falling into each interval (i.e. the binned average) were then computed. Subsequently, the Spearman’s rank correlation coefficient ρ between the binned average weights and binned average attribute values was computed as a measure of the correlation strength between the edge weight and an edge attribute.
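The binning and rank correlation steps can be sketched as follows. The bin width, function names, and toy values are assumptions, and Spearman's ρ is computed here as the Pearson correlation of average ranks.

```python
import math


def binned_means(attr, weight, width):
    """Average attribute value and average weight per equal-width bin."""
    bins = {}
    for a, w in zip(attr, weight):
        stats = bins.setdefault(int(a // width), [0.0, 0.0, 0])
        stats[0] += a
        stats[1] += w
        stats[2] += 1
    keys = sorted(bins)
    return ([bins[k][0] / bins[k][2] for k in keys],
            [bins[k][1] / bins[k][2] for k in keys])


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy edges: weight falls as distance (km) grows; 100 km bins.
distances = [10, 40, 120, 160, 230, 270]
weights = [30, 20, 12, 10, 6, 5]
mean_dist, mean_weight = binned_means(distances, weights, width=100)
rho = spearman(mean_dist, mean_weight)
```

In this toy case the binned average weight decreases monotonically with the binned average distance, so ρ is −1 up to floating-point rounding.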

3.3.3. Impacts on edge formation

The influence of the edge attributes on edge formation was tested using random simulations. In each simulation, the same number of edges as in the iNaturalist network was placed between pairs of nodes selected randomly (with replacement) and with uniform probability from the set of nodes present in the network to construct a randomized network. That is, edge formation is equally likely between any pair of nodes in a randomized network. Network statistics (e.g. the distributions of node degree and of the three edge attributes) for the randomized network were then obtained. This process was repeated 30 times to derive the summary statistics of the network statistics on the randomized networks. These were then compared to the network statistics of the iNaturalist network. Significant differences between the original network statistics and those obtained on randomized networks would indicate that edge attributes have statistically significant influences on edge formation in the iNaturalist network.
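The randomization procedure can be sketched as below. Skipping self-loops, the choice of summary statistics (mean and sample standard deviation), the function names, and the one-dimensional toy coordinates are all assumptions of this sketch.

```python
import random


def random_edges(nodes, n_edges, rng):
    """Place `n_edges` directed edges between uniformly sampled node pairs."""
    nodes = list(nodes)
    edges = []
    while len(edges) < n_edges:
        u, v = rng.choice(nodes), rng.choice(nodes)  # sampling with replacement
        if u != v:  # assumption: self-loops are skipped
            edges.append((u, v))
    return edges


def simulate(nodes, n_edges, statistic, runs=30, seed=0):
    """Mean and sample standard deviation of `statistic` over random networks."""
    rng = random.Random(seed)
    values = [statistic(random_edges(nodes, n_edges, rng)) for _ in range(runs)]
    mean = sum(values) / runs
    variance = sum((v - mean) ** 2 for v in values) / (runs - 1)
    return mean, variance ** 0.5


# Toy example: nodes on a line; the statistic is the mean edge length.
position = {"a": 0.0, "b": 1.0, "c": 5.0}


def mean_edge_length(edges):
    return sum(abs(position[u] - position[v]) for u, v in edges) / len(edges)


null_mean, null_sd = simulate(position, n_edges=20, statistic=mean_edge_length)
```

The observed value of the same statistic on the real network would then be compared against `null_mean` and `null_sd`.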

3.3.4. Impacts on community formation

Many social networks display a community or cluster structure, and community detection can partition the network into disjoint sets of nodes, such that nodes within a set are much more closely connected than nodes between any two sets are (Bedi and Sharma 2016). The state-of-the-art ensemble-based community detection algorithm (Poulin and Théberge 2019) implemented in cuGraph was adopted to detect disjoint communities in the iNaturalist network.

The detected communities were then profiled to determine whether geographic proximity, species taxon composition similarity, and land cover composition similarity influenced community formation. The distributions of the edge attributes within a community were compared against the distributions on the larger network to determine whether connected (interacting) nodes within a community tended to have shorter geographic distances, higher species taxon composition similarity, or higher land cover composition similarity than nodes interacting in the larger network. This allows further verification of whether factors represented by edge attributes have any influence on driving social interactions in the iNaturalist network.
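
The profiling step can be sketched as a comparison of medians. The example below is a simplified illustration under stated assumptions: community labels are taken as given (community detection itself is not shown), only edges whose two endpoints share a community are counted as within-community edges, and the data and names are hypothetical.

```python
from statistics import median

def profile_communities(edges, membership, attr):
    """Compare each community's median edge attribute with the network median.

    edges: list of (u, v) pairs
    membership: dict mapping node -> community id
    attr: dict mapping (u, v) -> attribute value (e.g. distance in km)
    """
    overall = median(attr[e] for e in edges)
    within = {}
    for u, v in edges:
        if membership[u] == membership[v]:  # keep intra-community edges only
            within.setdefault(membership[u], []).append(attr[(u, v)])
    return overall, {c: median(vals) for c, vals in within.items()}

# Hypothetical toy network with two communities.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
membership = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
distance_km = {
    ("a", "b"): 20, ("b", "c"): 40, ("c", "d"): 900,
    ("d", "e"): 1500, ("e", "f"): 2500,
}

overall_median, community_medians = profile_communities(edges, membership, distance_km)
closer = [c for c, m in community_medians.items() if m < overall_median]
share_closer = len(closer) / len(community_medians)
```

The same function applies to the two similarity attributes by swapping the attribute dictionary and reversing the comparison to look for higher medians.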

4. Results

4.1. Structural characteristics

The resulting social network has 49,359 edges and 18,548 nodes, among which 14,000 (75.5%) are ‘pure observers,’ 1993 (10.7%) are ‘pure identifiers,’ and 2555 (13.8%) are ‘mixed observer-identifiers’ (Figure 2). The network was sparsely connected (density = 0.000143, transitivity = 0.073, and average clustering coefficient = 0.0698). Node degrees appear to follow a ‘small-world’ network model (McCulloh, Armstrong, and Johnson 2013). Half of the nodes were connected to only one other node (Figure 3, left). Considering edge weight, half of the nodes were involved in species identification interactions totaling no more than 13 identifications (Figure 3, right). There are 264 connected components in the network. The largest component covered 17,935 nodes (96.7% of all nodes) and 49,006 edges (99.3% of all edges). No other component had more than four nodes. Subsequent results were based on analyses of the largest component network (hereinafter referred to as the iNaturalist network).

Figure 2. Composition of iNaturalist contributors in the iNaturalist network.

Figure 3. Frequency distribution of node degrees in the iNaturalist network.

The iNaturalist network is a global-scale social network that spans the world (Figure 4). Approximately 80% of the contributors are located in the Americas (especially in North America) and Europe. More than 35% were in the United States, with California and Texas being the two states with the largest number of contributors (Figure 5). This type of geographic bias in the distribution of contributors is common across many VGI projects (G. Zhang 2020). Plants, insects, and birds were the three most commonly observed species taxon categories. Urban and built-up land, savannas, grasslands, and woody savannas were the most frequently observed environments (Figure 6).

Figure 4. Geovisualization of the iNaturalist social network (17,935 nodes and 49,006 edges). Each arc represents an edge connecting from an observer to an identifier.

Figure 5. Geographic distribution of iNaturalist contributors at different spatial levels.

Figure 6. Frequency distribution of iNaturalist species observations over species taxon categories and land cover types.

4.2. Influences on edge strength

Edge weight (i.e. the number of species identifications between an observer and an identifier) and the three edge attributes (i.e. geographic distance, species taxon composition similarity, and land cover composition similarity between an observer and an identifier) all show skewed distributions (Figure 7). Half of the edges (species identification interactions) were associated with weights of no greater than seven (i.e. no more than seven observations were identified in these interactions). Geographically, half of the interactions occurred between observers and identifiers no farther than 590 km apart. The median taxon composition similarity and land cover composition similarity are 0.86 and 0.73, respectively, indicating that observers and identifiers involved in species identification interactions share a high level of similarity in their preferences for species taxon categories and observation environments.

Figure 7. Frequency distribution of edge weight and geographic distance, species taxon composition similarity, and land cover composition similarity.

There was a moderate to low negative correlation between edge weight and geographic distance (ρ = −0.321; p-value = 0.023), and strong positive correlations between edge weight and species taxon composition similarity (ρ = 0.84; p-value < 0.001) and land cover composition similarity (ρ = 0.876; p-value < 0.001) (Figure 8). All correlations were statistically significant (i.e. p-value < 0.05). This revealed a general tendency that inter-contributor species identification interactions are more likely to occur between observers and identifiers who are in closer geographic proximity and prefer to observe species in similar taxon categories and in observation environments of similar land cover types.

Figure 8. Correlation between edge weight and geographic distance, species taxon composition similarity, and land cover composition similarity.

4.3. Influences on edge formation

Edge formation in the iNaturalist network displays drastically different patterns compared with randomized networks, wherein edges are equally likely to form between any two nodes (Figure 9). The iNaturalist network tends to have a much larger number of nodes with smaller degrees than a random network, indicating that edge formation (i.e. interaction occurrence) among iNaturalist contributors is highly selective.

Figure 9. Distribution of node degree, edge geographic distance, species taxon composition similarity, and land cover composition similarity in the iNaturalist network and randomized networks.

Nodes connected by edges in the iNaturalist network are much closer to each other than randomly connected nodes (Figure 9), suggesting that geographically proximate iNaturalist contributors are more likely to develop species identification interactions. Furthermore, connected node pairs in the iNaturalist network also possess much higher species taxon composition similarity and land cover composition similarity than would be expected from a random network, implying that species identification interactions are more likely to occur between observers and identifiers who have a shared interest in species in certain taxon categories and shared preferences for the environment (e.g. land cover type) in which to conduct species observations. Geographic proximity, species taxon composition similarity, and land cover composition similarity all have significant influences on the occurrence of species identification interactions in iNaturalist.

4.4. Influences on community formation

A total of 26 communities (i.e. interaction clusters) with more than 100 members (nodes) were detected in the iNaturalist network. Among the communities, 21 (80.8%) had a shorter median edge geographic distance than the full iNaturalist network, 18 (69.2%) had a higher median species taxon composition similarity, and 16 (61.5%) had a higher median land cover composition similarity (Figure 10).

Figure 10. Violin plots comparing the distributions of geographic distance, species taxon composition similarity, and land cover composition similarity in the detected communities against those in the iNaturalist network. Communities in blue have smaller median distance or higher median similarity compared to the iNaturalist network; Communities in yellow have larger median distance or lower median similarity.

As examples, the three largest communities have distinct characteristics with respect to within-community geographic distance (Figure 11), taxon composition, and land cover similarity (Figure 12). Members in the first community cluster in Europe and appear to be more interested (than iNaturalist contributors on average) in observing insects in cropland, mixed forest, and savanna environments. The second community is also most interested in insects, but its members are mostly within the United States (Texas, in particular), making observations in deciduous broadleaf forests, grasslands, woody savannas, and urban and built-up lands. The third community is also situated in the United States (especially California), but its members tend to be birders watching birds in a variety of habitats (e.g. open shrublands, grasslands, water bodies, and urban and built-up lands).

Figure 11. Geographic distribution of members in the three largest communities.

Figure 12. Difference in relative frequency distributions on species taxon categories and land cover types between the three largest communities and the iNaturalist network.

The above patterns and trends indicate that geographic proximity, species taxon composition similarity, and land cover composition similarity influence the formation of communities in the iNaturalist social network. Overall, clusters of intense species identification interactions are more likely to form among contributors who are close to each other geographically and share interests in very similar species taxon categories and in very similar observation environments.

5. Discussion

5.1. Drivers of social interactions in VGI communities

The results showed that inter-contributor geographic distance, species taxon composition similarity, and land cover composition similarity had significant influences on the establishment of interactions, the strength of interactions, and the formation of high-interaction clusters in the iNaturalist social network. Additional analyses were conducted on intra-contributor taxon composition and land cover composition similarities. For each ‘mixed observer-identifier’ contributor in the iNaturalist network, the intra-contributor similarity was computed between the composition of observations made by the contributor (as observer) and the composition of observations identified by the contributor (as identifier). Interestingly, intra-contributor species taxon composition similarity and land cover composition similarity were higher than their inter-contributor counterparts. The median intra-contributor taxon composition similarity and land cover composition similarity are 0.95 and 0.82, respectively (Figure 13), whilst the corresponding median inter-contributor similarities are 0.86 and 0.73, respectively (Figure 7). This suggests that ‘mixed observer-identifiers’ prefer to conduct identifications on observations in taxon categories and observation environments that are very similar to their own observations. All of this evidence indicates that geographic proximity, collective interest in certain species taxon categories, and shared preferences for conducting species observations on certain land cover types drive inter-contributor species identification interactions in iNaturalist.
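
To make the intra-contributor comparison concrete, the sketch below scores the similarity between what one contributor observed and what the same contributor identified. Cosine similarity is used here purely as a stand-in, since the similarity measure itself is defined in the methods section and is not restated here. The counts and taxon labels are hypothetical.

```python
from math import sqrt

def cosine_similarity(p, q):
    """Cosine similarity between two composition vectors stored as dicts.

    Stand-in measure (assumption): categories missing from a dict count as zero.
    """
    keys = set(p) | set(q)
    dot = sum(p.get(k, 0) * q.get(k, 0) for k in keys)
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

# Hypothetical 'mixed observer-identifier': observation counts per taxon
# category as observer, and identification counts per category as identifier.
observed_taxa = {"Insecta": 40, "Plantae": 10}
identified_taxa = {"Insecta": 35, "Plantae": 5, "Aves": 10}

intra_similarity = cosine_similarity(observed_taxa, identified_taxa)
```

The same function applied to an observer's composition and a different contributor's composition would give the inter-contributor counterpart, so the two medians can be compared on a common scale.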

Figure 13. Relative frequency distribution of intra-contributor species taxon composition and land cover composition similarity.

In relation to the theoretical framework on the key social forces driving social interactions (McCulloh, Armstrong, and Johnson 2013), geographic distance as a social interaction driver in iNaturalist represents physical proximity, and species taxon composition similarity and land cover composition similarity reflect homophily. Prestige may be another social interaction driving force among iNaturalist contributors. PageRank analysis (Hagberg, Swart, and S. Chult 2008) of nodes in the iNaturalist network reveals that the nodes associated with the highest PageRank scores tend to be expert identifiers who maintain many connections to other community members by identifying a large number of their species observations. Reciprocity and transitivity do not seem to be major forces of social interactions in the iNaturalist network, as the metrics reflecting them are at a very low level (transitivity = 0.073 and overall reciprocity = 0.016).
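
For reference, overall reciprocity in a directed network is the share of edges whose reverse edge also exists. The sketch below is a plain-Python illustration on a hypothetical edge list, with a hypothetical function name; NetworkX provides `nx.overall_reciprocity` and `nx.pagerank` for the corresponding analyses.

```python
def overall_reciprocity(edges):
    """Share of directed edges (u, v) whose reverse (v, u) is also present."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for u, v in edge_set if u != v and (v, u) in edge_set)
    return mutual / len(edge_set)

# Hypothetical observer -> identifier edges; only a and b identify for each other.
toy_edges = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "c")]
toy_reciprocity = overall_reciprocity(toy_edges)
```

A value near zero, as reported for the iNaturalist network, means that identifications flow almost entirely in one direction between any given pair of contributors.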

The findings of this study are expected to be generalizable to VGI communities similar to iNaturalist wherever volunteers conduct active observations on certain targets in a geographic context and interact with other volunteers to co-create VGI. In such cases, geographic closeness (proximity), collective interest on observation targets, and shared preferences on observation environments (homophily) are likely to be major drivers of the social interactions among volunteers.

5.2. VGI communities as spatial social networks

Many VGI social networks, including iNaturalist, are essentially spatially embedded social networks (or simply spatial social networks), as the spatial location of nodes and, by extension, geographic proximity play explicit or implicit roles in shaping network topology (Expert et al. 2011; Ye and Andris 2021). The discovery of geographic proximity being a major driver of social interactions in iNaturalist aligns well with existing studies reporting ‘distance-decay’ or ‘regionalization’ effects in the formation of spatial social networks and/or spatially constrained communities therein (Chen, Xu, and Xu 2015; Expert et al. 2011; Guo et al. 2018; Y. Xu, Santi, and Ratti 2022). Similarly, the ‘homophily’ effect is also common in spatial social networks. For example, Xu, Santi, and Ratti (2022) found that after controlling for distance, places similar in socioeconomic characteristics tend to have relatively higher communication intensity (measured through cellphone communication) than dissimilar ones.

5.3. Theoretical contribution and practical implications

Social networks are the underpinning of many collaborative VGI projects. This study contributes to advancing VGI research by shedding light on the patterns and drivers of social interactions among VGI contributors based upon social theories on the driving forces behind social interactions (McCulloh, Armstrong, and Johnson 2013). In turn, this study also enriches these social theories, not only by offering supporting evidence from a spatial social network case but also by developing novel metrics to measure the forces in a geographic context (e.g. the physical proximity force was measured as geographic distance, and homophily was quantified as similarities in land cover composition and in species taxon composition). These geography-based metrics augment the means for measuring the social forces and examining their effects in spatial social networks. Additionally, this study provides evidence to support the observation that human social interactions in cyberspace and in the real world both reflect the geography of communication (e.g. social, economic, cultural, and linguistic constraints) and therefore could present similar patterns (e.g. the distance-decay effect) (Gong and Ye 2021).

The results from this study also have practical implications for VGI platforms and communities seeking to enhance VGI data quality and to attract and retain contributors for long-term sustainability. Taking iNaturalist as an example, when resolving conflicting species identifications for a species observation, the platform could place more trust in the identifications proposed by contributors who have aligned taxonomic expertise (judged based on previous contributions) and who were active in the same local area (and thus are likely to know the species better). The iNaturalist platform recruits new participants or boosts the participation of existing contributors by organizing ‘bioblitzes,’ which are recording events aiming to find and identify as many species as possible in a selected area over a short time window (Martínez-Sagarra, Castilla, and Pando 2022). These ‘bioblitzes’ can also be organized by species taxon and/or by habitat type (e.g. mountain bird bioblitzes in Rocky Mountain National Park) to engage participants who are interested in specific species and/or habitats.

5.4. Limitations

Findings from the iNaturalist example in this study are not expected to be generalizable to all types of VGI communities, given that the VGI phenomenon itself is highly diverse and heterogeneous (G. Zhang 2021). Nevertheless, iNaturalist is expected to represent well those collaborative VGI projects wherein social interaction is a key underpinning. For example, inter-contributor co-editing activity in OSM (Sarkar and Anderson 2022) is a form of social interaction similar to species identification interactions in iNaturalist. Therefore, it is reasonable to speculate that OSM co-editing interaction patterns and their drivers are similar to those in iNaturalist.

One caveat, though, is that the geographic bias in the distribution of iNaturalist contributors (Figures 4 and 5) might have an impact on the generalizability of the findings from iNaturalist to other VGI communities. iNaturalist contributors are concentrated in North America and Europe and have particular demographic, socioeconomic, and cultural characteristics. If another collaborative VGI community’s contributors come from populations in other regions of the world that are very different from those of iNaturalist, the social forces driving interactions may act differently due to potential demographic, socioeconomic, or cultural differences.

6. Conclusion

Existing studies on VGI rarely examine social interactions among VGI contributors, which are key elements of VGI. This study fills this gap by conducting analyses of the patterns of social interactions in iNaturalist communities, as an example of VGI communities, and exploring potential factors that may have shaped the interaction patterns. The interactions were modeled as edges in a social network of iNaturalist contributors, which were then examined to elucidate network characteristics and extract the social interaction patterns therein. The relationships between edge attributes representing potential drivers of social interactions in iNaturalist (i.e. geographic distance, species taxon composition similarity, and land cover composition similarity), edge formation (interaction establishment), edge strength (interaction intensity), and community formation (interaction clustering) were explored through correlation analysis, random simulations, and community detection, respectively. The analysis results consistently show that geographic proximity between iNaturalist contributors and their common interests in species taxon categories and shared preferences on observation environments (i.e. land cover types) appear to be the major drivers of inter-contributor species identification interactions in iNaturalist.

The findings of this study are supported by the theoretical framework on the social forces that drive the formation of social links in a social network. Specifically, geographic distance represents physical proximity whilst species taxon composition similarity and land cover similarity reflect homophily effects (McCulloh, Armstrong, and Johnson 2013). Moreover, iNaturalist and many other VGI communities are spatial social networks. The discovered social interaction drivers in iNaturalist (i.e. geographic proximity and homophily effects) generally align with those in spatial social networks and are expected to be generalizable to collaborative VGI communities beyond iNaturalist.

The iNaturalist citizen science project is a VGI project with global coverage by nature, and inter-contributor species identification interactions happen between contributors located at places across the world. This study examines social interactions (inter-contributor species identification activities) and their drivers within the whole iNaturalist community and across its full geographic extent. Thus, in this study, we did not conduct analyses using partial data from smaller geographic areas. Future work could carry out similar analyses in selected regions of interest (e.g. United States, Germany, California, Texas, etc.) to investigate whether, how, and why social interaction patterns and drivers in region-specific VGI communities (e.g. iNaturalist or OSM) may differ among geographic regions and/or from those uncovered in this global-scope study.

Acknowledgement

The authors appreciate the numerous participants in the iNaturalist biodiversity citizen science project for their generous data contribution efforts, which made this research possible. This study was supported by the University of Denver through a Professional Research Opportunities for Faculty (PROF) grant. The APC was sponsored by the University of Denver’s Open Access Publication Equity Fund.

Data availability statement

The iNaturalist raw observations used in this study were downloaded from the iNaturalist website at https://www.inaturalist.org/observations/export. iNaturalist ‘research-grade’ observations were downloaded from the Global Biodiversity Information Facility website (https://www.gbif.org/dataset/50c9509d-22c7-4a22-a47d-8c48425ef4a7).

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The reported work did not receive external funding.

References

  • Bedi, P., and C. Sharma. 2016. “Community Detection in Social Networks.” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 6 (3): 115–135. https://doi.org/10.1002/widm.1178.
  • Bégin, D., R. Devillers, and S. Roche. 2018. “The Life Cycle of Contributors in Collaborative Online Communities – The Case of OpenStreetMap.” International Journal of Geographical Information Science 32 (8): 1611–1630. https://doi.org/10.1080/13658816.2018.1458312.
  • Brandes, U., and T. Erlebach. 2005. Network Analysis: Methodological Foundations. Lecture Notes in Computer Science. Vol. 3418. http://www.amazon.com/dp/3540249796.
  • Chen, Y., J. Xu, and M. Xu. 2015. “Finding Community Structure in Spatially Constrained Complex Networks.” International Journal of Geographical Information Science 29 (6): 889–911. https://doi.org/10.1080/13658816.2014.999244.
  • Cheong, F., and C. Cheong. 2011. “Social Media Data Mining: A Social Network Analysis of Tweets during the 2010–2011 Australian Floods.” In PACIS 2011-15th Pacific Asia Conference on Information Systems: Quality Research in Pacific, 1–16. Queensland University of Technology.
  • Connors, J. P., S. Lei, and M. Kelly. 2012. “Citizen Science in the Age of Neogeography: Utilizing Volunteered Geographic Information for Environmental Monitoring.” Annals of the Association of American Geographers 102 (6): 1267–1289. https://doi.org/10.1080/00045608.2011.627058.
  • Elwood, S., M. F. Goodchild, and D. Z. Sui. 2012. “Researching Volunteered Geographic Information: Spatial Data, Geographic Research, and New Social Practice.” Annals of the Association of American Geographers 102 (3): 571–590. https://doi.org/10.1080/00045608.2011.595657.
  • Expert, P., T. S. Evans, V. D. Blondel, and R. Lambiotte. 2011. “Uncovering Space-independent Communities in Spatial Networks.” Proceedings of the National Academy of Sciences of the United States of America 108 (19): 7663–7668. https://doi.org/10.1073/pnas.1018962108.
  • Fender, A., B. Rees, and J. Eaton. 2022. “RAPIDS CuGraph.” In Massive Graph Analytics, edited by David A. Bader, 483–493. New York, NY, USA: Chapman and Hall/CRC.
  • Feng, Y., X. Huang, and M. Sester. 2022. “Extraction and Analysis of Natural Disaster-related VGI from Social Media: Review, Opportunities and Challenges.” International Journal of Geographical Information Science 36 (7): 1275–1316. https://doi.org/10.1080/13658816.2022.2048835.
  • Gong, X., S. Peng, Y. Lu, S. Wang, X. Huang, and X. Ye. 2022. “Social Network Analysis of Nonprofits in Disaster Response: The Case of Twitter during the COVID-19 Pandemic in the United States.” Social Science Computer Review, 1–26. https://doi.org/10.1177/08944393221130674.
  • Gong, X., and X. Ye. 2021. “Governors Fighting Crisis: Responses to the COVID-19 Pandemic across U.S. States on Twitter.” Professional Geographer 73 (4): 683–701. https://doi.org/10.1080/00330124.2021.1895850.
  • Goodchild, M. F. 2007. “Citizens as Sensors: The World of Volunteered Geography.” GeoJournal 69 (4): 211–221. https://doi.org/10.1007/s10708-007-9111-y.
  • Goodchild, M. F., and L. Li. 2012. “Assuring the Quality of Volunteered Geographic Information.” Spatial Statistics 1 (May): 110–120. https://doi.org/10.1016/j.spasta.2012.03.002.
  • Guo, D., H. Jin, P. Gao, and X. Zhu. 2018. “Detecting Spatial Community Structure in Movements.” International Journal of Geographical Information Science 32 (7): 1326–1347. https://doi.org/10.1080/13658816.2018.1434889.
  • Hagberg, A., P. Swart, and D. S. Chult. 2008. “Exploring Network Structure, Dynamics, and Function Using NetworkX.” In Proceedings of the 7th Python in Science Conference (SciPy 2008), edited by G. Varoquaux, T. Vaught, and J. Millman, 11–15. Pasadena, CA, USA, Los Alamos, NM, US: Los Alamos National Lab (LANL).
  • Haklay, M. 2016. “Why is Participation Inequality Important?” In European Handbook of Crowdsourced Geographic Information, edited by Cristina Capineri, Muki Haklay, Haosheng Huang, Vyron Antoniou, Juhani Kettunen, Frank Ostermann, and Ross Purves, 35–44. London, UK: Ubiquity Press. https://doi.org/10.5334/bax.
  • Haklay, M. 2021. “Geographic Citizen Science: An Overview.” In Geographic Citizen Science Design, edited by Artemis Skarlatidou and Muki Haklay, 15–37. London: UCL Press.
  • Haklay, M., S. Basiouka, V. Antoniou, and A. Ather. 2010. “How Many Volunteers Does it Take to Map an Area Well? The Validity of Linus’ Law to Volunteered Geographic Information.” The Cartographic Journal 47 (4): 315–322. https://doi.org/10.1179/000870410X12911304958827.
  • Haklay, M., and P. Weber. 2008. “OpenStreetMap: User-generated Street Maps.” IEEE Pervasive Computing 7 (4): 12–18. https://doi.org/10.1109/MPRV.2008.80.
  • Han, S. Y., M. H. Tsou, and K. C. Clarke. 2018. “Revisiting the Death of Geography in the Era of Big Data: The Friction of Distance in Cyberspace and Real Space.” International Journal of Digital Earth 11 (5): 451–469. https://doi.org/10.1080/17538947.2017.1330366.
  • Haworth, B., and E. Bruce. 2015. “A Review of Volunteered Geographic Information for Disaster Management.” Geography Compass 9 (5): 237–250. https://doi.org/10.1111/gec3.12213.
  • iNaturalist. 2022a. “Frequently Asked Questions.” 2022. https://www.inaturalist.org/pages/help#geoprivacy.
  • iNaturalist. 2022b. “iNaturalist Observations.” 2022. https://www.inaturalist.org/observations.
  • iNaturalist. 2022c. “iNaturalist Observations Export.” 2022. https://www.inaturalist.org/observations/export.
  • iNaturalist. 2023. “Frequently Asked Questions.” 2023. https://www.inaturalist.org/pages/help#quality.
  • Kelling, S., C. Lagoze, W.-K. Wong, J. Yu, T. Damoulas, J. Gerbracht, D. Fink, and C. Gomes. 2013. “eBird: A Human/Computer Learning Network to Improve Biodiversity Conservation and Research.” AI Magazine 34 (1): 10–20. https://doi.org/10.1609/aimag.v34i1.2431.
  • Kim, J., and M. Hastak. 2018. “Social Network Analysis: Characteristics of Online Social Networks after a Disaster.” International Journal of Information Management 38 (1): 86–96. https://doi.org/10.1016/j.ijinfomgt.2017.08.003.
  • Landwehr, P. M., and K. M. Carley. 2014. Social Media in Disaster Relief: Usage Patterns, Data Mining Tools, and Current Research Directions, 225–257. Springer Berlin, Heidelberg: Springer.
  • Li, J., K. K. Stephens, Y. Zhu, and D. Murthy. 2019. “Using Social Media to Call for Help in Hurricane Harvey: Bonding Emotion, Culture, and Community Relationships.” International Journal of Disaster Risk Reduction 38:101212. https://doi.org/10.1016/j.ijdrr.2019.101212.
  • Liberatore, A., E. Bowkett, C. J. MacLeod, E. Spurr, and N. Longnecker. 2018. “Social Media as a Platform for a Citizen Science Community of Practice.” Citizen Science: Theory and Practice 3 (1): 3. https://doi.org/10.5334/cstp.108.
  • Liu, Y., X. Liu, S. Gao, L. Gong, C. Kang, Y. Zhi, G. Chi, and L. Shi. 2015. “Social Sensing: A New Approach to Understanding Our Socioeconomic Environments.” Annals of the Association of American Geographers 105 (3): 512–530. https://doi.org/10.1080/00045608.2015.1018773.
  • Malik, M. M., H. Lamba, C. N, and J. Pfeffer. 2015. “Population Bias in Geotagged Tweets.” In Ninth International AAAI Conference on Web and Social Media, May 26–29, 18–27. Oxford, UK.
  • Martínez-Sagarra, G., F. Castilla, and F. Pando. 2022. “Seven Hundred Projects in iNaturalist Spain: Performance and Lessons Learned.” Sustainability 14 (17): 1–15. https://doi.org/10.3390/su141711093.
  • McCulloh, I., H. Armstrong, and A. Johnson. 2013. Social Network Analysis with Applications. Hoboken, New Jersey, US: John Wiley & Sons.
  • McGough, A., H. Kavak, and R. Mahabir. 2022. “Revisiting Linus’ Law in OpenStreetMap: An Agent-based Approach.” In Social, Cultural, and Behavioral Modeling, edited by Robert Thomson, Christopher Dancy, and Aryn Pyke, 123–133. Cham: Springer International Publishing.
  • Mooney, P., and P. Corcoran. 2014. “Analysis of Interaction and Co-editing Patterns amongst OpenStreetMap Contributors.” Transactions in GIS 18 (5): 633–659. https://doi.org/10.1111/tgis.12051.
  • PostGIS. 2023. “ST_GeometricMedian.” PostGIS Documentation. 2023. https://postgis.net/docs/ST_GeometricMedian.html.
  • Poulin, V., and F. Théberge. 2019. “Ensemble Clustering for Graphs: Comparisons and Applications.” Applied Network Science 4 (1). https://doi.org/10.1007/s41109-019-0162-z.
  • Sarkar, D., and J. T. Anderson. 2022. “Corporate Editors in OpenStreetMap: Investigating Co-editing Patterns.” Transactions in GIS, 1–19. https://doi.org/10.1111/tgis.12910.
  • Sbrocchi, C., G. Pecl, I. Putten, and P. Roetman. 2022. “A Citizen Science Community of Practice: Relational Patterns Contributing to Shared Practice.” Citizen Science: Theory and Practice 7 (1): 1–14. https://doi.org/10.5334/CSTP.358.
  • Stephens, M., and A. Poorthuis. 2015. “Follow Thy Neighbor: Connecting the Social and the Spatial Networks on Twitter.” Computers, Environment and Urban Systems 53:87–95. https://doi.org/10.1016/j.compenvurbsys.2014.07.002.
  • Sulla-Menashe, D., and M. A. Friedl. 2018. User Guide to Collection 6 MODIS Land Cover (MCD12Q1 and MCD12C1) Product. Sioux Falls, SD, USA. https://lpdaac.usgs.gov/documents/101/MCD12_User_Guide_V6.pdf.
  • Sullivan, B. L., C. L. Wood, M. J. Iliff, R. E. Bonney, D. Fink, and S. Kelling. 2009. “eBird: A Citizen-based Bird Observation Network in the Biological Sciences.” Biological Conservation 142 (10): 2282–2292. https://doi.org/10.1016/j.biocon.2009.05.006.
  • Torres, A. C., B. Bedessem, N. Deguines, and C. Fontaine. 2022. “Online Data Sharing with Virtual Social Interactions Favor Scientific and Educational Successes in a Biodiversity Citizen Science Project.” Journal of Responsible Innovation, 1–19. https://doi.org/10.1080/23299460.2021.2019970.
  • Tupikina, L., F. Schlosser, V. Voskresenskii, K. Kloppenborg, F. Lopez, A. Mariz, A. Mogilevskaja, M. Haklay, and B. G. Tzovaras. 2021. “iNaturalist Citizen Science Community during City Nature Challenge: New Computational Approach for Analysis of User Activity.” arXiv preprint arXiv:2112.02693.
  • Ueda, K. 2022. “iNaturalist Research-Grade Observations. iNaturalist.org. Occurrence Dataset.” GBIF.org. https://doi.org/10.15468/ab3s5x.
  • Unger, S., M. Rollins, A. Tietz, and H. Dumais. 2021. “iNaturalist as an Engaging Tool for Identifying Organisms in Outdoor Activities.” Journal of Biological Education 55 (5): 537–547. https://doi.org/10.1080/00219266.2020.1739114.
  • Wang, Z., and X. Ye. 2018. “Social Media Analytics for Natural Disaster Management.” International Journal of Geographical Information Science 32 (1): 49–72. https://doi.org/10.1080/13658816.2017.1367003.
  • Xu, J., and Y. Qiang. 2022. “Analysing Information Diffusion in Natural Hazards Using Retweets – A Case Study of 2018 Winter Storm Diego.” Annals of GIS 28 (2): 213–227. https://doi.org/10.1080/19475683.2021.1954086.
  • Xu, Y., P. Santi, and C. Ratti. 2022. “Beyond Distance Decay: Discover Homophily in Spatially Embedded Social Networks.” Annals of the American Association of Geographers 112 (2): 505–521.
  • Yan, Y., C. Feng, W. Huang, H. Fan, and Y. Wang. 2020. “Volunteered Geographic Information Research in the First Decade: A Narrative Review of Selected Journal Articles in GIScience.” International Journal of Geographical Information Science 34 (9): 1765–1791. https://doi.org/10.1080/13658816.2020.1730848.
  • Ye, X., and C. Andris. 2021. “Spatial Social Networks in Geographic Information Science.” International Journal of Geographical Information Science, 1–5. https://doi.org/10.1080/13658816.2021.2001722.
  • Zhang, G. 2020. “Spatial and Temporal Patterns in Volunteer Data Contribution Activities: A Case Study of eBird.” ISPRS International Journal of Geo-Information 9 (10): 597. https://doi.org/10.3390/ijgi9100597.
  • Zhang, G. 2021. “Volunteered Geographic Information.” In Geographic Information Science & Technology Body of Knowledge, edited by John P. Wilson. https://doi.org/10.22224/gistbok/2021.1.1.
  • Zhang, D., Y. Ge, A. Stein, and W. B. Zhang. 2021. “Ranking of VGI Contributor Reputation Using an Evaluation-based Weighted Pagerank.” Transactions in GIS 25 (3): 1439–1459. https://doi.org/10.1111/tgis.12735.
  • Zhang, G., and A. X. Zhu. 2018. “The Representativeness and Spatial Bias of Volunteered Geographic Information: A Review.” Annals of GIS 24 (3): 151–162. https://doi.org/10.1080/19475683.2018.1501607.
  • Zhang, G., A. X. Zhu, Z. P. Huang, G. Ren, C. Z. Qin, and W. Xiao. 2018. “Validity of Historical Volunteered Geographic Information: Evaluating Citizen Data for Mapping Historical Geographic Phenomena.” Transactions in GIS 22 (1): 149–164. https://doi.org/10.1111/tgis.12300.
  • Zhu, D., F. Zhang, S. Wang, Y. Wang, X. Cheng, Z. Huang, and Y. Liu. 2020. “Understanding Place Characteristics in Geographic Contexts through Graph Convolutional Neural Networks.” Annals of the American Association of Geographers 110 (2): 408–420. https://doi.org/10.1080/24694452.2019.1694403.