EU Cohesion Policy towards Territorial Cohesion?

Computational social science in regional analysis and the European real estate market

Received 02 Aug 2022, Accepted 06 Mar 2024, Published online: 18 Apr 2024

ABSTRACT

The recent so-called ‘data revolution’ offers unprecedented opportunities to innovate regional policies. New data sources are being widely used by the scientific community; however, their uptake is far from systematic in the policy cycle, where data innovation can improve territorial impact assessment. This paper presents a survey on the use of non-traditional data in the context of regional policy, together with a case study on the real estate markets of three European countries, highlighting the perspectives and limitations of computational social science in regional analysis in terms of data quality and availability.

1. INTRODUCTION

One of the most distinctive phenomena of the first two decades of the third millennium has been the so-called ‘data revolution’. Since the early 2000s, we have witnessed increasing rates of availability of cheap computational power and cheap storage space, as well as widespread ownership and usage of connected and ‘smart’ devices (from phones to electric appliances, from electricity meters to connected cars), leading to an unprecedented generation of data.

These three conditions – data generation, processing and increasing storage capacities – have created a real ‘datafication’ of many human activities, ultimately stimulating advances in several scientific disciplines. While the impact of the data revolution has certainly been disruptive for the ‘hard’ sciences, the widespread availability of reliable data sources on human behaviour and phenomena has triggered a still ongoing Copernican revolution in the social sciences (Einav & Levin, 2013).

With the growing amount of digital trace data available, a new discipline called computational social science (CSS) has emerged (Lazer et al., 2009, 2020). CSS aims to apply advanced computational methods (either simulative or inferential) to data about the behaviour and socioeconomic interactions of human beings. These data can vary in complexity, scale and granularity.

It was only a matter of time before the wealth of technical and interpretative capabilities offered by CSS was harnessed by public decision-makers to inform all phases of the policy cycle, from policy inception and drafting to ex-post evaluation. A research group of the Joint Research Centre (JRC) of the European Commission has specifically studied this subject and published a report aimed at understanding policy-makers’ needs for insights that CSS can provide (Bertoni et al., 2022b). This report served as the inspiration for a publication that provides a comprehensive overview of the knowledge landscape surrounding CSS for policy (Bertoni et al., 2022a).

One area where CSS is showing both scientific and policy impact is the territorial impact assessment (TIA) of policy measures (Medeiros, 2023). In his review, the author provides a thorough examination of different CSS models and possibilities for integrating TIA. The potential of CSS for TIA is promising, as it offers a new way to assess the impact of policy measures on regions and communities, taking into account the complex dynamics of human behaviour and socioeconomic interactions. By applying CSS methods to TIA, policy-makers can gain a better understanding of the complex socioeconomic interactions that shape the impact of policies. This can be particularly relevant for one of the current European Commission’s priorities, the European Green Deal,Footnote1 where the impact of policies on different regions and communities can vary significantly depending on factors such as economic structures, social norms and environmental conditions. By leveraging CSS methods for TIA, policy-makers can design more effective policies that take into account the specific needs and characteristics of different regions and communities, and ensure that the transition to a more sustainable and low-carbon economy is fair and equitable for all.

As we will demonstrate in Section 2.1, TIA is just one of many fields of regional studies where CSS can offer valuable policy insights. Despite the clear benefits of using CSS methods to address policy issues, the road ahead is still fraught with challenges. One of the major challenges is the need for additional efforts in data integration, which we will explore in this paper through a case study that uses previously unexplored administrative data. This case study is presented in Section 3. We then discuss the methodological and applied findings in Section 4 and conclude by describing future directions for applied and methodological development in Section 5.

2. LITERATURE REVIEW

2.1. The use of non-traditional data and cutting-edge analysis techniques in regional policy

Over the past decade, a deluge of non-traditional data from new sources has appeared and come into use across many fields of science and practice. Researchers from different disciplines have started validating spatial phenomena with data generated by smart cards used in public transport, social media platforms, mobile phones, GPS in phones and wearable devices, and CCTV and Wi-Fi spots in public places.

These recently available data have different characteristics and advantages in comparison to traditional data employed in territorial and urban analysis and evaluation. First, they can either complement traditional data in the analysis or supply the process with new, spatially rich information that was not available before. Second, these data often have a higher spatial and temporal resolution than traditional data employed in territorial analysis, come in several different formats (averaged value per grid cell, point of interest, individual origin-destination records, geo-referenced localisation, etc.), and originate from different sources and providers (cities’ open data repositories, private providers, web scraping, volunteered geographical information (VGI)). Third, such highly disaggregated data make it possible to calculate metrics and indicators at a multiplicity of spatial levels, from the local and neighbourhood scale in cities to municipalities (LAUsFootnote2) and regions (NUTSFootnote3). Furthermore, from a temporal perspective, data from new sources are often updated more frequently than traditional sources (e.g., censuses); this makes it possible to build comparable time-series, analyse the variations of territorial metrics, and evaluate the development and impact of policies and strategies over time and at different levels of aggregation. In addition, the presence of common attributes amongst different data sets makes it possible to join different information and enrich the data already available (i.e., traditional data) with spatial joins based on location or geographic boundaries.

The spatial information extracted from non-traditional data sources can be combined with traditional metrics and contribute to the understanding of pressing issues, including territorial disparities, depopulation and access to the housing or labour market, so that specific areas can be targeted with place-based policies and strategies. It is worth mentioning that a lack of data and spatial information for certain areas can itself signal marginalised places and phenomena of segregation (Robinson & Franklin, 2020).

Characteristics that differentiate non-traditional datasets from traditional ones encompass various dimensions. Non-traditional datasets offer frequent or real-time updates, providing timely information, while traditional ones tend to have periodic or less frequent refresh cycles, resulting in delayed data availability. Non-traditional datasets often feature a finer level of detail, allowing for precise analyses, in contrast to traditional datasets, that may present data in a coarser format. Moreover, non-traditional datasets frequently encompass significantly larger data volumes compared to traditional datasets. These innovative datasets originate from diverse sources like social media, sensor networks, or mobile devices, while traditional datasets usually stem from surveys or structured databases. In addition, non-traditional datasets are versatile, as they can include unstructured data types like text, images and videos alongside structured data, a common characteristic of traditional datasets. Non-traditional datasets are often more accessible, benefiting from open data initiatives or accessible APIs, whereas traditional ones may require special permissions or have restricted access. New datasets exhibit a high velocity, with data being generated rapidly, making them conducive to real-time or near-real-time analysis, which is essential for timely decision-making. In contrast, traditional datasets have a slower rate of data generation and may not be as suitable for real-time applications. These distinctive characteristics highlight the evolving data landscape, offering opportunities for more dynamic and insightful analyses.

Besides these advantages, it is important to consider that non-traditional data have biases and limitations, as do any other data, no matter how big or high-resolution they are. Another point to note is that many of the data recently made available to researchers are not produced for the purpose of spatial analysis and territorial policy; therefore, although they contain useful spatio-temporal information, they might lack other information or have missing records. Finally, it is worth mentioning that the impact of many big, real-time data from non-traditional sources still needs to be fully evaluated. Although new data have proved fundamental for territorial research in recent years, including ex-ante and ex-post analysis of certain phenomena and the corresponding policy definition and implementation, the impact of such high-frequency data on forecasting, planning and policy over longer temporal horizons is yet to be understood (Kandt & Batty, 2021).

To reliably contribute to comparative analysis across different territories, data must be comparable in terms of spatial units, time frames and attributes. For non-traditional data, the same provider can supply data for different geographic areas (e.g., mobile network operators with mobile positioning data), so that the data are homogeneous in terms of spatial resolution, attributes, etc. In other cases, data are provided by different sources but the technologies recording the information (i.e., sensors) are the same, therefore data can be comparable and elaborated at the required spatial unit of analysis. However, there are many cases where data concerning the same phenomenon are not comparable across different territories because of significant differences in the data collection, elaboration, and/or release. For example, different levels of spatial aggregation can lead to difficulties in comparing several areas.

In the field of territorial analysis and assessment, several notable studies have harnessed non-traditional data sources alongside traditional spatial information. These investigations encompass the following areas: urban functions and land use classification, digital traces and human mobility analysis, and quality of life measurement in urban areas.

In the area of urban functions and land use classification, researchers have made strides in exploring urban functions and classifying land use based on the analysis of mobility patterns. They have harnessed various non-traditional data sources, such as smart card data (Zhong, Arisona et al., 2014; Zhong, Huang et al., 2014) and mobile phone data (Iacus et al., 2021; Pei et al., 2014). Additionally, social media data have played a pivotal role in the classification of urban functions (Frias-Martinez & Frias-Martinez, 2014). Furthermore, there have been endeavours to combine remote sensing data with conventional datasets to achieve a more comprehensive analysis (Samardzhiev et al., 2022; Singleton et al., 2022).

In the area of digital traces and human mobility analysis, investigations into human mobility across cities and countries have relied on the examination of digital traces originating from sources like social media and mobile phone data (Calabrese et al., 2010; González et al., 2008; Spyratos et al., 2019). Researchers have explored universal mobility patterns by leveraging non-traditional data sources (Alessandretti et al., 2020; Schläpfer et al., 2021). Moreover, the combination of traditional and non-traditional data sources has enabled assessments of spatial inequalities related to mobility and income (Moro et al., 2021).

Finally, in the area of quality of life measurement in urban areas, studies have analysed images from social media and Google services, facilitating assessments of public spaces and walkability (Girardin et al., 2008; Miranda et al., 2021; Quercia et al., 2014). Furthermore, these investigations have delved into the detection of green areas, contributing to urban planning and environmental analysis (Li, 2021; Seiferling et al., 2017). Collectively, these studies emphasise the adaptability and potential of non-traditional data sources in enriching our comprehension of territorial dynamics and urban phenomena.

In this context, recent work developed by the JRC of the European Commission shows how tangibly the use of non-traditional data and innovative analysis techniques can contribute to inform policies at the local and European level. In particular, a recently-released Science for Policy report (Proietti et al., 2022) illustrates the impact of high resolution data and multi-scale analysis at the EU27 level to inform territorial policy also in relation to the latest European Commission’s priorities. Data sourced from non-traditional online sources and repositories provide quantitative geographical evidence that can inform place-based policies on tackling spatial inequalities such as the urban-rural digital divide and the accessibility of essential services at the regional and urban scale. Some of the data sources employed in this work were also used to inform the Cohesion Report 2022,Footnote4 and are highly relevant for some European Strategies such as the European Green DealFootnote5 and the new European Bauhaus.Footnote6

Another relevant example, in the context of regional policy, includes the analysis of mobility data provided to the JRC by mobile network operators in Europe to analyse the impact of the COVID-19 pandemic (Vespe et al., 2021). The analysis of mobile positioning data informed different policy-makers with evidence and data-driven knowledge to understand and predict the spread of the disease.

2.2. Literature review for real estate scenarios

Real estate data can be classified into three types: (i) financial data, (ii) transactional data and (iii) physical data. Financial data include information on real estate investment trusts’ (REITs) shares and real estate related stocks, while transactional data refer to information on real estate purchases, mortgages, leases, etc. Finally, physical data describe the structural characteristics of the property and location data. With the development of geographic information systems (GIS), these data can be integrated with related neighbourhood information from the census, traffic sheds and street flow patterns, and analyses of proximity to amenities (De Nadai & Lepri, 2018).

In recent times, location-based social networks (LBSNs) like X (formerly known as Twitter) have emerged as a promising avenue for capturing human activities. A location-based social network allows users to post short messages, and when writing a message a user can optionally include the location, which is typically derived from the device’s GPS. Consequently, researchers have begun harnessing these non-traditional data sources to gain insights into the real estate market. For instance, Taşcılar and Arslanlı (2022) introduce a novel approach that uses social media data to forecast commercial real estate trends during the COVID-19 pandemic. Their findings highlight the superior predictive performance of models supported by LBSN data compared to baseline models, particularly for price predictions and occasionally for rent predictions. This demonstrates the potential of LBSNs in enhancing our understanding of real estate dynamics.

The COVID-19 pandemic also accelerated the perception of digital transformation on real estate websites and digital practices facilitating communication and improving the performance of various sectors of the economy (Moro et al., 2022). Web searches, therefore, become an important source of information for measuring the intention to buy a house, and also help predict the city’s future housing price change (Beracha & Wintoki, 2013).

A relevant research topic concerns housing market expectations. Kuchler et al. (2023) present a literature review on the determinants and effects of housing market expectations. The authors summarise how differences in housing market expectations translate into differences in individuals’ housing market behaviours, including their home purchasing and mortgage financing decisions.

Factors influencing the real estate market are diverse and complex. In particular, the effect of an emergency on the housing market is a very interesting topic, yet only very few specific studies can be found in the literature. Emergencies affect the economy and consequently people’s lives, which is why studying the real estate market is an important task for policy-makers.

The real estate market experienced a severe crisis in 2007 due to the subprime bubble. The crisis generated a sharp depreciation of real estate values due to the insolvency of the owners. With regard to this issue, several studies have been carried out on the causes (Sornette & Woodard, 2010) and the consequences for national well-being (Milunovich & Trück, 2013).

Among the most important emergencies we can also mention health emergencies. The early 2000s were characterised by a major pandemic of severe acute respiratory syndrome (SARS) which remained confined to the eastern hemisphere. Some researchers have been able to observe the variation of prices in the Hong Kong area (Wong, 2008). Closer to the present day, in 2020, the COVID-19 pandemic changed the needs of people and companies at the global level (Balemi et al., 2021).

Pricing effects in housing markets follow different patterns, and for this reason the issues appear to be of great interest within the regional policy context. D’Lima et al. (2022) found that post-shutdown pricing effects not only depend on population density but also on the size and structural density of properties. Prices will not tend to move independently from the context of the aforementioned economic variables, and for this reason, Del Giudice et al. (2020) investigated the regional economic consequences of the COVID-19 pandemic, taking into account the ways in which the event might have affected the regional economic activity. In the realm of real estate market analysis, researchers employ various techniques to study market evolution. These techniques include the use of agent-based models (ABMs) (Ge, 2013) and the application of time-series analysis (Samadani & Costa, 2021). These approaches offer valuable insights into the dynamic nature of real estate markets, allowing researchers to explore different facets of their evolution.

Real estate is profoundly shaped by the influence of government policies and interventions within the housing market. Policies encompassing rent control, affordable housing initiatives, tax incentives, and housing voucher programmes exert direct impacts on property values, market dynamics and the behaviours of buyers, sellers and renters. Incorporating the context of these policy interventions into this review facilitates a more comprehensive examination of real estate data, enabling researchers to scrutinise both market-driven factors and the repercussions of governmental policies. Within the domain of policy interventions in the housing market, it is essential to highlight several noteworthy studies. The first explores the repercussions of housing voucher programmes on labour market outcomes, delving into how housing assistance policies shape employment and earnings for recipients (Chyn et al., 2019). Another paper scrutinises the effects of inclusionary zoning policies on housing markets, probing the extent to which these policies, which mandate developers to incorporate affordable housing units, influence local housing markets and home prices (Schuetz et al., 2009). A third study investigates the impact of rent control policies on property owners, analysing how rent control influences housing supply, property maintenance and the financial well-being of property owners (Diamond et al., 2019). Lastly, a fourth study evaluates the influence of community gardens on nearby property values, shedding light on how community development initiatives, like urban gardening, can impact housing markets (Voicu & Been, 2008). Collectively, these studies furnish valuable insights into the intricate interplay between housing policies and the dynamics of real estate markets.

3. POTENTIAL OF NEWLY AVAILABLE DATA: AN EXPLORATORY CASE STUDY ON HOUSING TRANSACTIONS

To explore the soundness and feasibility of using non-traditional data in regional analysis, we propose an exploratory case study to show how the analysis of such data can supply valuable information and contribute to tackling relevant policy questions, such as those listed in Bertoni et al. (2022b, p. 189). The goal of this analysis is thus to highlight the potential of new quantitative evidence that can support the identification of targets and strategies for policy actions and implementation. We selected data about housing transactions as a case study. Depending on the richness of the data, their analysis can provide insights informing policy about trends in residential patterns, housing demand and poverty across different territories, housing accessibility and shifts in preferences of the population in terms of where to live (as some preliminary work exploring the impact of COVID-19 on residential patterns showed). These types of quantitative and spatial information can help policy-makers fine-tune policies to address territorial issues and potential disparities across regions.

In this specific analysis, we have focused our attention on three European countries, namely Austria, France and Italy, and performed our analysis on three datasets, to illustrate the richness and sometimes the limitations of data in terms of comparative analysis and data quality, especially when results can inform integrated measures at the territorial level. We have chosen these specific countries because they make data available as open data (Italy and France) or they have responded to our extraction requests (Austria). The main characteristics of these datasets in terms of accessibility, timeliness and granularity are summarised in Table 1. However, the three datasets present very different characteristics in terms of spatial and temporal extent, completeness, metadata, etc. They also differ in how close their format is to that of traditional datasets. For example, Austrian data are supplied by the provider as already elaborated, with no individual records available to be used by external researchers. The Italian dataset also does not provide access to individual records; however, it is only partially aggregated and elaborated, and it is supplied as open data. Finally, the French dataset is closer to the format of many non-traditional data sources recently made available: it provides highly disaggregated data, with individual records and no predigested information, requiring an additional effort in terms of data cleaning and results interpretation. Nevertheless, the methods presented in this exploratory case study can be extended to other countries that will make their data available.

Table 1. Overview of data comparison according to the three dimensions of accessibility, timeliness and granularity.

3.1. Data description

3.1.1. Statistics Austria, real estate price statistics – special evaluation dataset for the European Commission, 2022

The first dataset was provided by the Austrian National Statistical Institute (Statistics Austria).Footnote7 It contains the median price per square metre of transactions from 2015 to 2021 for 2073 municipalities existing in 2020 and 2021. Statistics Austria used the coordinates of the properties to geolocate each transaction in the correct municipality at that time. In the event of changes in the geographies of municipalities (due to mergers or divisions), Statistics Austria assigned the transactions backwards according to the last territorial division.
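The backward assignment of transactions to the latest territorial division can be illustrated with a minimal sketch. It is not Statistics Austria's procedure: the text states that property coordinates were used, whereas this sketch substitutes a simple code crosswalk as a stand-in. The crosswalk, the field names and the figures are all hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical crosswalk: historical municipality code -> code in the latest division.
# Codes missing from the crosswalk are assumed to be unchanged.
CROSSWALK = {"A1": "A", "A2": "A"}


def median_price_by_current_municipality(transactions, crosswalk):
    """Group price-per-m2 values by (latest municipality code, year) and take the median."""
    groups = defaultdict(list)
    for t in transactions:
        current = crosswalk.get(t["municipality"], t["municipality"])
        groups[(current, t["year"])].append(t["price"] / t["area_m2"])
    return {key: median(values) for key, values in groups.items()}


# Made-up records: A1 and A2 merged into A after 2015.
transactions = [
    {"municipality": "A1", "year": 2015, "price": 200_000, "area_m2": 100},
    {"municipality": "A2", "year": 2015, "price": 300_000, "area_m2": 100},
    {"municipality": "A", "year": 2021, "price": 250_000, "area_m2": 100},
]
result = median_price_by_current_municipality(transactions, CROSSWALK)
```

Harmonising to a single territorial division in this way keeps the yearly series comparable even when municipalities merge or split during the observation period.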

The transaction information, at municipal level and yearly granularity, is gathered from contracts by a private data provider, who delivers a dataset to Austria’s National Statistical Institute every month. The high frequency in the recording of such information represents an innovative aspect compared to traditional data, which are often produced at a higher aggregation scale (i.e., annually or more). However, the Austrian contracts do not contain many details; for example, the size in square metres is mostly missing for houses. Statistics Austria therefore has to merge the transactions dataset with its building register and the cadastre to fill in missing sizes and building ages, which is likely to introduce noise and errors in the data.Footnote8

It is worth mentioning that Statistics Austria filters out records with five or fewer transactions in order to reduce risks of re-identification. Privacy is one of the risks to consider in relation to non-traditional data, and it is also one of the most pressing issues perceived by the public when data use and data sharing are discussed. Another aspect to highlight is that this dataset contains a number of very small municipalities (from 41 to 1741 inhabitants), and in several cases there may not be sufficient real estate transactions in the period of observation. This highlights the fact that even highly disaggregated data, at either the spatial or the temporal scale, can present completeness issues and missing records for different reasons. This needs to be acknowledged when results are employed to inform policies, including place-based territorial policies, in order not to overlook areas or resident populations whose records are missing from the data. In this work, to partially reduce the problem of missing data, we used the dataset that combines ‘Flats’ and ‘Houses’, to increase the probability that a small municipality has more than five transactions during the year.
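The effect of pooling property types on the suppression rule can be sketched as follows. This is an illustration only: the text does not describe how the filter is implemented, and the field names and records below are invented.

```python
from collections import Counter

MIN_TRANSACTIONS = 5  # cells with this many transactions or fewer are suppressed


def publishable_cells(records, threshold=MIN_TRANSACTIONS):
    """Count transactions per (municipality, year) and keep only the cells
    with more than `threshold` transactions."""
    counts = Counter((r["municipality"], r["year"]) for r in records)
    return {cell: n for cell, n in counts.items() if n > threshold}


# Made-up records: municipality X has 3 flats and 3 houses, Y has 4 flats.
records = (
    [{"municipality": "X", "year": 2020, "type": "flat"}] * 3
    + [{"municipality": "X", "year": 2020, "type": "house"}] * 3
    + [{"municipality": "Y", "year": 2020, "type": "flat"}] * 4
)

flats_only = publishable_cells([r for r in records if r["type"] == "flat"])
pooled = publishable_cells(records)
```

In this toy example no cell survives when flats are considered alone, while pooling flats and houses lets municipality X pass the threshold; municipality Y remains suppressed in both cases.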

3.1.2. Italian revenue agency, observatory on the real estate market

The real estate market observatory (OMI)Footnote9 set up at the Italian Revenue Agency (Agenzia delle Entrate) ensures the statistical control of the residential real estate market and makes the appropriate communications for the purposes of macro-prudential supervisory controls. The dataset we used in our analysis is released as open dataFootnote10 from 2016 to 2021 through authenticated access to the agency’s platform.

The database of real estate prices (BDQ OMI) provides a minimum–maximum range of market and rental values every six months for each territorial area by type of property, state of maintenance and conservation. The information is provided at a geographic level that is smaller than the municipal level (OMI areas). An OMI area reflects a homogeneous sector of the local real estate market, in which there is a substantial uniformity of economic, social and environmental conditions.Footnote11 The BDQ OMI is constructed by cross-referencing the archives of the deeds of sale with the census archives of the cadastre. The attribute used to join the tables is the cadastral identification of the urban real estate unit. Therefore, unlike the Austrian dataset, this one provides information at a lower spatial and temporal aggregation scale, making it possible to identify territorial trends and patterns in more detail, and therefore to support territorial policies with more precise information.

The average market value is contained in the minimum–maximum range supplied by the data provider but is not extracted because, as stated in the Manual of the Quotes Database of the Real Estate Market Observatory:Footnote12 given the particular complexity and heterogeneity that characterises real estate, the determination of the relative average prices may be subject to representativeness limits.Footnote13 This note illustrates how, despite the effort to produce a homogeneous representation of the market, biases and outliers in the data introduce limitations in terms of representativeness of the results, and this aspect should be considered when using information extracted from any data in policy implementation. Our data processing is carried out at the municipal level (around 8000 municipalities): for each municipality, and only for residential buildings, we extract the minimum value among the minima of the OMI areas and the maximum value among the maxima. In doing so, we maintain the structure of the original data, which provide the minimum and maximum price in euros per square metre for each area.
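The municipal aggregation described above (minimum of the minima and maximum of the maxima across OMI areas, residential buildings only) can be sketched as follows. The record layout, field names and values are hypothetical and do not reproduce the actual BDQ OMI schema.

```python
def municipal_price_range(omi_areas):
    """Collapse OMI-area price ranges into one (min, max) range per municipality,
    considering residential records only."""
    ranges = {}
    for area in omi_areas:
        if area["use"] != "residential":
            continue
        low, high = area["min_eur_m2"], area["max_eur_m2"]
        if area["municipality"] in ranges:
            prev_low, prev_high = ranges[area["municipality"]]
            low, high = min(low, prev_low), max(high, prev_high)
        ranges[area["municipality"]] = (low, high)
    return ranges


# Made-up OMI areas for a single municipality "M1" (prices in euros per m2).
omi_areas = [
    {"municipality": "M1", "use": "residential", "min_eur_m2": 2000, "max_eur_m2": 3500},
    {"municipality": "M1", "use": "residential", "min_eur_m2": 1500, "max_eur_m2": 2800},
    {"municipality": "M1", "use": "commercial", "min_eur_m2": 900, "max_eur_m2": 6000},
]
ranges = municipal_price_range(omi_areas)
```

The result keeps the same minimum–maximum structure as the source data, so no average is ever computed, in line with the representativeness caveat in the manual.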

As mentioned earlier, this exploratory case study only focuses on sales of residential buildings (i.e., transactions of residential houses, economic houses, villas and cottages and stately homes). However, the original dataset is quite rich in terms of attributes related to the real estate market, and also contains transactions of commercial and industrial buildings. The complete list of the available categories is the following: residential houses, economic houses, industrial sheds, laboratories, warehouses, shops, offices, villas and cottages, typical sheds, shopping centres, stately homes, garages, covered parking spaces, typical houses of the places, structured offices, pensions and similar. This supplementary information is also extremely relevant for territorial policy dealing with land use distribution, zoning, urban sprawl, land and nature conservation, etc. Newly available datasets often show a richness in several attributes that can be exploited to benefit policy strategies and implementation at the territorial level.

3.1.3. General board of public finance (DGFiP), request for real estate values in France

For transparency reasons, French law states that tax authorities must make available to the public, free of charge, information related to sales, auctions, expropriations, and exchanges of real estate properties over the past five years. Therefore, unlike the other datasets that are pre-processed by national authorities, the dataset for France contains information on each individual transaction. To avoid privacy issues, the general conditions of use of the information prescribe that, on the one hand, the processing related to the reuse of the information cannot have either the purpose or the effect of allowing the re-identification of the persons concerned and, on the other hand, that this information cannot be subject to indexing on online search engines.

For each transaction, the dataset contains a record with the date and nature of the transaction, the price and the address. A description of the property (e.g., surface, number of main rooms) and cadastral references (e.g., code of the municipality and department, prefix and code of the cadastral section, plan number of the location of the assets) are also present. The dataset is available for the last five years (second semester of 2016 to first semester of 2021) on the open platform for French public data.Footnote14 For this analysis, the sale transactions (for apartments and houses) were aggregated and the median sale prices per square metre were calculated for 33,790 municipalities. Information on dépendances is also available but has not been used due to several inconsistencies and missing values.

Despite the advantage of providing highly disaggregated information, this dataset also shows some limitations related to gaps and missing values that affect different attributes. For example, for the housing typology dépendances, which represents a significant portion of the data, the value of the attributes ‘surface area’ and ‘number of rooms’ is zero. This is a problem given the high number of records labelled as dépendances, especially in large cities such as Paris. Moreover, considering our interest in residential typologies, the share of records not assigned to any of the housing typology labels (appartement, dépendance, maison) cannot be ignored. Furthermore, the distribution of missing information is not spatially even (for the city of Paris and its 20 arrondissements, this percentage is around 2.4%). We made several attempts to circumvent the issues caused by these gaps. In the end, we only included the typologies appartement and maison in the analysis, and dropped records with missing values for selling price or square metres. The French case shows that data that are neither pre-processed nor highly aggregated, while an advantage for in-depth analysis, come at the cost of noise and of a significant effort to extract explainable trends and information that can meaningfully inform policy needs.

3.2. Analysis and results

The data analysis illustrated in this section focuses on two main aspects: (1) identifying housing market trends, and (2) detecting similarities in the spatial patterns of market values across the markets of the three European countries selected for this work. Results obtained from the analysis can be informative for answering policy questions and informing policy actions regarding the factors characterising the housing market’s current status. Among other results, the analysis can show how housing market trends differ between richer and poorer urban areas, between urban and rural areas, between inland and coastal areas, etc. These differences also say more about the socioeconomic status of territories, highlighting spatial disparities and potential inequalities and missed opportunities for the residents of disadvantaged regions. Therefore, results can be used by policy-makers to identify areas and vulnerable communities to target across different territories, in order to contribute to a more cohesive and sustainable territorial development.

Before running a comparative analysis among the three different datasets, we performed a data cleaning process in order to identify issues such as missing values in the data. For Austria, due to the pre-processing conducted by the data provider, no real estate valuations are available for over 700 municipalities, mainly rural ones. For Italy, minimum and maximum values are zero for less than 1% of all OMI areas analysed, and only one municipality has no valid values for all the years taken into account. For France, the original dataset is not pre-processed and therefore required further effort in the cleaning step. We selected 5.79 million records about apartments and houses that included information on both the square metres sold and the value. Transactions have been aggregated at the municipal level and, similarly to the Austrian data (to avoid the risk of re-identification), municipalities with fewer than five transactions have been removed. This choice led to the discarding of approximately 5700 municipalities. This data cleaning process is fundamental for two main reasons: (1) to improve the results of the comparative analysis, so that data coming from different sources are standardised and aggregated at similar spatial and temporal scales (as much as possible); (2) to detect missing information in the data that could have an impact on results and therefore on the information supplied to policy-makers. This latter aspect is critical when policy actions and decisions are made based on direct quantitative results, as missing data could lead to the underestimation of some territorial aspects and trends for which no information is (yet) available in the collected data.
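The cleaning and aggregation steps described for the French data can be sketched as follows. This is a simplified illustration, not the pipeline used for the study: the field names (`municipality`, `type`, `price`, `surface`) and the sample records are hypothetical, and the yearly dimension is ignored for brevity.

```python
from collections import defaultdict
from statistics import median

MIN_TRANSACTIONS = 5  # municipalities with fewer valid sales are discarded
RESIDENTIAL_TYPES = {"appartement", "maison"}

# Hypothetical transaction records (price in euros, surface in square metres).
transactions = [
    {"municipality": "X", "type": "maison", "price": 200000.0, "surface": 100.0},
    {"municipality": "X", "type": "appartement", "price": 150000.0, "surface": 50.0},
    {"municipality": "X", "type": "appartement", "price": 120000.0, "surface": 48.0},
    {"municipality": "X", "type": "maison", "price": 330000.0, "surface": 150.0},
    {"municipality": "X", "type": "maison", "price": 280000.0, "surface": 100.0},
    {"municipality": "X", "type": "dépendance", "price": 15000.0, "surface": 0.0},
    {"municipality": "X", "type": "maison", "price": None, "surface": 90.0},
    {"municipality": "Y", "type": "maison", "price": 90000.0, "surface": 90.0},
    {"municipality": "Y", "type": "appartement", "price": 60000.0, "surface": 40.0},
]


def median_price_per_sqm(records, min_transactions=MIN_TRANSACTIONS):
    """Median price per square metre for each municipality with enough sales."""
    prices = defaultdict(list)
    for rec in records:
        # Keep only apartments and houses.
        if rec["type"] not in RESIDENTIAL_TYPES:
            continue
        # Drop records with a missing (or zero) price or surface.
        if not rec["price"] or not rec["surface"]:
            continue
        prices[rec["municipality"]].append(rec["price"] / rec["surface"])
    # Discard municipalities below the minimum number of transactions.
    return {
        name: median(values)
        for name, values in prices.items()
        if len(values) >= min_transactions
    }


municipal_medians = median_price_per_sqm(transactions)
```

In this toy example, municipality "Y" is dropped because it has only two valid transactions, which mirrors how sparsely traded (often rural) municipalities disappear from the aggregated data.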

3.2.1. Identifying housing market trends

In the first part of the analysis, we use data visualisation to observe the evolution of prices over time in combination with some categorical variables. We compare the market values from the three different countries to identify common trends across territories and extract information about housing transactions that can support policy-making to address interventions that are targeted to the specific needs of each territory.

For this analysis, the first variable we use is the nomenclature of territorial units for statistics (NUTS). It is a geo-code standard for referencing the subdivisions of countries for statistical purposes,Footnote15 although the spatial extent is not homogeneous across countries. For our analysis, we use the NUTS1 level, which describes groups of regions and allows us to analyse the trend of prices within the national territory. The second variable we use is the degree of urbanisation (DEGURBA). DEGURBA classifies the entire territory of a country into three classes (1: cities/urban areas, 2: towns and semi-dense areas/intermediate areas, 3: rural areas), combining population size and population density thresholds to capture the full settlement hierarchy.Footnote16 This variable makes it possible to observe different housing market trends according to the urbanisation characteristics of very different areas, such as cities and rural villages, which also show very different trends in terms of depopulation, economic growth, etc. Table 2 shows how many municipalities are classified as urban, intermediate, or rural areas for each macro-region. Finally, to improve comparison results, the sales values per square metre are scaled with respect to the first element of the series, namely 2015 for Austria and 2016 for Italy and France. In the following plots, the values on the Y axis are greater (or less) than one if they are larger (or smaller) than the first value of the series. If the value for the indicated year is not present, the series is re-scaled to the first valid value subsequently available. This transformation allows us to make areas with different degrees of urbanisation more comparable and to better compare different countries.
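The scaling rule described above can be sketched as follows. This is a minimal illustration: the function name and the convention of representing a missing year as `None` are assumptions made for the example.

```python
def scale_to_first_valid(series):
    """Divide each value by the first valid (non-missing, non-zero) value.

    Missing observations are represented as None and remain None.
    """
    base = next((value for value in series if value), None)
    if base is None:
        # No valid observation at all: nothing can be scaled.
        return [None] * len(series)
    return [None if value is None else value / base for value in series]


# Hypothetical yearly prices per square metre; the first year is missing,
# so the series is re-scaled to the first valid value that follows.
scaled = scale_to_first_valid([None, 1000.0, 1100.0, 900.0])
```

A scaled value above one indicates a price higher than in the base year, and a value below one a lower price.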

Table 2. Classification of municipalities within macro regions.

Results from this exploratory analysis highlight some essential differences across country regions and territories with different urbanisation patterns. For Austria, Figure 1 shows that scaled prices tend to increase throughout the territory and across categories. Starting from 2018 and accelerating from 2019, a marked increase in prices can be observed, especially in intermediate and rural areas, which may be related to second homes or to people moving out of major cities. For Italy, Figure 2 shows the trend of minimum and maximum prices by macro-region.Footnote17 Unlike Austria, Italy shows a general decreasing trend across all categories and regions, with the notable exception of the northwest, where an increase in maximum prices for urban areas can be observed. The trend of prices in France by macro-regionFootnote18 is shown in Figure 3. France has a much larger territory than the other two countries, therefore the number of geographical areas is higher. France shows a general increase in prices compared to the base year, with the same acceleration that we observe in Austria starting from 2018 and 2019. Some areas appear as exceptions, such as Corsica and the Grand Est, where there was a slowdown in the last year. Compared with the other two countries, the rise in prices in French urban areas does not differ as noticeably from that in other areas as it does, for example, in Austria, where intermediate and rural areas registered significantly greater growth than urban areas.

Figure 1. Distribution of scaled prices per square metre over time and degree of urbanisation in Austria. East Austria contains Burgenland, Lower Austria and Vienna. Carinthia and Styria belong to south Austria. West Austria contains Upper Austria, Salzburg, Tyrol and Vorarlberg. Prices show an increasing tendency throughout the territory and across categories starting from 2018, especially in intermediate and rural areas, which may be related to second homes or to people moving out of major cities.


Figure 2. Distribution of scaled prices per square metre over time and degree of urbanisation in Italy. Panel (a) contains the trend of the minimum values, while panel (b) represents the trend of the maximum values. Italy shows a general decreasing trend across all categories and regions, with the notable exception of the northwest, where an increase in maximum prices for urban areas can be observed.


Figure 3. Distribution of scaled prices per square metre over time and degree of urbanisation in France. The figure shows a general increase in prices starting from 2018 and 2019. Some areas appear as exceptions, such as Corsica and the Grand Est, where there was a slowdown in the last year. The rise in prices in French urban areas is not as noticeably different from that in other areas as in the other two countries.


These results already provide some preliminary insights into how housing market trends are affecting different areas and countries. For example, the general increase in prices shown by Austria and France contrasts with the price decrease affecting the Italian regions. However, a similar trend can be detected in urban areas in both Italy and France, following a depopulation phenomenon that has been affecting inner areas of those countries for some time now. In particular, the Italian data capture a further concentration of wealth and social capital in the northwest, where the richest and most industrialised regions are located (Piedmont, Lombardy). The trends also highlight an existing territorial disparity between urban and rural areas that appears to have been widening in recent years. Therefore, the quantitative evidence produced through our exploratory analysis can be used by policy-makers to support actions and initiatives addressing, for example, housing poverty and affordability in cities on the one hand and strategies to counter depopulation in rural areas on the other. It is important to note that the comparative analysis shows quite different trends among the three countries, and therefore territorial policy must tailor its actions accordingly in order to reach its objectives.

3.2.2. Detecting spatial patterns for the European housing market

In the second part of the analysis, we focused on understanding where in each country housing prices showed similar trends. Our objective was to identify patterns of similarity in the trends of price variation across all territorial areas included in the analysis. Understanding different territorial patterns in terms of housing prices makes it possible to detect whether areas associated with different degrees of urbanisation (cities, towns, rural areas) show similar price trends over the years, or whether some trends in housing prices are common to specific territorial characteristics (e.g., increase in cities, decrease in rural areas). This type of analysis also makes it possible to compare trends in areas with similar territorial characteristics across different countries (Austria, Italy, France) and in relation to the price variation that occurred in recent years. For example, during the recent COVID-19 pandemic, several countries experienced a (temporary) flow of people settling outside of the cities in smaller towns and rural areas thanks to the possibility of working remotely. The analysis we performed here, although at a larger geographical scale than individual cities, can signal whether the phenomenon observed in some territories is common to areas with specific urbanisation characteristics (from big cities to rural areas) in specific countries, whether it had a visible impact on the housing market prices of those areas, and whether it is still ongoing or the trend reversed in the years following the pandemic peak. All of this is relevant information on territorial trends that policy-makers can use to address, mitigate or promote certain demographic or economic growth phenomena with the aim of a cohesive and sustainable territorial development.

To detect spatial patterns in housing price variations, we applied a cluster analysis to the time-series of price per square metre in the dataset of each individual country. Unlike in the previous analysis, here each time-series has been scaled to the maximum value in the series, in order to re-scale the data before calculating the distances among the different elements in the dataset and to improve the performance of the clustering algorithm. After having tested different approaches,Footnote19 we selected k-means with Euclidean distance as it was the best combination to obtain clearer, better defined clusters,Footnote20 possibly because our yearly time-series only consist of 6–7 points (due to data availability) with a quite homogeneous trend and few variations (see the exploratory analysis section above). To determine the optimal number of clusters into which the data can be grouped, we applied the widely known elbow method. For Italy, a good compromise is obtained with 15 clusters; for France, the number of clusters can be 25. Results for the Austrian dataset are inconclusive, thus they are not included here. The outcome obtained from k-means is then displayed on a map to explore its spatial distribution. Each cluster is labelled with colours assigned sequentially. The purpose of this visualisation is to understand whether certain temporal trends are uniformly or non-uniformly distributed across the territory.
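The clustering procedure described above (scaling each series to its maximum, k-means with Euclidean distance, and the elbow method) can be sketched as follows. This is a minimal illustration under stated assumptions and not the implementation used for the study: it is a plain version of Lloyd's algorithm written with the Python standard library, the random initialisation and the sample series are hypothetical, and the range of k is kept tiny for readability.

```python
import math
import random


def scale_to_max(series):
    """Scale a price series so that its highest value equals 1."""
    peak = max(series)
    return [value / peak for value in series]


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with Euclidean distance.

    Returns (labels, centroids, inertia), where inertia is the sum of squared
    distances between each point and the centroid of its cluster.
    """
    rng = random.Random(seed)
    # Assumption: centroids are initialised from k randomly chosen series.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each series goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        converged = new_labels == labels
        labels = new_labels
        if converged:
            break
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Hypothetical yearly prices per square metre for six municipalities.
raw_series = [
    [1000, 1050, 1100, 1180, 1250, 1300],
    [2000, 2080, 2200, 2350, 2500, 2600],
    [800, 820, 860, 900, 950, 1000],
    [1500, 1450, 1400, 1380, 1350, 1300],
    [900, 880, 850, 830, 800, 780],
    [3000, 2900, 2850, 2700, 2650, 2600],
]
points = [scale_to_max(series) for series in raw_series]

# Elbow method: compute the inertia for increasing k and look for the value
# of k after which the improvement flattens out.
inertias = {k: kmeans(points, k)[2] for k in range(1, 5)}
labels, centroids, _ = kmeans(points, 2)
```

Because every series is scaled to its own maximum, municipalities with very different price levels but similar trajectories can end up in the same cluster, which is the purpose of the re-scaling step. In practice, the elbow is usually read from a plot of `inertias` against k.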

The cluster analysis for Italy confirms what was observed through the exploratory analysis. Most clusters identify groups of municipalities whose prices (both minimum and maximum) show decreasing or stable trends. However, this analysis makes it possible to isolate some interesting groups whose trend is instead an increase in housing prices (see clusters 2 and 12 of minimum values in Figure 4 and clusters 4, 8 and 14 of maximum values in Figure 5).

Figure 4. Clusters of minimum real estate values for Italian municipalities. The real estate values of each municipality are scaled with the highest value in the series. Most clusters identify groups of municipalities whose prices show decreasing or stable trends. However, this analysis makes it possible to isolate some interesting groups whose trend is instead an increase in housing prices (see clusters 2 and 12).


Figure 5. Clusters of maximum real estate values for Italian municipalities. The real estate values of each municipality are scaled with the highest value in the series. Most clusters identify groups of municipalities whose prices show decreasing or stable trends. However, this analysis makes it possible to isolate some interesting groups whose trend is instead an increase in housing prices (see clusters 4, 8 and 14).


A useful piece of information for the decision maker is the degree of urbanisation of the municipalities within clusters, considering Italy as a whole, as evidenced in Table 3. Selecting only municipalities with rising prices, we can see how the composition of the clusters (in percentage terms) differs from the national distribution (i.e., 3.5% urban, 33.5% intermediate and 63% rural). In almost all the growing clusters, the percentage of urban municipalities is double the national average. Moreover, the percentage of intermediate areas is also higher than the national figure. In this analysis, clusters are mapped to observe possible territorial aggregations, showing the results of clustering on the minimum values (Figure 6) and on the maximum values (Figure 7). Blank areas are municipalities for which we have no data.
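The comparison between the composition of each cluster and the national distribution can be sketched as follows. This is an illustrative example only: the record layout (`cluster`, `degurba`) and the ten sample municipalities are hypothetical and do not reproduce the values of Table 3.

```python
from collections import Counter

# DEGURBA classes as described in the text.
DEGURBA_LABELS = {1: "urban", 2: "intermediate", 3: "rural"}

# Hypothetical municipalities with their cluster label and DEGURBA class.
municipalities = [
    {"cluster": 2, "degurba": 1},
    {"cluster": 2, "degurba": 2},
    {"cluster": 2, "degurba": 2},
    {"cluster": 2, "degurba": 3},
    {"cluster": 5, "degurba": 3},
    {"cluster": 5, "degurba": 3},
    {"cluster": 5, "degurba": 3},
    {"cluster": 5, "degurba": 2},
    {"cluster": 5, "degurba": 3},
    {"cluster": 5, "degurba": 3},
]


def degurba_shares(records):
    """Percentage of urban, intermediate and rural municipalities."""
    counts = Counter(rec["degurba"] for rec in records)
    total = sum(counts.values())
    return {
        label: 100.0 * counts.get(code, 0) / total
        for code, label in DEGURBA_LABELS.items()
    }


def shares_by_cluster(records):
    """DEGURBA shares computed separately for each cluster."""
    clusters = sorted({rec["cluster"] for rec in records})
    return {
        c: degurba_shares([rec for rec in records if rec["cluster"] == c])
        for c in clusters
    }


national = degurba_shares(municipalities)
per_cluster = shares_by_cluster(municipalities)
```

Comparing `per_cluster` with `national` shows which clusters are over-represented in a given class; in this toy example, cluster 2 has a larger urban share than the overall distribution.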

Figure 6. Geographical representation of clusters of minimum real estate values for Italian municipalities. Blank areas are municipalities for which we have no data. We recall that in central Italy the data collection was suspended due to a seismic event. Furthermore, on the island of Sardinia there are some missing data (possibly due to a change in the province administrative boundaries) that do not allow for complete time-series. The areas that are most aggregated refer to clusters 3, 5 and 8, that represent the majority of the nation and refer to a trend of stable prices.


Figure 7. Geographical representation of clusters of maximum real estate values for Italian municipalities. Blank areas are municipalities for which we have no data. We recall that in central Italy the data collection was suspended due to a seismic event. Furthermore, on the island of Sardinia there are some missing data (possibly due to a change in the province’s administrative boundaries) that do not allow for a complete time-series. There are important areas of the country where prices have a similar (stable) trend. For example, the northeast area, the municipalities in Sardinia (cluster 5), and some areas of southern Italy (cluster 14).


Table 3. DEGURBA distribution within clusters for Italian municipalities.

We recall that in central Italy, as described in the metadata, the data collection was suspended due to a seismic event. Furthermore, on the island of Sardinia there are some missing data (possibly due to a change in the province’s administrative boundaries) that do not allow us to have a complete time-series. In Figure 6, the areas that are most aggregated refer to clusters 3, 5 and 8. These areas represent the majority of the nation and refer to a trend of stable prices, as evidenced in Figure 4. The map of the maximum values (Figure 7) seems to show less homogeneity. However, there are important areas of the country where prices have a similar (stable) trend, for example the northeast area, the municipalities in Sardinia (cluster 5) and some areas of southern Italy (cluster 14).

The reasons for these differences must be explored through further analysis with additional variables (e.g., socio-demographic characteristics, accessibility of the territories, broadband speed, etc.). Policy-makers can benefit from these insights and use information on price variations at the municipal level to address pressing issues both in urban areas (social housing, housing poverty, access to credit, commuting and public transport) and in rural areas (depopulation, ageing, access to services, etc.). Given the wide territorial distribution, an interactive tool could help policy-makers to better explore the results.

For France, Figure 8 shows that there are very large clusters that capture upward price movements, which also confirms the results of the exploratory analysis. We can also observe some minor clusters that capture a fluctuating trend in the market. The spatial distribution of the French clusters is shown in Figure 9. As noted above, there are several municipalities for which we have not been able to extract valid values. The map shows some patterns on the south coast of the country and several homogeneous areas in the west. In general, there is great fragmentation and heterogeneity in the distribution of real estate values and trends across the country.

Figure 8. Clusters of median real estate values for French municipalities. The real estate values of each municipality are scaled with the highest value in the series. The analysis shows that there are very large clusters that capture upward price movements (e.g., 1, 12 and 14), which also confirms the results of the exploratory analysis. We can also observe some minor clusters that capture a fluctuating trend in the market (e.g., 7, 17 and 24).


Figure 9. Geographical representation of clusters of median real estate values for French municipalities. There are several municipalities for which we have not been able to extract valid values. The map shows some patterns on the south coast of the country and several homogeneous areas in the west where prices are increasing. In general, there is great fragmentation and heterogeneity in the distribution of real estate values and trends across the country.


Unlike the analysis performed on the Italian data, these results appear to be inconclusive in terms of the robustness of the clusters identified, possibly due to the large amount of missing information in the original dataset. However, the granularity of spatial information contained in this dataset allows for further analysis at the scale of individual cities, to possibly identify the variation of real estate values for specific contexts (e.g., cities or rural areas). This can produce additional information for policies specifically targeting urban areas (housing poverty, sustainable cities), and for cohesion policies addressing the territorial divide and disparities between urban and rural places.

4. DISCUSSION

As illustrated in Section 2.1, the main aim of this analysis was to highlight the potential for new quantitative evidence that could support the identification of targets and strategies for policy actions and implementation. Non-traditional data recently made available for territorial analysis have several advantages compared to more traditional sources of information and can be employed to complement results supporting policy-making. In the specific case of the analysis presented above, we employed higher resolution datasets on real estate values and, unlike previous work, we performed the analysis at the municipal level using several techniques, including machine learning. Results showed the potential for comparative analysis across different countries and the type of additional knowledge that can be extracted to inform and support place-based policies, territorial strategy, and impact assessment at the local scale with quantitative evidence at a very fine-grained level.

The analysis presented here can be used as an example to be expanded with additional information and replicated with different data. Two aspects of this process are particularly relevant to mention.

Firstly, this exploratory analysis showed how using such disaggregated data, at both the spatial and temporal scale, can produce detailed information on urban and territorial trends that can meaningfully feed place-based policies targeting sustainable development and territorial cohesion. The data employed in this case allowed for analysis at least at the municipal level (in the Austrian case) and down to the sub-municipal level (for the Italian data) and the neighbourhood level (for the French dataset). The high resolution of the spatial information obtained from the analysis allows policy-makers and decision-makers to identify and address problems precisely, in terms of both the places and the residents targeted by their policy interventions. For example, the insights obtained regarding the decline of transaction prices in all Italian areas but the cities provide quantitative evidence of socio-economic disparities across the Italian territory that manifest themselves as housing poverty in cities and as depopulation and ageing in rural areas.

Secondly, the comparative analysis of similar data from different countries allows us to observe how the same phenomenon affects different territorial and social contexts, and makes it possible to identify areas that are particularly vulnerable to spatial inequality and marginalisation in terms of economic development and social inclusion. Comparing different case studies can also help to understand how the same issues are targeted differently across regions, and how best practices can be identified and applied where needed.

Overall, this work showed how employing emergent and highly disaggregated data to comparatively analyse territorial trends helps to avoid a one-size-fits-all policy approach; rather, it makes it possible to use robust quantitative evidence of spatial differences to target more specific, tailored policy interventions where needed.

This work also highlighted some limitations one can encounter when using this type of data in relation to privacy and representativeness. For example, for privacy reasons, a minimum number of transactions per municipality is required for a price to be listed in the Austrian data, in order to avoid re-identification of information. To avoid similar privacy issues, the general conditions of use of the French dataset prescribe that, on the one hand, any processing that reuses the information cannot have either the purpose or the effect of allowing the re-identification of the persons concerned and, on the other hand, that this information cannot be indexed by online search engines. These precautionary measures can have a direct impact on the quality of the information contained in the data. In the case of Austria, if the minimum number of transactions is not reached in a given year, the data are not reported. This problem mainly affects small municipalities (rural zones) where the real estate market is not very active. This leads to problems of representativeness in the results, as less information is available for areas that are often already less rich in terms of spatial information. Missing data can also be caused by other circumstances. In the Italian data, some provinces in the centre of Italy (see note 21) show missing values because the quotations were suspended following the interruption of market surveys due to the seismic events that have affected these provinces in recent years. Furthermore, changes in administrative boundaries across years can lead to missing points in the data time series (see the case of Sardinia).
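
The effect of a minimum-count disclosure rule on coverage can be illustrated with a minimal sketch. The threshold value, the municipality labels and the prices below are hypothetical (the actual threshold applied to the Austrian data is not stated here), and the median is used only as an example of a published statistic.

```python
from statistics import median

# Hypothetical threshold; the real minimum for the Austrian data is not given in the text.
MIN_TRANSACTIONS = 5

def publish_prices(transactions, min_count=MIN_TRANSACTIONS):
    """Return {(municipality, year): median price, or None when suppressed}."""
    grouped = {}
    for municipality, year, price in transactions:
        grouped.setdefault((municipality, year), []).append(price)
    return {
        key: (median(prices) if len(prices) >= min_count else None)
        for key, prices in grouped.items()
    }

# Toy records (municipality, year, price per square metre); all values are made up.
transactions = (
    [("urban_A", 2020, p) for p in (3000, 3100, 3200, 3300, 3400, 3500)]
    + [("rural_B", 2020, p) for p in (1200, 1250)]
)

published = publish_prices(transactions)
# Municipality-years that drop out of the published series.
suppressed = [key for key, value in published.items() if value is None]
```

In this toy case the less active rural municipality is the one that disappears from the published series, which is the representativeness problem described above.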

5. CONCLUSIONS

The recent data revolution has dramatically changed (and is still changing) all fields of human experience. Scientific research, especially in the social science realm, has been infused with new life via the use of advanced analytical techniques, either inferential or simulative, as well as by employing novel and non-traditional data sources, from digital trace data to big administrative datasets. It was just a matter of time before such a novel discipline, called computational social science, attracted the attention of the policy-making world. Indeed, some research efforts, both theoretical and applied, have been directed towards the use of CSS techniques and approaches to solve policy issues. One of the most promising areas in this regard is the field of territorial impact assessment of policy measures, and in general the issues connected to policy-making on a regional scale. Such a policy area can indeed benefit greatly from the higher temporal and spatial resolution of non-traditional data sources with respect to traditional ones, as well as from their increased timeliness. Moreover, the variety of spatio-temporal information about several territorial activities and phenomena contributes to the semantic capabilities of any territorial analysis: combining innovative data with more traditional metrics can indeed contribute to the comprehension and communication of complex pressing issues, such as territorial disparities and depopulation.

To put this novel analytical paradigm to the test, we explored a case study on the use of innovative data for spatial policy. Namely, we explored data for three countries (Austria, France and Italy), with the long-term objective of inferring intra-national mobility and residential patterns correlated with the onset of the COVID-19 pandemic. The complexities of the analysis, and its limitations, tell a compelling story: despite the evident potential of this kind of analysis, a fundamental effort in the harmonisation of data sources is needed. Indeed, the three countries that were the subject of our study presented very different levels of pre-processing and aggregation of the data source (from the totally disaggregated dataset of France to the min–max approach of Italy), as well as different levels of ‘out-of-the-box’ usability of the information. For example, the housing dataset for France, despite being the most informative, was also the hardest to interpret and to use in applied analysis. The one for Italy, although the cleanest and the easiest to process (and, admittedly, the one with the most interpretable results), may hide behind its pre-processing a more nuanced picture than the one provided. Indeed, this is the typical case in which a ‘Trusted Smart Statistics’ approach, in which a common pre-processing algorithm is run on the source data, could ensure comparability of the results (on this topic, see e.g., Ricciato et al., 2019, 2020; Signorelli et al., 2022).
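
As a rough illustration of what a common pre-processing step could look like, the sketch below maps three differently aggregated inputs onto one indicator per area and year. The record layouts, the area codes, the use of the median for transaction-level data and of the interval midpoint for min–max data are all assumptions made for this example. They are not the procedure used in this study, nor one prescribed by the Trusted Smart Statistics literature.

```python
from statistics import median

def from_transactions(records):
    """Disaggregated input (France-like): one (area, year, price_per_m2) row per sale."""
    grouped = {}
    for area, year, price in records:
        grouped.setdefault((area, year), []).append(price)
    return {key: median(values) for key, values in grouped.items()}

def from_min_max(records):
    """Interval input (Italy-like): (area, year, min_price, max_price); take the midpoint."""
    return {(area, year): (low + high) / 2 for area, year, low, high in records}

def from_aggregates(records):
    """Pre-aggregated input (Austria-like): (area, year, price), passed through unchanged."""
    return {(area, year): price for area, year, price in records}

def harmonise(*sources):
    """Merge per-source indicators into a single {(area, year): price} table."""
    table = {}
    for source in sources:
        table.update(source)
    return table

# Toy inputs; every code and value here is invented for illustration.
table = harmonise(
    from_transactions([("FR-1", 2020, 2000.0), ("FR-1", 2020, 2400.0), ("FR-1", 2020, 2200.0)]),
    from_min_max([("IT-1", 2020, 1500.0, 2100.0)]),
    from_aggregates([("AT-1", 2020, 2750.0)]),
)
```

The point of the sketch is only that each source needs its own adapter before values become comparable, and that the choice of summary statistic in each adapter is itself a harmonisation decision.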

From an analytical point of view, the limitations identified in this exploratory analysis point the way to further work on the subject, such as combining real estate information with population data (age, income). In addition, linking real estate information with data on the availability of services (transport, amenities, etc.) or on building age and energy consumption could help identify possible relationships with trends in house values, especially at the city level.

DISCLOSURE STATEMENT

No potential conflict of interest was reported by the author(s).

The views expressed are purely those of the authors and may not in any circumstances be regarded as stating an official position of the European Commission.

Notes

2. Local administrative units.

3. Nomenclature of territorial units for statistics.

8. Statistics Austria cleans the data by removing transactions between family relatives, houses bought to be demolished, partial transactions, transactions where the land and building components are not purchased together, as well as purchases by existing renters with a special form of contract (‘Mietkauf’, ‘rental purchase’). Statistics Austria then fits a linear regression using municipality code, building size, age and some other attributes, and removes data points with a Cook’s distance (Kim & Storer, 1996) of more than 1.2 times the mean, thus removing many unrealistic combinations of prices and sizes.
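
The outlier rule in note 8 can be sketched as follows. This is a simplified illustration and not the Statistics Austria implementation: it assumes a regression of price on a single predictor (building size) instead of the several attributes mentioned above, and the data are invented.

```python
def cooks_distances(x, y):
    """Cook's distance of each point in a simple linear regression y ~ a + b * x."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    # Leverage of each observation (diagonal of the hat matrix).
    leverage = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    return [
        (e ** 2 / (p * mse)) * (h / (1 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]

def filter_by_cooks_distance(x, y, factor=1.2):
    """Keep points whose Cook's distance is at most `factor` times the mean distance."""
    distances = cooks_distances(x, y)
    cutoff = factor * sum(distances) / len(distances)
    return [(xi, yi) for xi, yi, d in zip(x, y, distances) if d <= cutoff]

# Toy data: size in square metres, price in thousand euros; (85, 900) is an implausible pair.
sizes = [50, 60, 70, 80, 90, 100, 110, 120, 85]
prices = [100, 120, 140, 160, 180, 200, 220, 240, 900]

kept = filter_by_cooks_distance(sizes, prices)
```

Because the cutoff is relative to the mean distance, a single extreme point can also push ordinary high-leverage points over the threshold in other configurations, so the rule is best read as a heuristic.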

11. This uniformity translates into homogeneity of the market values of the real estate units included in an interval, with a difference between the minimum and maximum value generally not exceeding 50%. The minimum and maximum values represent ordinary conditions; therefore, the prices of properties of particular value, in a state of deterioration, or with otherwise non-ordinary characteristics for the building type of the area to which they belong are not included in the interval.

13. Therefore, on the basis of the above, the methodology for processing the collected data points estimates the most probable interval in which the average falls, rather than the average itself.

16. The dataset is provided as open data by the European Commission’s Global Human Settlement Layer project, see https://ghsl.jrc.ec.europa.eu/degurbaDefinitions.php.

17. Northwest contains Aosta Valley, Liguria, Lombardy, Piedmont. Northeast contains Emilia-Romagna, Friuli-Venezia Giulia, Trentino-Alto Adige/Südtirol, Veneto. Centre contains Lazio, Marche, Tuscany, Umbria. South contains Abruzzo, Apulia, Basilicata, Calabria, Campania, Molise. Finally, Islands contains Sardinia, Sicily.

18. Auvergne-Rhône-Alpes is a region in southeast-central France. Bourgogne-Franche-Comté is a region in eastern France. Brittany is a peninsula located in the west. Centre-Val de Loire straddles the middle Loire Valley in the interior of the country. Corsica is an island in the Mediterranean Sea. Grand-Est is an administrative region in northeastern France and contains Alsace, Champagne-Ardenne and Lorraine. Hauts-de-France is the northernmost region of France and contains Nord-Pas-de-Calais and Picardy. Normandy contains the Lower Normandy and Upper Normandy regions. Nouvelle-Aquitaine is the largest administrative region in France, spanning the west and southwest of the mainland. Occitanie is the southernmost administrative region of metropolitan France excluding Corsica. Pays de la Loire is in western France. Provence-Alpes-Côte d’Azur is in the far southeast of the mainland. Île-de-France is also known as the Paris region.

19. Firstly, we applied a density-based clustering technique with a customised distance metric suited to time series (dynamic time warping; Sakoe & Chiba, 1978). The results showed that this combination was not optimal for obtaining well-defined clusters. We then tested k-means (Tavenard et al., 2020) with the same customised metric. In this case, we obtained better results, with more clearly defined clusters showing neater temporal profiles and sequences. However, the results were still not completely satisfactory, with fuzziness and inconsistencies in the similarities of temporal trends.

20. We are aware that the metric employed is not optimal for time-series as it does not detect any speed variation or alignment between otherwise similar temporal sequences.
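
The difference that notes 19 and 20 refer to can be made concrete with a small sketch comparing dynamic time warping with a pointwise distance. The Euclidean distance is used here only as an assumed example of a metric that ignores alignment, the two series are invented, and the plain-Python DTW below (squared pointwise cost, no window constraint) is a textbook formulation and not the tslearn implementation cited above.

```python
import math

def euclidean(a, b):
    """Pointwise distance: compares values at the same time index only."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Dynamic time warping distance with squared pointwise cost and no window."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

# Two toy series with the same peak shape, shifted in time.
early = [0, 0, 1, 2, 1, 0, 0, 0]
late = [0, 0, 0, 0, 1, 2, 1, 0]

pointwise_distance = euclidean(early, late)
warped_distance = dtw(early, late)
```

Here the warped distance is zero because the peak can be aligned exactly, whereas the pointwise distance treats the two series as clearly different. This is the kind of shifted but otherwise similar temporal profile that a metric without alignment cannot recognise.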

21. Ascoli Piceno (AP), Macerata (MC), L’Aquila (AQ), Perugia (PG), Rieti (RI), Teramo (TE) and Modena (MO).

REFERENCES

  • Alessandretti, L., Aslak, U., & Lehmann, S. (2020). The scales of human mobility. Nature, 587(7834), 402–407. https://doi.org/10.1038/s41586-020-2909-1
  • Balemi, N., Füss, R., & Weigand, A. (2021). Covid-19’s impact on real estate markets: review and outlook. Financial Markets and Portfolio Management, 35, 495–513. https://doi.org/10.1007/s11408-021-00384-6
  • Beracha, E., & Wintoki, M. B. (2013). Forecasting residential real estate price changes from online search activity. Journal of Real Estate Research, 35(3), 283–312. https://doi.org/10.1080/10835547.2013.12091364
  • Bertoni, E., Fontana, M., Gabrielli, L., Signorelli, S., & Vespe, M. (Eds.). (2022a). Handbook of Computational Social Science for Policy. Springer. Forthcoming.
  • Bertoni, E., Fontana, M., Gabrielli, L., Signorelli, S., & Vespe, M. (Eds.). (2022b). Mapping the demand side of computational social science for policy: harnessing digital trace data and computational methods to address societal challenges. LU: Publications Office of the European Union. https://doi.org/10.2760/901622
  • Calabrese, F., Pereira, F. C., Lorenzo, G. D., Liu, L., & Ratti, C. (2010). The Geography of Taste: Analyzing Cell-Phone Mobility and Social Events. In P. Floréen, A. Krüger, & M. Spasojevic (Eds.), Pervasive Computing. Pervasive 2010. Lecture Notes in Computer Science (vol. 6030). Springer. https://doi.org/10.1007/978-3-642-12654-3_2
  • Chyn, E., Hyman, J., & Kapustin, M. (2019). Housing voucher take-up and labor market impacts. Journal of Policy Analysis and Management, 38(1), 65–98. https://doi.org/10.1002/pam.22104
  • De Nadai, M., & Lepri, B. (2018, October 1–3). The economic value of neighborhoods: Predicting real estate prices from the urban environment. In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) (pp. 323–330).
  • Del Giudice, V., De Paola, P., & Del Giudice, F. P. (2020). Covid-19 infects real estate markets: Short and mid-run effects on housing prices in Campania region (Italy). Social Sciences, 9(7), 114. https://doi.org/10.3390/socsci9070114
  • Diamond, R., McQuade, T., & Qian, F. (2019). The effects of rent control expansion on tenants, landlords, and inequality: Evidence from San Francisco. American Economic Review, 109(9), 3365–3394. https://doi.org/10.1257/aer.20181289
  • D’Lima, W., Lopez, L. A., & Pradhan, A. (2022). Covid-19 and housing market effects: Evidence from US shutdown orders. Real Estate Economics, 50(2), 303–339. https://doi.org/10.1111/1540-6229.12368
  • Einav, L., & Levin, J. D. (2013, May). The data revolution and economic analysis (Working Paper No. 19035). National Bureau of Economic Research. Retrieved September 20, 2019, from http://www.nber.org/papers/w19035
  • Frias-Martinez, V., & Frias-Martinez, E. (2014). Spectral clustering for sensing urban land use using Twitter activity. Engineering Applications of Artificial Intelligence, 35, 237–245. https://doi.org/10.1016/j.engappai.2014.06.019
  • Ge, J. (2013). Endogenous rise and collapse of housing prices.
  • Girardin, F., Calabrese, F., Fiore, F., Ratti, C., & Blat, J. (2008, October). Digital Footprinting: Uncovering Tourists with User-Generated Content. IEEE Pervasive Computing, 7(4), 36–43. https://doi.org/10.1109/MPRV.2008.71
  • González, M. C., Hidalgo, C. A., & Barabási, A.-L. (2008, June). Understanding individual human mobility patterns. Nature, 453(7196), 779–782. https://doi.org/10.1038/nature06958
  • Iacus, S. M., Santamaria, C., Sermi, F., Spyratos, S., Tarchi, D., & Vespe, M. (2021). Mobility functional areas and Covid-19 spread. Transportation, 49, 1999–2025. https://doi.org/10.1007/s11116-021-10234-z
  • Kandt, J., & Batty, M. (2021, February). Smart cities, big data and urban policy: Towards urban analytics for the long run. Cities, 109, 102992. https://doi.org/10.1016/j.cities.2020.102992
  • Kim, C., & Storer, B. E. (1996). Reference values for Cook’s distance. Communications in Statistics-Simulation and Computation, 25(3), 691–708. https://doi.org/10.1080/03610919608813337
  • Kuchler, T., Piazzesi, M., & Stroebel, J. (2023). Housing market expectations. In R. Bachmann, G. Topa & W. van der Klaauw (Eds.), Handbook of economic expectations (pp. 163–191). Academic Press.
  • Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A.-L., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D., & Van Alstyne, M. (2009, February). Computational Social Science. Science, 323(5915), 721–723. https://doi.org/10.1126/science.1167742
  • Lazer, D., Pentland, A., Watts, D. J., Aral, S., Athey, S., Contractor, N., Freelon, D., Gonzalez-Bailon, S., King, G., Margetts, H., Nelson, A., Salganik, M. J., Strohmaier, M., Vespignani, A., & Wagner, C. (2020, August). Computational social science: Obstacles and opportunities. Science, 369(6507), 1060–1062. https://doi.org/10.1126/science.aaz8170
  • Li, X. (2021). Examining the spatial distribution and temporal change of the green view index in New York City using Google Street View images and deep learning. Environment and Planning B: Urban Analytics and City Science, 48(7), 2039–2054. https://doi.org/10.1177/2399808320962511
  • Medeiros, E. (2023). Data and modelling for the Territorial Impact Assessment (TIA) of policies. In E. Bertoni, M. Fontana, L. Gabrielli, S. Signorelli & M. Vespe (Eds.), Handbook of Computational Social Science for Policy (pp. 177–194). Springer.
  • Milunovich, G., & Trück, S. (2013). Regional and global contagion in real estate investment trusts: The case of the financial crisis of 2007–2009. Journal of Property Investment & Finance, 31(1), 53–77. https://doi.org/10.1108/14635781311292971
  • Miranda, A. S., Fan, Z., Duarte, F., & Ratti, C. (2021). Desirable streets: Using deviations in pedestrian trajectories to measure the value of the built environment. Computers, Environment and Urban Systems, 86, 101563. https://doi.org/10.1016/j.compenvurbsys.2020.101563
  • Moro, E., Calacci, D., Dong, X., & Pentland, A. (2021). Mobility patterns are associated with experienced income segregation in large US cities. Nature Communications, 12(1), 1–10. https://doi.org/10.1038/s41467-020-20314-w
  • Moro, M. F., de Souza Mendonça, A. K., & de Andrade, D. F. (2022). Covid-19 pandemic accelerates the perception of digital transformation on real estate websites. Quality & Quantity, 1–17.
  • Pei, T., Sobolevsky, S., Ratti, C., Shaw, S.-L., Li, T., & Zhou, C. (2014). A new insight into land use classification based on aggregated mobile phone data. International Journal of Geographical Information Science, 28(9), 1988–2007. https://doi.org/10.1080/13658816.2014.913794
  • Proietti, P., Sulis, P., Perpina Castillo, C., & Lavalle, C. (Eds.). (2022). New perspectives on territorial disparities (No. EUR31025 EN). Publications Office of the European Union.
  • Quercia, D., Schifanella, R., & Aiello, L. M. (2014, 1–4 September). The shortest path to happiness: Recommending beautiful, quiet, and happy routes in the city. In Proceedings of the 25th ACM Conference on Hypertext and Social Media (pp. 116–125).
  • Ricciato, F., Wirthmann, A., Giannakouris, K., Reis, F., & Skaliotis, M. (2019). Trusted smart statistics: Motivations and principles. Statistical Journal of the IAOS, 35(4), 589–603. https://doi.org/10.3233/SJI-190584
  • Ricciato, F., Wirthmann, A., & Hahn, M. (2020). Trusted smart statistics: How new data will change official statistics. Data & Policy, 2, e7. https://doi.org/10.1017/dap.2020.7
  • Robinson, C., & Franklin, R. S. (2020, November). The sensor desert quandary: What does it mean (not) to count in the smart city? Transactions of the Institute of British Geographers, tran.12415. Retrieved February 23, 2021, from https://onlinelibrary.wiley.com/doi/10.1111/tran.12415
  • Sakoe, H., & Chiba, S. (1978). Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1), 43–49. https://doi.org/10.1109/TASSP.1978.1163055
  • Samadani, S., & Costa, C. J. (2021, 23–26 June). Forecasting real estate prices in Portugal: A data science approach. In 2021 16th Iberian Conference on Information Systems and Technologies (CISTI) (pp. 1–6).
  • Samardzhiev, K., Fleischmann, M., Arribas-Bel, D., Calafiore, A., & Rowe, F. (2022). Functional signatures in Great Britain: A dataset. Data in Brief, 43, 108335. https://doi.org/10.1016/j.dib.2022.108335
  • Schläpfer, M., Dong, L., O’Keeffe, K., Santi, P., Szell, M., Salat, H., Anklesaria, S., Vazifeh, M., Ratti, C., & West, G. B. (2021). The universal visitation law of human mobility. Nature, 593(7860), 522–527. https://doi.org/10.1038/s41586-021-03480-9
  • Schuetz, J., Meltzer, R., & Been, V. (2009). 31 flavors of inclusionary zoning: Comparing policies from San Francisco, Washington, DC, and suburban Boston. Journal of the American Planning Association, 75(4), 441–456. https://doi.org/10.1080/01944360903146806
  • Seiferling, I., Naik, N., Ratti, C., & Proulx, R. (2017). Green streets quantifying and mapping urban trees with street-level imagery and computer vision. Landscape and Urban Planning, 165, 93–101. https://doi.org/10.1016/j.landurbplan.2017.05.010
  • Signorelli, S., Fontana, M., Gabrielli, L., & Vespe, M. (2022). Challenges and opportunities of computational social science for official statistics. Retrieved July 29, 2022, from https://arxiv.org/abs/2207.13508 (Publisher: arXiv Version Number: 1).
  • Singleton, A., Arribas-Bel, D., Murray, J., & Fleischmann, M. (2022). Estimating generalized measures of local neighbourhood context from multispectral satellite images using a convolutional neural network. Computers, Environment and Urban Systems, 95, 101802. https://doi.org/10.1016/j.compenvurbsys.2022.101802
  • Sornette, D., & Woodard, R. (2010). Financial bubbles, real estate bubbles, derivative bubbles, and the financial and economic crisis. In M. Takayasu, T. Watanabe & H. Takayasu (Eds.), Econophysics approaches to large-scale business data and financial crisis (pp. 101–148). Springer.
  • Spyratos, S., Vespe, M., Natale, F., Weber, I., Zagheni, E., & Rango, M. (2019). Quantifying international human mobility patterns using Facebook network data. PLoS One, 14(10), e0224134. https://doi.org/10.1371/journal.pone.0224134
  • Taşcιlar, M., & Arslanlι, K. Y. (2022). Forecasting commercial real estate indicators under Covid-19 by adopting human activity using social big data. Asia-Pacific Journal of Regional Science, 6(3), 1111–1132. https://doi.org/10.1007/s41685-022-00254-7
  • Tavenard, R., Faouzi, J., Vandewiele, G., Divo, F., Androz, G., Holtz, C., Payne, M., Yurchak, R., Rußwurm, M., Kolar, K., & Woods, E. (2020). Tslearn, a machine learning toolkit for time series data. Journal of Machine Learning Research, 21(118), 1–6. http://jmlr.org/papers/v21/20-091.html
  • Vespe, M., Iacus, S. M., Santamaria, C., Sermi, F., & Spyratos, S. (2021). On the use of data from multiple mobile network operators in Europe to fight Covid-19. Data & Policy, 3. https://doi.org/10.1017/dap.2021.9
  • Voicu, I., & Been, V. (2008). The effect of community gardens on neighboring property values. Real Estate Economics, 36(2), 241–283. https://doi.org/10.1111/j.1540-6229.2008.00213.x
  • Wong, G. (2008). Has SARS infected the property market? Evidence from Hong Kong. Journal of Urban Economics, 63(1), 74–95. https://doi.org/10.1016/j.jue.2006.12.007
  • Zhong, C., Arisona, S. M., Huang, X., Batty, M., & Schmitt, G. (2014, May). Detecting the dynamics of urban structure through spatial network analysis. International Journal of Geographical Information Science, 1–22. https://doi.org/10.1080/13658816.2014.914521
  • Zhong, C., Huang, X., Müller Arisona, S., Schmitt, G., & Batty, M. (2014, November). Inferring building functions from a probabilistic model using public transportation data. Computers, Environment and Urban Systems, 48, 124–137. https://doi.org/10.1016/j.compenvurbsys.2014.07.004