Book Reviews

Hands-On Data Science for Librarians

Sarah Lin and Dorris Scott, Boca Raton, FL: CRC Press, 2023, 200 pp., $69.95, ISBN: 9781003218012


On the one hand, librarians are adept at and comfortable with storing, using, and analyzing their collections and data; on the other hand, they are not known for their coding skills. This book aims to close that gap by equipping librarians with essential data science skills in R, including data visualization, web scraping, map analysis, interactive report generation, and machine learning techniques. Although resources for learning R abound across the internet, librarians from diverse backgrounds may struggle to find a comprehensive guide focused specifically on the coding skills needed to excel in library sciences.

By explaining the meaning of data science and its importance, as well as elucidating the possibilities and benefits of introducing coding in library sciences, the authors beautifully set the tone of the book in its first chapter, Introduction.

Thereafter, in Using RStudio’s IDE, they explain the installation of RStudio, the functions of each pane in the IDE, customization of the IDE settings, and the use of the IDE to import a tabular data file. RStudio’s Integrated Development Environment (IDE) is a popular choice among R users for its comprehensive set of features designed to enhance the R programming experience.

Tidying Data with dplyr covers loading the dplyr package, briefly summarizes the most common dplyr functions, and shows how to use them to normalize fields in a dataset. dplyr is an R package designed for data manipulation, providing a set of functions optimized for working efficiently with data frames and data tables. Developed by Hadley Wickham and the RStudio team, dplyr offers a concise and intuitive grammar for data manipulation, making it a popular choice among R users for tasks such as filtering, selecting, summarizing, mutating, and arranging data.

Visualizing Your Project with ggplot2 describes the various geom functions in ggplot2 for making plots. This chapter also demonstrates how to load the ggplot2 package and other relevant libraries into RStudio using appropriate code. ggplot2 is a powerful data visualization package in R, created by Hadley Wickham. It is based on the grammar of graphics, which provides a consistent and flexible framework for constructing plots. ggplot2 allows users to create complex plots with relatively simple code, making it popular among data analysts and scientists.

Webscraping with rvest explains how to determine which HTML tags will provide the desired data. It also explains how the rvest package can be used to scrape webpage content into the RStudio IDE. rvest is an R package designed for web scraping. It allows users to extract data from HTML web pages using CSS selectors and XPath expressions. Web scraping involves extracting data from websites, which can then be used for analysis, visualization, or other purposes.

Mapping with tmap starts by defining the geographic concepts that are relevant to cartography, such as the geographic coordinate system, projected coordinate system, datum, and spheroid. It also describes vector and raster data. Interestingly, tmap can also be used to transform spatial data into an appropriate projection. tmap is an R package for creating thematic maps, which are maps that represent spatial data by visualizing thematic variables. Thematic maps are useful for showing the distribution, patterns, and relationships of spatial phenomena.

Textual Analysis with tidytext explains how the tidytext package can be used to load a textual dataset into the IDE and how to perform a sentiment analysis of that dataset. It also discusses using the tf-idf functions to identify the words that are most characteristic of a text, that is, words frequent in one text but rare across the others. tidytext is an R package that provides tools for text mining and natural language processing (NLP) tasks within the framework of the tidyverse. It facilitates tidying and analyzing text data by providing functions for tokenization, sentiment analysis, tf-idf weighting, and other common text processing tasks.

Creating Dynamic Documents with rmarkdown introduces the rmarkdown package, which allows the user to create various documents, including PDFs, MS Word documents, and HTML files, as well as websites, presentations, and dashboards. rmarkdown is an R package that enables dynamic document generation within the R environment. It allows users to create documents that weave together narrative text, code chunks, and the output of that code, such as tables and plots, into a single, reproducible document.

Creating a Flexdashboard is all about flexdashboard, an R package that allows users to create interactive dashboards using R Markdown. It is built on top of the R Markdown framework and provides an easy way to lay out visualizations and narrative in a single, flexible dashboard format.

Shiny is an R package that enables the creation of interactive web applications directly from R. It allows R users to build web applications with complex user interfaces (UIs) and interactive visualizations without needing to know HTML, CSS, or JavaScript. In Creating an Interactive Dashboard with Shiny, Lin and Scott list the basic components of a Shiny app. They also shed light on reactivity in Shiny apps and on writing code in the Shiny user interface and server functions to access the provided data.

In Using tidymodels to Understand Machine Learning, there is a detailed description of the ways text mining is utilized in machine learning algorithms. This chapter also describes the uses of machine learning related to employment and ends with a discussion on how one could identify areas of potential bias in machine learning.

In Conclusion, the authors recap the various concepts discussed in the 11 chapters of the book. They also point out that the book is not the end of learning R: “This book focused on the breadth of R, not the depth.” Readers who want to continue their journey in R after working through this book are advised to read the seminal text R for Data Science by Hadley Wickham and Garrett Grolemund.

The book claims to be, at present, the only data science book geared toward librarians that includes step-by-step coding examples applicable to public, academic, and special libraries. Efforts have been made to make the book accessible to a wider audience by reducing technical jargon, providing job skills, and fostering empowerment and confidence among readers.

Overall, the book will offer librarians fresh and thrilling perspectives on their field, opening novel avenues of exploration and deepening their grasp of library sciences. It will also shed light on how data science can give libraries added significance and appeal.

Firdous Ahmad Mala
Amar Singh College, Cluster University Srinagar, Jammu and Kashmir, India
[email protected]
Snowbar Majeed
AAAM Degree College Bemina, Jammu and Kashmir, India
