Internet Connection: AI and Libraries: Supporting Machine Learning Work

Artificial intelligence (AI) has been in the news frequently in the past few years. It seems as though every new product has some kind of “AI” at its core. Even library conferences seem overtaken by AI topics, with session titles like “AI is Such a Tool: Keeping Your Machine Learning Outputs in Check” and “Humans vs. Robots: What professional skills do students need for success in an AI world?” (Averkamp and Hardesty Citation2020; Hoeppner and Adams Citation2020). But what is AI? What’s it doing in libraries? And how can libraries support patrons doing work with AI?

What is AI? What is machine learning?

Artificial intelligence is a field of study that includes machine learning, and the two are often conflated. The goal of machine learning, or AI, is a prediction or an inference based on patterns in data. Machine learning involves building an input and output system where the input is data and the output is a guess about the data. The input could be a thousand novels, and the output could be guessing what genre each novel is based on word choices. The input could be a spreadsheet of consumers’ purchases, and the output could be guessing what the consumer will buy next based on their purchase histories.
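
To make the input-and-output idea concrete, here is a minimal Python sketch of the novel-genre example. It is a toy illustration rather than a real machine learning system: the four one-sentence "novels," the genre labels, and the function names train and guess_genre are all hypothetical, and the scoring simply counts shared words.

```python
from collections import Counter, defaultdict

def train(labeled_texts):
    """Count how often each word appears in the texts of each genre."""
    word_counts = defaultdict(Counter)
    for text, genre in labeled_texts:
        word_counts[genre].update(text.lower().split())
    return word_counts

def guess_genre(word_counts, text):
    """Guess the genre whose training texts share the most words with `text`."""
    words = text.lower().split()
    scores = {}
    for genre, counts in word_counts.items():
        total = sum(counts.values())
        # Score = share of this genre's training words that match the new text.
        # Words never seen for a genre simply count as zero.
        scores[genre] = sum(counts[w] for w in words) / total
    return max(scores, key=scores.get)

# Hypothetical, hand-made input data; a real project would use far more text.
examples = [
    ("the detective found a clue near the body", "mystery"),
    ("the inspector questioned the murder suspect", "mystery"),
    ("the starship crew landed on a distant planet", "science fiction"),
    ("the robot repaired the starship engine", "science fiction"),
]

model = train(examples)
print(guess_genre(model, "a robot on the planet"))
```

The output is only a guess based on patterns in the input. Common words like "the" count toward every genre, which is one reason real systems need much more data and more careful scoring.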

(A note on terminology: I’m of the opinion that we haven’t actually achieved true unlimited, independent artificial intelligence yet. Most products that tout “AI” are using machine learning, which is a less buzzy but more specific term, and using it in a very specific, limited context. I’ll be using the terms machine learning and artificial intelligence somewhat interchangeably in this column, but giving preference to the former.)

In daily life, machine learning is being used to guess the next word of the phrase you’re typing into a search engine, make your smartphone’s battery life last longer by optimizing background processes, and put together an intelligent-sounding reply when you ask your smart speaker about the weather this weekend. I’m taking advantage of a machine learning product right now to save myself time by using AI-powered voice dictation software that outputs a mostly accurate typed document. (Don’t worry, I’ve gone back and edited out the bits where I said data and it typed out pasta.) Like anything technological, machine learning can be used for a multitude of purposes that can improve our lives or negatively impact them. For example, some states use machine learning software in the courthouse to determine recidivism risk and prison sentence length; this software has been found to produce inaccurate and potentially biased outputs (Angwin and Larson Citation2016; O’Neil Citation2017). With such a wide range of applications that can affect our daily lives, it’s no wonder that AI is a topic cropping up in research spaces, particularly on the heels of the rise of big data.

What is needed to work with AI?

Building a system that uses machine learning requires an awful lot of data as input. The amount of data that machine learning takes can quickly approach “big data,” which a programming professor I had once defined as “too big for a typical laptop.” What kind of data can machine learning use as input? Data can be numbers, text, images, audio, and video. Input could be a hundred text documents, two thousand video files, a CSV with 10 million rows, or any similar large set of information.

One reason that machine learning requires so much data is the “training” convention for building a machine learning system. (Technical content warning for this paragraph.) Let’s say you’re building a system that will recognize stoplights visually, and your input is images of city streets. When you’re working on your machine learning model (the basis of your system), you typically start working with a “chunk” of your total dataset: say, four sets of two hundred images out of the one thousand total images. By using smaller “slices” of the dataset as input, and making changes so the output is better, you’re “training” your machine learning model. When you’re happy with it, you’ll put in the remaining two hundred images to “test” it. If the machine learning model takes the images of city streets it’s never seen before and outputs correct identification of the stoplights in the images, you can feel confident that your machine learning model is ready to be used.
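
For readers who want to see the split itself, here is a minimal Python sketch of dividing one thousand images into four training sets of two hundred and one held-out test set of two hundred. The file names and the function split_dataset are hypothetical stand-ins; the sketch only organizes the data and does not train a model.

```python
import random

def split_dataset(items, n_chunks=5, seed=0):
    """Shuffle the items, cut them into equal chunks, and hold the last one out.

    Returns (training_chunks, test_set). If the number of items does not
    divide evenly, the few leftover items are dropped in this simple sketch.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    size = len(shuffled) // n_chunks
    chunks = [shuffled[i * size:(i + 1) * size] for i in range(n_chunks)]
    return chunks[:-1], chunks[-1]

# Hypothetical file names standing in for 1,000 images of city streets.
images = [f"street_{i:04d}.jpg" for i in range(1000)]
training_chunks, test_set = split_dataset(images)

print(len(training_chunks), "training sets of", len(training_chunks[0]), "images")
print(len(test_set), "images held out for testing")
```

Shuffling before splitting keeps any one set from containing, say, only nighttime photos. Libraries such as Scikit-learn offer ready-made helpers for this step, so in practice one would rarely write it by hand.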

In addition to data, building an AI system will likely require using a framework or software library (bundle of reusable code) that other people have built. Popular open-source software libraries for machine learning include Scikit-learn and TensorFlow. On a larger scale, one might take advantage of AI tools offered by Google, Amazon, IBM, and the like. To top off this overview of what building AI requires, one would need a high-powered computer and either a great deal of expertise or a great deal of patience to slog through documentation.

What is AI doing in academic libraries?

You have probably noticed AI and machine learning popping up in library literature, conferences, and events. Machine learning can be used for search projects, such as HAMLET, an experimental discovery interface for MIT theses (Yelton Citation2017). Machine learning can also be used in image analysis, as is the case for Aida, which analyzes digitized historical materials (Lorang and Soh Citation2019). Colleagues at my institution, NC State University Libraries, are experimenting with using machine learning to analyze space usage (Beswick and Davidson Citation2019). There are many opportunities for academic libraries to benefit from large-scale data projects powered by machine learning (Griffey Citation2019).

But perhaps the place where machine learning and AI are being used the most is in the work of our patrons. Computer science, engineering, linguistics, biotech, and digital humanities are some of the disciplines where machine learning work is becoming more common. The students and faculty in these disciplines need datasets, high-powered computers, and expert guidance in order to complete machine learning tasks and projects. One way that libraries can participate in the emerging trend of AI is by supporting academic work in machine learning.

Supporting machine learning projects in the library

Providing data

Let’s say a patron needs 10,000 newspaper articles for a machine learning project. Where would they look? It’s not easy to find such a dataset, particularly one that’s free or licensed for use and in a format suitable for the patron’s project. This is where a library can serve patrons doing machine learning work: providing access to datasets.

Libraries can license some datasets for a fee, just like obtaining a license for database access. The Linguistic Data Consortium Dataset, for instance, can be licensed by an institution. Libraries can also help patrons find data by creating a list of datasets that are available by individual request, often at no cost. The owners of these datasets will examine requests on a case-by-case basis and decide whether or not to allow the requestor access to the data. For example, individual researchers can request access to Affective Norms for English Words, a dataset used for sentiment analysis. Finally, libraries can also provide patrons with a list of freely available datasets. For instance, the patron who needs 10,000 newspaper articles might be interested in the corpus of Reuters articles included in the Natural Language Toolkit, a free and open-source Python library.

What about creating a dataset for researchers to use? Datasets can be challenging to find, and researchers are always looking for datasets that fit their project requirements. Libraries are storehouses of text, images, and other data sources. Finding, cleaning, and documenting data, as well as making a dataset available, are skills that some academic librarians may have in spades. Partnering with other researchers to publish datasets is yet another way that libraries can support machine learning work.

Providing access to data workstations

Once the patron has access to data, they’ll need somewhere to work with it. They may have a suitable laptop or desktop computer of their own, but quite often, working with so much data requires a high-powered computer. A computing task that might take 3 hours to complete on a typical laptop may take just 30 minutes on one of these high-powered computers. Moreover, these workstations could be pre-loaded with software that creates an ideal environment for machine learning work. A model for this kind of researcher-centered space can be found at my current institution, NC State University Libraries. Patrons can use high-powered desktop workstations that are ideal for data work in the Dataspace and the upcoming Data Experience Lab, both of which were designed to meet students’ advanced computing needs (Ciccone Citation2018). These spaces are staffed with students and librarians who have expertise in working with data and various software programs. Even if your library cannot provide data workstations like these, there are other ways to do machine learning work in online environments, such as Google Colab and Jupyter Notebooks. These tools put the onus of computing power on a cloud-based computer, and while their usage may be limited, they can be great places to experiment with machine learning.

Organizing events, workshops, and training

Programs about AI position the library as a place where emerging technologies are explored and critiqued. These events are also opportunities to partner with departments like computer science and linguistics. If your library staff has expertise in data science, text mining, machine learning, or AI, providing office hours and consultations specifically dedicated to these topics can also signal how your library supports emerging research. And if your library doesn’t have this expertise (yet), you can still benefit patrons who are doing machine learning work by holding related workshops that are squarely in the realm of library knowledge, such as a workshop focused on ethics, privacy, and AI.

Getting an overview of machine learning and AI is a good professional development opportunity for librarians. Academic libraries should support staff who attend workshops, conference sessions, and webinars around AI and machine learning. It is important to have some familiarity with these emerging technologies. Even if you have no interest in inventing a self-driving book cart, a little knowledge goes a long way toward supporting patrons doing machine learning work.

Robin Camille Davis
User Experience Librarian, North Carolina State University Libraries, D.H. Hill Jr. Library http://orcid.org/0000-0002-0548-0803
