Behavioral & Social Sciences Librarian Volume 36, 2017 - Issue 3

Editorials

Internet Connection: AI and Libraries: Supporting Machine Learning Work

Robin Camille Davis
User Experience Librarian, North Carolina State University Libraries, D.H. Hill Jr. Library
Correspondence: [email protected]

Pages 109-112 | Published online: 22 Oct 2020

https://doi.org/10.1080/01639269.2017.1771046

Artificial intelligence (AI) has been in the news frequently in the past few years. It seems as though every new product has some kind of “AI” at its core. Even library conferences seem overtaken by AI topics, with session titles like “AI is Such a Tool: Keeping Your Machine Learning Outputs in Check” and “Humans vs. Robots: What professional skills do students need for success in an AI world?” (Averkamp and Hardesty 2020; Hoeppner and Adams 2020). But what is AI? What’s it doing in libraries? And how can libraries support patrons doing work with AI?

What is AI? What is machine learning?

Artificial intelligence is a field of study that includes machine learning, and the two are often conflated. The goal of machine learning, or AI, is a prediction or an inference based on patterns in data. Machine learning involves building an input and output system where the input is data and the output is a guess about the data. The input could be a thousand novels, and the output could be guessing what genre each novel is based on word choices. The input could be a spreadsheet of consumers’ purchases, and the output could be guessing what the consumer will buy next based on their purchase histories.
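To make the input-and-output idea concrete, here is a minimal sketch of the novel-genre example using only the Python standard library. The four short "texts," the genre labels, and the function names are all hypothetical stand-ins invented for illustration, and simple word counting is a deliberately crude substitute for what a real machine learning library would do.

```python
from collections import Counter

# Hypothetical, tiny labeled examples standing in for "a thousand novels".
TRAINING_TEXTS = [
    ("the detective examined the body and questioned the suspect", "mystery"),
    ("a clue led the inspector to the murder weapon", "mystery"),
    ("the starship crew landed on a distant planet", "science fiction"),
    ("the robot piloted the ship past the alien moon", "science fiction"),
]


def learn_word_counts(examples):
    """Count how often each word appears in the texts of each genre."""
    counts = {}
    for text, genre in examples:
        counts.setdefault(genre, Counter()).update(text.split())
    return counts


def guess_genre(text, counts):
    """Guess the genre whose training words overlap most with the text."""
    words = text.split()
    scores = {
        genre: sum(genre_counts[word] for word in words)
        for genre, genre_counts in counts.items()
    }
    return max(scores, key=scores.get)


# Input: data. Output: a guess about the data.
word_counts = learn_word_counts(TRAINING_TEXTS)
guess = guess_genre("the inspector found a clue near the body", word_counts)
print(guess)
```

The guess is only as good as the patterns in the input, which is why real projects need far more data than four sentences.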

(A note on terminology: I’m of the opinion that we haven’t actually achieved true unlimited, independent artificial intelligence yet. Most products that tout “AI” are using machine learning, which is a less buzzy but more specific term, and using it in a very specific, limited context. I’ll be using the terms machine learning and artificial intelligence somewhat interchangeably in this column, but giving preference to the former.)

In daily life, machine learning is being used to guess the next word of the phrase you’re typing into a search engine, make your smartphone’s battery life last longer by optimizing background processes, and put together an intelligent-sounding reply when you ask your smart speaker about the weather this weekend. I’m taking advantage of a machine learning product right now to save myself time by using AI-powered voice dictation software that outputs a mostly accurate typed document. (Don’t worry, I’ve gone back and edited out the bits where I said “data” and it typed out “pasta.”) Like anything technological, machine learning can be used for a multitude of purposes that can improve our lives or negatively impact them. For example, some states use machine learning software in the courthouse to determine recidivism risk and prison sentence length; this software has been found to produce inaccurate and potentially biased outputs (Angwin and Larson 2016; O’Neil 2017). With such a wide range of applications that can affect our daily lives, it’s no wonder that AI is a topic cropping up in research spaces, particularly on the heels of the rise of big data.

What is needed to work with AI?

Building a system that uses machine learning requires an awful lot of data as input. The amount of data that machine learning takes can quickly approach “big data,” which a programming professor I had once defined as “too big for a typical laptop.” What kind of data can machine learning use as input? Data can be numbers, text, images, audio, and video. Input could be a hundred text documents, two thousand video files, a CSV with 10 million rows, or any similar large set of information.

One reason that machine learning requires so much data is the “training” convention for building a machine learning system. (Technical content warning for this paragraph.) Let’s say you’re building a system that will recognize stoplights visually, and your input is images of city streets. When you’re working on your machine learning model (the basis of your system), you typically start working with a “chunk” of your total dataset: say, four sets of two hundred images out of the one thousand total images. By using these smaller “slices” of the dataset as input, and making changes so the output is better, you’re “training” your machine learning model. When you’re happy with it, you’ll put in the remaining two hundred images to “test” it. If the machine learning model takes the images of city streets it’s never seen before and outputs correct identification of the stoplights in the images, you can feel confident that your machine learning model is ready to be used.
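The arithmetic of that split can be sketched in a few lines of standard-library Python. Everything here is a hypothetical placeholder: the file names, the made-up labels, and the `always_no` "model," which simply guesses that no image contains a stoplight. The point is only to show how one thousand images divide into four training slices of two hundred and a held-back test set of two hundred.

```python
import random

# Hypothetical stand-ins for 1,000 labeled street images:
# each item is (image file name, whether it contains a stoplight).
random.seed(0)  # fixed seed so the shuffle is repeatable
images = [(f"street_{i:04d}.jpg", i % 2 == 0) for i in range(1000)]
random.shuffle(images)

# Hold back the last 200 images; the model never sees them during training.
training_images, test_images = images[:800], images[800:]

# Divide the training portion into four slices of 200 images each.
chunk_size = 200
training_chunks = [
    training_images[start:start + chunk_size]
    for start in range(0, len(training_images), chunk_size)
]


def accuracy(predict, labeled_images):
    """Return the fraction of images where predict(name) matches the label."""
    correct = sum(predict(name) == label for name, label in labeled_images)
    return correct / len(labeled_images)


def always_no(name):
    """Placeholder 'model' that always guesses 'no stoplight'."""
    return False


print(len(training_chunks), "training slices,", len(test_images), "test images")
print("test accuracy of the placeholder model:", accuracy(always_no, test_images))
```

In a real project, the placeholder would be replaced by a model adjusted slice by slice, and the test accuracy would be checked only once training is finished.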

In addition to data, building an AI system will likely require using a framework or software library (bundle of reusable code) that other people have built. Popular open-source software libraries for machine learning include Scikit-learn and TensorFlow. On a larger scale, one might take advantage of AI tools offered by Google, Amazon, IBM, and the like. To top off this overview of what building AI requires, one would need a high-powered computer and either a great deal of expertise or a great deal of patience to slog through documentation.

What is AI doing in academic libraries?

You have probably noticed AI and machine learning popping up in library literature, conferences, and events. Machine learning can be used for search projects, such as HAMLET, an experimental discovery interface for MIT theses (Yelton 2017). Machine learning can also be used in image analysis, as is the case for Aida, which analyzes digitized historical materials (Lorang and Soh 2019). Colleagues at my institution, NC State University Libraries, are experimenting with using machine learning to analyze space usage (Beswick and Davidson 2019). There are many opportunities for academic libraries to benefit from large-scale data projects powered by machine learning (Griffey 2019).

But perhaps the place where machine learning and AI are being used the most is in the work of our patrons. Computer science, engineering, linguistics, biotech, and digital humanities are some of the disciplines where machine learning work is becoming more common. The students and faculty in these disciplines need datasets, high-powered computers, and expert guidance in order to complete machine learning tasks and projects. One way that libraries can participate in the emerging trend of AI is by supporting academic work in machine learning.

Supporting machine learning projects in the library

Providing data

Let’s say a patron needs 10,000 newspaper articles for a machine learning project. Where would they look? It’s not easy to find such a dataset, particularly one that’s free or licensed for use and in a format suitable for the patron’s project. This is where a library can serve patrons doing machine learning work: providing access to datasets.

Libraries can license some datasets for a fee, just like obtaining a license for database access. The Linguistic Data Consortium Dataset, for instance, can be licensed by an institution. Libraries can also help patrons find data by creating a list of datasets that are available by individual request, often at no cost. The owners of these datasets will examine requests on a case-by-case basis and decide whether or not to allow the requestor access to the data. For example, individual researchers can request access to Affective Norms for English Words, a dataset used for sentiment analysis. Finally, libraries can also provide patrons with a list of freely available datasets. For instance, the patron who needs 10,000 newspaper articles might be interested in the corpus of Reuters articles included in the Natural Language Toolkit, a free and open-source Python library.

What about creating a dataset for researchers to use? Datasets can be challenging to find, and researchers are always looking for datasets that fit their project requirements. Libraries are storehouses of text, images, and other data sources. Finding, cleaning, and documenting data, as well as making a dataset available, are skills that some academic librarians may have in spades. Partnering with other researchers to publish datasets is yet another way that libraries can support machine learning work.

Providing access to data workstations

Once the patron has access to data, they’ll need somewhere to work with it. They may have a suitable laptop or desktop computer of their own, but quite often, working with so much data requires a high-powered computer. A computing task that might take 3 hours to complete on a typical laptop may take just 30 minutes on one of these high-powered computers. Moreover, these workstations could be pre-loaded with software that creates an ideal environment for machine learning work. A model for this kind of researcher-centered space can be found at my current institution, NC State University Libraries. Patrons can use high-powered desktop workstations that are ideal for data work in the Dataspace and the upcoming Data Experience Lab, both of which were designed to meet students’ advanced computing needs (Ciccone 2018). These spaces are staffed with students and librarians who have expertise in working with data and various software programs. Even if your library cannot provide data workstations like these, there are other ways to do machine learning work in online environments, such as Google Colab and Jupyter Notebooks. These tools put the onus of computing power on a cloud-based computer, and while their usage may be limited, they can be a great place to experiment with machine learning.

Organizing events, workshops, and training

Programs about AI position the library as a place where emerging technologies are explored and critiqued. These events are also opportunities to partner with departments like computer science and linguistics. If your library staff has expertise in data science, text mining, machine learning, or AI, providing office hours and consultations specifically dedicated to these topics can also signal how your library supports emerging research. And if your library doesn’t have this expertise (yet), you can still benefit patrons who are doing machine learning work by holding related workshops that are squarely in the realm of library knowledge, such as a workshop focused on ethics, privacy, and AI.

Getting an overview of machine learning and AI is a good professional development opportunity for librarians. Academic libraries should support staff who attend workshops, conference sessions, and webinars around AI and machine learning. It is important to have some familiarity with these emerging technologies. Even if you have no interest in inventing a self-driving book cart, a little knowledge goes a long way toward supporting patrons doing machine learning work.

Robin Camille Davis
User Experience Librarian, North Carolina State University Libraries, D.H. Hill Jr. Library http://orcid.org/0000-0002-0548-0803

References

Angwin, J., and J. Larson. 2016. Machine bias. ProPublica. May 23. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
Averkamp, S., and J. Hardesty. 2020. AI is such a tool: keeping your machine learning outputs in check. Presented at the Code4Lib, Pittsburgh, PA, March 11. https://2020.code4lib.org/talks/AI-is-such-a-tool-Keeping-your-machine-learning-outputs-in-check.
Beswick, K., and B. Davidson. 2019. Taking the plunge: deep learning in libraries. Presented at the Triangle Research Libraries Network Annual Meeting, Chapel Hill, NC, July 11.
Ciccone, K. 2018. Data science and visualization space user research. NC State University Libraries. https://www.lib.ncsu.edu/projects/data-science-and-visualization-space-user-research.
Griffey, J. 2019. Artificial intelligence and machine learning in libraries. Library Technology Reports 55 (1):1–29.
Hoeppner, A., and M. Adams. 2020. Humans vs. robots: What professional skills do students need for success in an ai world? A discussion on the digital knowledge, skills, and abilities that will have the most value in the rapidly changing business landscape. Presented at the Electronic Resources & Libraries Annual Conference, Austin, TX, March 11. https://2020erl.sched.com/event/XVfj/s82-humans-vs-robots-what-professional-skills-do-students-need-for-success-in-an-ai-world-a-discussion-on-the-digital-knowledge-skills-and-abilities-that-will-have-the-most-value-in-the-rapidly-changing-business-landscape.
Lorang, E., and L.-K. Soh. 2019. Image analysis for archival discovery (Aida). http://projectaida.org/.
O’Neil, C. 2017. Weapons of math destruction: How big data increases inequality and threatens democracy. London, UK: Penguin Books.
Yelton, A. 2017. How about machine learning enhancing theses? HAMLET. https://hamlet.andromedayelton.com/.
