Session Title

Analyzing Virtual Reference Transcripts with Machine Learning

Description

Using open-source machine learning packages, librarians can process, analyze, and leverage increasingly large sets of unstructured data. With the appropriate tools and techniques, virtual reference transcripts become a treasure trove of unstructured data that can tell librarians about patrons' needs and offer insights into how to manage library services. Our research project is ongoing, but so far we have taken a data set of approximately 15,000 transcripts (3 million words and 100,000 unique tokens) and developed a model that automatically processes transcripts, draws out latent topics using unsupervised learning methods, and clusters transcripts into intuitive groupings. Based on early results, we have identified several potential applications for the model in assessing and managing libraries' public services. In particular, librarians and library managers can use machine learning models of this kind to improve staff training, challenge underlying assumptions about how patrons engage with library services, and provide an 'early warning' mechanism when a library sees a sudden spike or change in the ways patrons use its services.
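The abstract does not name the packages or algorithms behind the model. As a rough, hypothetical illustration of the kind of pipeline it describes (process transcripts, surface topic terms, cluster transcripts), the sketch below swaps in TF-IDF weighting plus spherical k-means written in plain Python. The stopword list, function names, and sample questions are invented for the example. In practice these steps are usually handled by an open-source library such as scikit-learn or gensim rather than hand-written code, and a dedicated topic model may replace the simple centroid-term labels used here.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"a", "an", "and", "can", "do", "find", "for", "how", "i", "is",
             "looking", "my", "need", "of", "on", "the", "to", "where"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def normalize(vec):
    """Scale a sparse vector (term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def tfidf_vectors(docs):
    """Turn token lists into unit-length TF-IDF vectors (smoothed IDF)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        weights = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        vectors.append(normalize(weights))
    return vectors


def cosine(u, v):
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def nearest(v, centroids):
    """Index of the centroid most similar to v (first one wins ties)."""
    return max(range(len(centroids)), key=lambda j: cosine(v, centroids[j]))


def kmeans(vectors, k, iterations=20):
    """Spherical k-means over sparse unit vectors; returns (labels, centroids)."""
    # Deterministic farthest-point start: begin with the first vector, then
    # add whichever vector is least similar to every centroid chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        new_labels = [nearest(v, centroids) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                summed = Counter()
                for v in members:
                    summed.update(v)
                centroids[j] = normalize(summed)
    return labels, centroids


def top_terms(centroid, n=3):
    """Highest-weighted terms in a centroid, a crude stand-in for a topic label."""
    return [t for t, _ in sorted(centroid.items(), key=lambda kv: -kv[1])[:n]]


# Hypothetical chat questions standing in for real transcripts.
transcripts = [
    "How do I renew my book?",
    "Can I renew an overdue book?",
    "I owe a fine for an overdue book",
    "I need peer reviewed articles",
    "Where can I find peer reviewed articles on nursing?",
    "Looking for articles on psychology",
]

labels, centroids = kmeans(tfidf_vectors([tokenize(t) for t in transcripts]), k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
for j, centroid in enumerate(centroids):
    print(j, top_terms(centroid))
```

The circulation questions and the research questions land in separate clusters, and each cluster's highest-weighted centroid terms serve as a rough topic label. With real transcripts, the number of clusters and the stopword list would need tuning.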

Start Date

20-3-2019 10:30 AM

End Date

20-3-2019 11:30 AM
