More and more of our culture and social interactions are translated into machine-readable data. The massive quantities of rich social, cultural, economic, political and historical data offer practitioners the opportunity to analyze user-generated content (UGC) to measure ROI in terms of increased engagement, loyalty or satisfaction. With a majority of digital products and services relying on UGC, it becomes essential to gain a deeper understanding of the content users write. This content includes tweets, reviews, forum posts, and user feedback. What insights are hidden in these texts, and how can they help us better understand users?
During this workshop, you will experiment with topic modeling using Latent Dirichlet Allocation (LDA) and other forms of natural language processing (NLP) to gain a deeper understanding of UGC. LDA is a generative probabilistic model that treats each document as a mixture of topics and each topic as a probability distribution over terms; terms that frequently occur together receive high probability under the same topic. For example, in a large collection of tweets, the co-occurrence of the terms “paris,” “attacks,” “police,” “france,” “isis,” “shooting” and “bataclan” could be interpreted as a topic about the November 2015 Paris attacks.
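To make the idea of terms clustering into topics concrete, here is a minimal sketch of LDA fitted with collapsed Gibbs sampling. It is an illustration only, not the workshop material: the workshop itself uses R and R packages, and Python is used here simply because the sketch needs no extra libraries. The toy corpus, the function name `fit_lda`, and the hyperparameter values (`alpha`, `beta`, `sweeps`) are all assumptions made up for this example.

```python
import random
from collections import Counter

# Toy corpus, invented for illustration: two loose themes.
docs = [
    "paris attacks police france shooting".split(),
    "police paris bataclan shooting attacks".split(),
    "france isis attacks paris police".split(),
    "football match goal team stadium".split(),
    "team goal match football league".split(),
    "stadium league team football goal".split(),
]

def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, sweeps=200, seed=0):
    """Hypothetical helper: fit LDA by collapsed Gibbs sampling."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]          # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]   # word counts per topic
    topic_total = [0] * num_topics                        # token counts per topic
    assign = []                                           # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Weight = (how much doc d uses topic k) * (how much topic k uses word w).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed probability of each term under each topic.
    phi = [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in vocab}
        for k in range(num_topics)
    ]
    return phi, doc_topic

phi, doc_topic = fit_lda(docs)
for k, dist in enumerate(phi):
    top_terms = sorted(dist, key=dist.get, reverse=True)[:5]
    print(f"topic {k}:", ", ".join(top_terms))
```

The printed lists are the highest-probability terms per topic. Deciding what each list means, for instance reading one of them as a topic about the Paris attacks, remains an interpretive step for the analyst. Because the sampler is random, results on a corpus this small may vary with the seed and the number of sweeps.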
This will be a hands-on workshop where you will learn basic NLP with R and various R packages. No prior experience with coding is required, though enthusiasm and perseverance are a must. We will provide the majority of the datasets (e.g., tweets and reviews), but you can also bring your own data, such as your own WhatsApp chat.
After this workshop, you will be able to:
1. Explain the practical benefits and pitfalls of topic modeling;
2. Pre-process textual data;
3. Determine the optimal number of topics in a large corpus;
4. Apply topic modeling and social network analysis to various cases;
5. Visualize the results of topic modeling;
6. Interpret and explain the results.
About the speakers
Dr. Aletta Smits is head of research at “Human Experience & Media Design,” a research group at HU University of Applied Sciences in Utrecht, The Netherlands. She pairs her research with being a passionate teacher of UX Design, Behavioral Influence, and Data learning. Together with Erik Hekman, she has developed the master's program Datadriven Design. With their students, they develop data-driven concepts that are both practically applicable and rigorously rooted in academic research. Her own research focuses on the subtle patterns in behavioral data, generated by, e.g., survey respondents, and on designing interventions to course-correct when a loss of commitment is predicted.
Erik Hekman is a researcher at “Human Experience & Media Design,” a research group at HU University of Applied Sciences in Utrecht, The Netherlands. With backgrounds in IT and media technology, he bridges the domains of technology, media, and culture in order to make media and communication students digitally savvy. Together with Aletta Smits, he has developed the master's program Datadriven Design. With their students, they develop data-driven concepts that are both practically applicable and rigorously rooted in academic research. The main focus of his research is how public values are shaped by technology and how they can be measured.