(Tuesday, 16th May 2023)
With the massive increase in the availability of unstructured text datasets, driven by the internet and email as well as by digitization efforts by governments and tech companies, there is growing potential for natural language processing tools in social science. These trends are especially salient in law, politics, and social media, where human communications and commitments are recorded in far more unstructured text than anyone could read unaided. With the right tools and techniques, however, we can teach computers to read and analyze this text for us, opening up new insights and opportunities.
This workshop will review the latest techniques and tools for treating text documents as data, including unsupervised learning for interpreting corpora, supervised learning for regression and classification, word and document embeddings for identifying key dimensions of language, and discourse analytics for summarization and question answering. The empirical potential of these tools will be illustrated with applications from law and economics and from political economy.