This episode is an interview with Paul Vierthaler about his use of text mining and analysis on large corpora of pre-modern Chinese literature. Paul discusses how and why he learned to code while in the middle of writing his dissertation, and how he integrated what he found into a broader quantitative-qualitative project. He then outlines his latest research, which uses sequence alignment and stylometry to analyze, and propose likely authors for, a wide swath of anonymously and pseudonymously authored books written in late imperial China.
Relevant links:
– Paul’s website
– Leiden Centre for Digital Humanities
– Fiction and history: polarity and stylistic gradience in late imperial Chinese literature (article on cultural analytics written by Paul)
– Analyzing printing trends in late imperial China using large bibliometric datasets (another article written by Paul)
– Ten Thousand Rooms Project
– China Biographical Database