As artificial intelligence (AI) forces a tempo shift in public conversations about everything from health care to movies, Jo Guldi wants to pause.
A trained historian who joined the Emory College Department of Quantitative Theory and Methods (QTM) last fall, Guldi is hardly opposed to AI’s potential to transform our world.
She is simply aware of the software’s limitations — image-detection AI can be as confused by a rotated object as a kitten watching a ceiling fan — and cautions that human scrutiny is still needed.
“We have a real problem with credibility if we use AI to produce facts,” Guldi says. “A big part of that is because machines can’t think or consider historical context. Smarter data science is grounded in what data actually is and what human experience actually is.”
Guldi previously applied such big-data methods to analyze transcripts of British parliamentary debates for two separate books explaining historical trends in property law and land rights.
Her most recent book, “The Dangerous Art of Text Mining,” was released in 2023 and opens with an overview of data misuse stemming from inflated claims about AI. Specifically, it focuses on the mistaken belief that AI could sift through huge volumes of historical texts and turn cultural touchpoints — or people — into variables.
This data misuse led to a flurry of retractions and fines, including a $5,000 penalty for a New York attorney who cited “precedents” in federal court filings that ChatGPT had invented.
In the second half of the book, Guldi lays out algorithms and other computational strategies for incorporating historical modeling methods and humanistic questions of bias and sources into big-data tools.
Guldi believes that AI works best when trained to “think” about the multiple dimensions of human experiences of historical change.
“Memory is distinct from the occurrence of an event,” she says.
Building a start-up lab
Guldi is using “The Dangerous Art of Text Mining” as a manual — including its series of exercises for interdisciplinary teams to practice together — to build her undergraduate coursework and a new QTM lab. In both, students combine humanistic questions with machine learning to develop a deeper understanding of, and gain new insights into, climate change governance.
“The exposure to the way social sciences and humanities can be the focal point of data science research was revolutionary to me,” says Hava Collins, a junior majoring in quantitative sciences with a concentration in English.
“It is exactly the niche I’m interested in, a lot of technical things like text mining and topic modeling, with a focus on what the human impact is and what it means for broader historical context and society,” adds Collins, who is serving as Guldi’s research assistant after taking the fall course.
Guldi’s efforts are part of Emory’s AI.Humanity Initiative, a university-wide charge to humanize the technology by injecting liberal arts, ethics and other disciplines into its development and use.
The initiative also expands academic coursework for students to learn about the technology, regardless of their major, says QTM department chair Cliff Carrubba.
“Dr. Guldi is a preeminent scholar answering foundational questions of social importance using cutting-edge technology, but with a clear eye on the strengths and limits of that technology,” Carrubba says.
“This approach ensures that Emory's next generation of scholars, practitioners and leaders will be positioned to thrive in a world rapidly being transformed by AI,” he adds.
Mining data related to climate change
In establishing her new lab, Guldi asked colleagues across Emory what collections and documents in their fields addressed successes and failures in governing climate change.
She took those insights into her classes — and describes the result as mimicking start-up businesses. Students work in teams to gather, digitize and analyze data from the likes of state and local building departments, Congress and EDGAR, the U.S. Securities and Exchange Commission’s public database of required corporate filings.
Students share the painstaking work of cleaning and normalizing the data and forming educated proposals for analysis. Their ultimate task is to create an ongoing index of keywords, and how they change over time, from the various datasets.
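The students' end product — an index tracking keyword frequencies across datasets over time — can be pictured with a minimal sketch. The documents, years and keywords below are hypothetical placeholders, not the lab's actual data or code:

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each record pairs a year with cleaned text.
documents = [
    {"year": 1998, "text": "levee repairs and stormwater drainage funding"},
    {"year": 1998, "text": "drainage upgrades after coastal flooding"},
    {"year": 2005, "text": "levee failure prompted new floodwall construction"},
]
keywords = ["levee", "drainage", "floodwall"]

def keyword_index(docs, terms):
    """Build a year -> keyword -> count index from normalized documents."""
    index = defaultdict(Counter)
    for doc in docs:
        tokens = doc["text"].lower().split()
        for term in terms:
            index[doc["year"]][term] += tokens.count(term)
    return index

index = keyword_index(documents, keywords)
print(dict(index[1998]))  # mentions of each keyword in 1998 documents
```

Comparing the counts year by year is what lets later researchers see a term's rise or fall across the corpus.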
This semester, students read reports from civil engineers, looking for mentions of technologies that address different kinds of flooding. Later scholars could compare those terms against extreme-weather data to identify patterns of success or failure.
Even with machine learning and teams of students, the work is detailed and slow. Last fall, for instance, Collins’ teammates converted PDFs from the Congressional record into a database.
Her assignment was topic modeling: letting the algorithm spot the most prominent words so she could carefully read individual documents for context. In some cases, she found herself scouring single exchanges of dialogue to see how discussions about climate change shifted over time.
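The workflow described above — letting a program surface prominent words, then reading the underlying documents by hand — can be sketched in a drastically simplified form. Real topic models such as LDA infer probabilistic word clusters; this stand-in merely ranks content-word frequencies, and the speeches and stopword list are invented for illustration:

```python
from collections import Counter

# Minimal stopword list for the toy example; real pipelines use much larger ones.
STOPWORDS = {"the", "of", "and", "to", "a", "in", "on", "for", "is"}

def prominent_words(texts, top_n=5):
    """Rank the most frequent non-stopwords across a list of documents."""
    counts = Counter()
    for text in texts:
        for token in text.lower().split():
            token = token.strip(".,;:")
            if token and token not in STOPWORDS:
                counts[token] += 1
    return [word for word, _ in counts.most_common(top_n)]

speeches = [
    "The committee debated flood insurance and coastal levees.",
    "Flood insurance subsidies dominated the debate.",
]
print(prominent_words(speeches, top_n=2))
```

The ranked words point the human reader toward which documents merit close reading — the machine narrows, the scholar interprets.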
“It was just the start,” says Collins, whose experience convinced her she wants to pursue law school and a career examining policies and guidelines for AI and other big-data technology.
“It’s intimidating for a humanities major with some imposter syndrome, I know, to set foot into a QTM classroom of people who are so math-focused and technologically advanced,” Collins says. “But there really is such a huge overlap, and critical thinking is most important to both.”
That intersection is of particular interest to Guldi and other interdisciplinary scholars in QTM. In addition to guiding students on the project, she is working with those colleagues to develop a suite of classes for non-QTM majors, so they can develop the same skills and have the opportunity to join the research.
“Emory is really the only North American program where an undergraduate can get this kind of robust training in text data science,” Guldi says. “Emory can be the lighthouse about how to keep data science in the liberal arts, and do it well.”