The world of academia is generally not known for being on the cutting edge of technology. However, as technology rapidly advances, historians and researchers can utilize both artificial intelligence (AI) and deep learning to make their jobs easier.
Artificial intelligence is the ability of a machine to imitate intelligent human behaviour. Deep learning is a subset of machine learning that uses neural networks with many layers. It provides a fast and relatively easy way to process massive amounts of data, much of which would be tedious and time-consuming for a human to process; because of this, pattern recognition is one of deep learning’s greatest strengths.
Historians are finding that the practical applications of deep learning can help them work with large amounts of information much more easily.
Predicting and pinpointing trends
Kelly Ryan, Associate Professor of History and Dean of the School of Social Sciences at Indiana University Southeast, mentioned that historians, particularly those who do social history, typically use databases in order to predict trends and speed up their research.
“I think most professional historians have such an intimate knowledge of history, we recognize when we see trends in the modern world that reflect situations in the past,” Ryan said. “For example, the recent recession was predicted by many historians who saw similarities with the 1920s.”
Predicting and pinpointing trends within the historical record is also where deep learning excels, and it can provide historians with a faster, more efficient way to work with data.
For example, in 2016, a team of researchers from the University of Bristol, Cardiff University, and the website FindMyPast, led by Nello Cristianini, Professor of Artificial Intelligence at the University of Bristol, used AI to analyze roughly 14% of local British newspapers from 1800 to 1950, covering more than 35 million articles and 28.6 billion words.
This study detected trends in the historical record, including references to war, coronations, elections of new popes, and disease, as well as gender bias, geographical focus, technology, politics, and accurate dates for specific events.
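To make the idea of detecting a trend concrete, here is a minimal sketch of one basic building block: tracking how often a term appears per year, relative to all the words published that year. It is an illustration only, not the method used by the Bristol team. The tiny corpus, the `tokenize` helper, and the `relative_frequency_by_year` function are all hypothetical names invented for this example.

```python
import re
from collections import Counter

# Hypothetical toy corpus: (publication year, article text) pairs.
# A real study would load millions of digitized articles instead.
ARTICLES = [
    (1851, "The exhibition opened; a train carried visitors to London."),
    (1851, "News of the election reached the town by post."),
    (1901, "The coronation was announced; the train service was extended."),
    (1901, "A train and a telephone now connect the two towns."),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relative_frequency_by_year(articles, term):
    """Return {year: occurrences of term / total words seen in that year}."""
    term_counts = Counter()
    total_counts = Counter()
    for year, text in articles:
        tokens = tokenize(text)
        total_counts[year] += len(tokens)
        term_counts[year] += tokens.count(term)
    # Skip years with no words at all to avoid dividing by zero.
    return {
        year: term_counts[year] / total_counts[year]
        for year in sorted(total_counts)
        if total_counts[year]
    }


trend = relative_frequency_by_year(ARTICLES, "train")
for year, freq in trend.items():
    print(year, round(freq, 4))
```

Dividing by the yearly word total matters because the volume of surviving newspapers differs from year to year; raw counts alone would mostly reflect how much was printed, not how often a topic came up.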
In Codice Ratio
Donatella Firmani, a Research Fellow at Roma Tre in Italy, is part of a team developing a deep learning transcription program at the Vatican Secret Archives (VSA), one of the largest historical archives in the world.
This project, In Codice Ratio, supports the analysis and transcription of more than 600 archival collections of historical documents from the VSA… including correspondence of the popes, starting from the eighth century.
“The VSA contains a lot of correspondence and bureaucracy, rather than say literature books, so there are many less known ‘facts’ that are waiting to be extracted,” said Firmani, who has a master’s and PhD in computer science.
Handwritten documents are essential to historiography. However, historians generally have little to no advanced computer science knowledge. Therefore, it is important that historians, paleographers, and other researchers are able to comprehend the technology.
“Ultimately, we plan to provide paleographers with a knowledge base, that will require no computer science skills to use,” Firmani said.
In the United Kingdom, Colin Greenstreet, Founder and Co-Director of MarineLives, is heading up a Kaggle Research Competition. Greenstreet said that the competition is tentatively scheduled to begin March 2019, with a second competition likely that November.
The competition will be hosted on Kaggle, a machine learning community owned by Google, and will focus on the algorithmic identification of marks, initials, and signatures. MarineLives, a collaborative transcription project, works with primary manuscripts from the English High Court of Admiralty from 1627 to 1677. This court dealt with marine matters, such as disputes at sea involving ship owners or sailors. Approximately six million words of full-text transcriptions were assembled over the last six years.
“We don’t have a complete sample of the English or European population from the 17th century—we deal with 17th-century data,” Greenstreet said. “But we do have a lot of mariners and merchants and other related trades.”
The historian David Cressy had the idea that if someone could sign their name, it meant they were literate, whereas the use of a crude mark or of initials, which were a sort of intermediate step between a mark and a signature, was a sign that someone was less literate.
“If we can distinguish the degree of sophistication of the signature and of the mark and of the initial, then we can assemble additional knowledge,” Greenstreet said. “If we can work at a much greater scale, we can assemble some pretty powerful data.”
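As a rough illustration of how such annotations could be turned into data, the sketch below groups sign-offs by occupation and computes a simple literacy proxy. It assumes the images have already been labeled as a mark, initials, or a signature; the sample records, the numeric scores, and the `summarize` function are hypothetical and are not drawn from MarineLives data or from Cressy's work.

```python
from collections import Counter, defaultdict

# Hypothetical annotations: (occupation, kind of sign-off on a document).
ANNOTATIONS = [
    ("mariner", "mark"),
    ("mariner", "initials"),
    ("mariner", "signature"),
    ("mariner", "mark"),
    ("merchant", "signature"),
    ("merchant", "signature"),
    ("merchant", "initials"),
]

# Assumed ordinal scores for illustration: mark < initials < signature.
SCORES = {"mark": 0.0, "initials": 0.5, "signature": 1.0}


def summarize(annotations):
    """Group sign-offs by occupation; return counts and a mean score."""
    by_group = defaultdict(Counter)
    for occupation, kind in annotations:
        if kind not in SCORES:
            raise ValueError(f"unknown sign-off kind: {kind!r}")
        by_group[occupation][kind] += 1

    summary = {}
    for occupation, counts in by_group.items():
        total = sum(counts.values())
        mean = sum(SCORES[kind] * n for kind, n in counts.items()) / total
        summary[occupation] = {
            "n": total,
            "signature_share": counts["signature"] / total,
            "mean_score": mean,
        }
    return summary


summary = summarize(ANNOTATIONS)
for occupation, stats in sorted(summary.items()):
    print(occupation, stats)
```

With only a handful of records the numbers mean little; the point of working at scale, as Greenstreet describes, is that the same aggregation over thousands of annotated images could support comparisons between trades, places, or decades.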
MarineLives is forming a charity called Chronoscopic Education, which will become the home for MarineLives and a number of other projects it runs. Its aim is to create a collaborative research community.
“It depends upon us assembling a fully annotated data set,” Greenstreet said. “We’re aiming for about 10,000 images, which will be annotated in some shape or form, and we’re still debating that annotation.”
Greenstreet said the project is a bit like dragging a horse to water because historians typically have fewer computer science skills. Therefore, it is important to show historians the power of working at a larger scale in terms of the data.
Deep learning challenges
Being a relatively new technology, deep learning is not perfect. It requires an investment of both time and money, as well as the expertise to use it, not just for the researcher but for the programmer, too.
After all, artificial intelligence and deep learning are just tools. They can assist with research, but they are not an outright replacement for solid academic analysis, which still relies on human interpretation in order to understand the context of the data.
“More predictive work is always welcome, but such programs also have a lot of flaws because contexts are never exactly similar,” Ryan said. “But many historians embrace technology in order to present it to the public.”