Computer science team creates coding program to interpret Chinese social media texts
Kennesaw State professor of computer science Dr. Dan Lo and his team of students created a program last semester to mine data from Chinese social media sites in partnership with the U.S. Embassy in Beijing, China.
Lo said the program retrieves and deciphers posts on popular Chinese social media outlets like Weibo and WeChat. Lo’s team of student researchers, four undergraduates and two graduate students, conducted extensive data mining on China’s Weibo platform and, in turn, gained valuable real-world experience under his tutelage.
Weibo, meaning “micro blog” in Chinese, is an open social platform with over 445 million monthly active users. It is similar to both Facebook and Twitter, and likewise is an epicenter of Chinese news.
Lo said he is passionate about the project because it combines elements of social networking, big data, machine learning and data science. It also required fluency in both Chinese and Python, a programming language. The team’s biggest challenge was processing Chinese articles correctly because of the complexity of the Chinese language.
“Chinese language is highly context-sensitive. Consecutive Chinese letters can be combined and interpreted in multiple ways,” Lo said. “For instance, our data mining programs needed to determine if ABC was meant to be read as A, B, C, AB, BC or ABC in the mind of the writer. The meaning in each instance varied vastly.”
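To illustrate the ambiguity Lo describes, the following is a minimal Python sketch that lists every way a run of characters could be split into consecutive pieces. It is not the team's program. The function name all_segmentations is hypothetical, and a real system would also need a dictionary or statistical model to decide which split the writer most likely intended.

```python
from typing import List


def all_segmentations(text: str) -> List[List[str]]:
    """Return every way to split text into contiguous, non-empty pieces.

    Illustrative helper only (hypothetical name); it enumerates candidate
    readings and does not judge which one is correct.
    """
    if not text:
        # An empty string has exactly one segmentation: no pieces at all.
        return [[]]
    results = []
    for i in range(1, len(text) + 1):
        head = text[:i]  # first candidate word
        for rest in all_segmentations(text[i:]):
            results.append([head] + rest)
    return results


if __name__ == "__main__":
    for pieces in all_segmentations("ABC"):
        print(" | ".join(pieces))
```

For the three-character string in Lo's example, the sketch yields four candidate readings: A, B, C; A, BC; AB, C; and ABC. The count doubles with each added character, which is part of why picking the intended reading at scale is difficult.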
Handling word segmentation, numbers, acronyms, synonyms and newly developed words in Chinese slang posed additional frustrations. Linguistic challenges and the sheer volume of the big data analyzed required painstaking efforts to codify the inputs.
“Social media mining in English has been performed for many years in Twitter, Facebook, Instagram and other networks,” Lo said. “While it is theoretically possible to do the same in Chinese by applying most of the English techniques, it is not that simple.”
Information sharing on social media is a valuable tool in forming public opinion. The results from the project enabled Lo’s team to show the information shared among Chinese users and the key statistics of that data.
“Dr. Lo is an excellent mentor because he allows his students to have a role in all stages of the research process,” graduate research assistant Charles Gardner said. “By working so closely with his students, Lo creates strong researchers as well as a great finished project.”
Gardner has been working as a graduate research assistant for Lo for over a year now.
“In English, we do not have word segmentation,” Gardner said. “However, in this project, we look at Chinese words within references and it ends up being an interesting system.”
Lo has already begun work on his next project, which involves identifying fake news and dealing with misinformation. This new project has been commissioned by the U.S. Embassy in Poland.
The program was awarded to KSU by the Diplomacy Lab, an alliance between the U.S. Department of State and U.S. colleges and universities, according to KSU News. Under the program, partner U.S. universities are invited to submit proposals for State Department projects in various humanitarian areas.