With toy shopping season in full swing, families are seeking toys that are not only fun but safe for children to play with.
A Virginia Tech research project led by Alan Abrahams has found that text mining can help researchers make more effective use of the data in millions of consumer reviews posted online to identify toys with potential hazards.
Abrahams, an associate professor of business information technology in the Pamplin College of Business, said consumers rely heavily on the Internet for information about product safety and reliability, including information in consumer-generated reviews about toys.
Such reviews contain “a lot of useful text data that is often underutilized by toy manufacturers or regulatory agencies,” he said.
“But manually identifying and analyzing consumer reviews among millions of consumer postings that relate to product safety issues is a challenging and time-consuming task.”
His research team sought to investigate whether text mining, a process of extracting specific information from text, can help them identify and rank various safety issues in the vast volume of online reviews on toys.
The project is the brainchild of Johnathon Ehsani, a childhood injury prevention researcher at the National Institutes of Health and a co-author on the study.
“Plenty of dangerous toys arrive at stores every year,” Ehsani said, “and we were eager to see whether Virginia Tech’s prior research on automotive safety surveillance from online postings would apply to the toy industry.”
Abrahams led a Virginia Tech research team a few years ago that used text mining to unearth defects in the automotive industry.
Ehsani noted that a U.S. Consumer Product Safety Commission (CPSC) report estimates that 256,700 toy-related injuries were treated in the U.S. in 2013. About three-quarters of these injuries occurred to children younger than 15 years and a third to children younger than 5.
Abrahams and Ehsani are working on the project with Matt Winkler, of Fallston, Maryland, a senior in business information technology, and Rich Gruss, of Blacksburg, Virginia, a doctoral student in business information technology at Virginia Tech.
The researchers presented their results in September to CPSC officials, who had invited team members to a meeting in Washington, D.C., after hearing of their unconventional application of CPSC recall and injury datasets.
A recent example of a toy recall is the “My Sweet Love/My Sweet Baby Cuddle Care Baby Doll.” Abrahams said Wal-Mart had to recall about 174,000 of the dolls due to a burn hazard. The CPSC reported that a circuit board in the doll’s chest could overheat and advised consumers to stop using the doll and return it to Wal-Mart for a refund.
The researchers developed a classification system for different types of safety and performance defects related to toys. They used text mining to create lists of danger words, which they call “smoke words,” related to potential hazards in children’s toys.
The lists were then used to evaluate or score a large sample of the more than one million product reviews on Amazon.com in the “Toys and Games” category in the 1999-2014 period. “We found that these smoke word lists provided a statistically significant method for identifying safety issues in children’s toys,” Abrahams said.
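As a rough illustration of how a smoke word list might be used to score reviews, here is a minimal Python sketch. It is not the researchers' method: the word list, the weights, and the function names `smoke_score` and `rank_reviews` are all hypothetical, and the scoring is a simple weighted count of exact word matches.

```python
import re

# Hypothetical smoke words with illustrative weights.
# These are assumptions for this sketch, not the study's actual list.
SMOKE_WORDS = {
    "burn": 3.0,
    "overheat": 3.0,
    "choke": 3.0,
    "sharp": 2.0,
    "broke": 1.0,
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def smoke_score(text, smoke_words=SMOKE_WORDS):
    """Sum the weights of every token that exactly matches a smoke word.

    No stemming is applied, so "overheated" does not match "overheat".
    """
    return sum(smoke_words.get(token, 0.0) for token in tokenize(text))


def rank_reviews(reviews, smoke_words=SMOKE_WORDS):
    """Return (score, review) pairs sorted from highest to lowest score."""
    scored = [(smoke_score(review, smoke_words), review) for review in reviews]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    sample_reviews = [
        "Great toy, my kids love it",
        "The doll started to overheat and burn my hand",
        "One wheel broke after a week",
    ]
    for score, review in rank_reviews(sample_reviews):
        print(score, review)
```

In a sketch like this, the highest-scoring reviews would be the ones passed to a human analyst first. A real system would also need to handle word variants, negation, and the statistical validation the researchers describe.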
Text mining is becoming an increasingly popular method for analyzing big data and drawing conclusions, he said. It offers businesses an efficient way to gain valuable information for business process improvement.
Although text mining has been used in many studies, including those on financial market predictions and general consumer attitudes, few methods have been developed to apply it to extracting practical information about product defects or safety issues, Abrahams said.
“This is the first large-scale case study, to our knowledge, that confirms the effectiveness of text mining social media for safety surveillance in the toy industry.”
Gruss developed PamTag, a web-based system for annotating text postings that supports hundreds of users and was used to test the machine-learning algorithms deployed in this study. He believes that Virginia Tech’s proprietary software infrastructure could position the university as a global leader in quality surveillance research across multiple industries.
Sookhan Ho