Google has detailed a new method it has deployed to improve spam detection in Gmail, its free email service. In a recent Google Security Blog post, the company describes the change as one of the largest defense upgrades to Gmail in recent years. According to Google, the new model is better at identifying text, which improves spam detection by 38%.
How spammers previously evaded Google’s filters
To detect harmful content such as phishing attacks, inappropriate comments, and scams, products like Gmail, YouTube, and Google Play rely on text classification models, the company says. Such text is hard for machine learning models to classify because spammers use adversarial text manipulations to evade the classifiers. For instance, attackers have used homoglyphs (look-alike characters), invisible characters, and keyword stuffing to get past Google’s defenses.
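To make two of these manipulations concrete, here is a minimal Python sketch. The keyword list and the `naive_filter` function are hypothetical and far simpler than any production classifier; they only illustrate why exact text matching is easy to defeat. Keyword stuffing is not shown.

```python
import unicodedata

# Hypothetical keyword list, for illustration only.
BLOCKLIST = {"free", "prize"}

def naive_filter(text: str) -> bool:
    """Return True if any blocklisted keyword appears as a whole word."""
    words = text.lower().split()
    return any(word.strip(".,!?") in BLOCKLIST for word in words)

def strip_invisible(text: str) -> str:
    """Drop Unicode format characters (category Cf), e.g. zero-width spaces."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

plain = "Claim your free prize"
# Homoglyphs: Cyrillic 'е' (U+0435) is visually identical to Latin 'e'.
homoglyph = "Claim your fr\u0435\u0435 priz\u0435"
# Invisible characters: zero-width space (U+200B) inside each keyword.
invisible = "Claim your fr\u200bee pri\u200bze"

for label, text in [("plain", plain), ("homoglyph", homoglyph), ("invisible", invisible)]:
    print(label, naive_filter(text))

# Stripping format characters recovers the invisible-character case...
print("invisible, stripped:", naive_filter(strip_invisible(invisible)))
# ...but Unicode normalization does not map Cyrillic letters to Latin ones.
print("homoglyph, NFKC:", naive_filter(unicodedata.normalize("NFKC", homoglyph)))
```

All three strings look the same to a human reader, yet only the plain one matches the keyword list. Hand-written cleanup rules catch some tricks and miss others, which is the gap a more resilient vectorizer is meant to close.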
What is RETVec?
To make text classifiers more robust and efficient, Google has developed a new multilingual text vectorizer called RETVec (Resilient & Efficient Text Vectorizer). It helps spam filter models reach better accuracy while substantially reducing computational cost. The company has also explained how it used RETVec to protect Gmail inboxes.
How RETVec improves Gmail’s spam filters
Over the past year, Google has used RETVec extensively and found it highly effective for security and anti-abuse applications. Replacing Gmail’s previous text vectorizer with RETVec improved the service’s spam detection rate by 38% and reduced its false positive rate by 19.4%.
Using RETVec also reduced the model’s TPU usage by 83%. RETVec works across all languages and “all UTF-8 characters” without requiring any text preprocessing, which makes it well suited to on-device, web, and large-scale text classification deployments.
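One way to see how a vectorizer can cover every character without a vocabulary or preprocessing step is to derive each character's features directly from its Unicode code point. The sketch below is a simplified illustration of that idea, not RETVec's actual implementation; the function names, the 24-bit width, and the 16-character word length are assumptions chosen for the example.

```python
def encode_char(ch: str, bits: int = 24) -> list[int]:
    """Binary digits of the character's Unicode code point, least significant bit first."""
    code_point = ord(ch)
    return [(code_point >> i) & 1 for i in range(bits)]

def encode_word(word: str, max_chars: int = 16, bits: int = 24) -> list[list[int]]:
    """Fixed-size (max_chars x bits) matrix: long words are truncated, short ones zero-padded."""
    rows = [encode_char(ch, bits) for ch in word[:max_chars]]
    rows += [[0] * bits for _ in range(max_chars - len(rows))]
    return rows

# The same function handles Latin, Cyrillic, Japanese, and emoji input
# with no vocabulary lookup and no "unknown token" fallback.
for word in ["spam", "спам", "スパム", "🎁"]:
    matrix = encode_word(word)
    print(word, len(matrix), "x", len(matrix[0]))
```

Because every code point fits in the fixed bit width, no input is ever out of vocabulary. On its own, this encoding still treats look-alike characters as different inputs, so any resilience to manipulated text would have to come from the model trained on top of it.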
Google says that models trained with RETVec run inference faster because of its compact representation, and that smaller models reduce computational costs and latency, which is critical for large-scale applications and on-device models. Models trained with RETVec can be converted to TFLite for mobile and edge devices. RETVec is open source and available on GitHub.