Why Tensormobile does not offer voice biometrics

The major CPaaS portfolios typically list voice biometrics alongside SMS OTP and push in their authentication suites. Tensormobile does not offer voice biometrics, and will not at any point on the published roadmap. This post explains why.

The decision is grounded in peer-reviewed research showing that voice-biometric verification can be bypassed at rates that do not clear Tensormobile's publication bar. Shipping the feature would require a disclaimer Tensormobile is not willing to attach.

The research that settles it

In 2021, Kassis and Hengartner published the first practical attack on voice-spoofing countermeasures, which are the anti-spoofing layer that sits on top of an Automatic Speaker Verification (ASV) system to catch synthesised or converted speech. Their method, using a joint loss function to craft adversarial audio samples, achieves up to 93.57 percent black-box success against state-of-the-art ASV-plus-countermeasure deployments. The paper demonstrates targeted attacks over the telephony network, which is exactly the channel a voice-biometric verification product would use.

The result is not a niche lab finding. Three points matter for practical deployment.

First, the attack targets the combined system: ASV plus the countermeasure designed to catch spoofing. The anti-spoofing layer built to catch synthesised speech is exactly what the paper defeats. The attack does not require breaking any cryptography. It requires generating an audio signal that scores as the target speaker against a production-grade classifier that was trained to detect exactly such attacks.

Second, the attack is black-box. It does not require access to the model weights, the training data, or the decision threshold. The attacker needs samples of the target speaker and a computation budget well within commodity reach. Voice-cloning services commercialised in 2023 and 2024 (ElevenLabs and its competitors) have since reduced the sample-collection and inference costs substantially, which narrows the attacker budget without changing the underlying research conclusion.

Third, the attack works over telephony. The paper tests transmission through lossy telephony codecs and shows the attack survives the bit-rate and frequency-domain degradation that voice-biometric vendors sometimes point to as a robustness feature. The claim that telephony audio is too degraded for deepfakes to survive does not match the measured data.

The banking-sector research on speech-synthesis countermeasures converges on the same conclusion from the defensive side: there is no universal solution to voice-biometric spoofing, and countermeasures require combining cepstral analysis, phase and tone anomaly detection, emotional-state analysis, and synthesis-resistant dynamic password design. Responsible vendors are still actively researching countermeasures. The research is not finished.

Earlier work on voice-spoofing countermeasures using bidirectional-LSTM and ELTP+LFCC feature fusion shows that some countermeasures work in the ASVspoof 2019 logical-access benchmark environment. The Kassis and Hengartner result shows that the same countermeasure class, deployed in production, is bypassable. Both findings are defensible on their own terms. Together they give a clear picture: voice biometrics works in lab settings, struggles in adversarial production settings, and cannot currently be trusted for high-stakes authentication.

What "93.57 percent bypassable" actually means

The number is specific, and specificity matters. It is the proportion of adversarial samples, generated under the paper's joint-loss method, that succeed against a black-box ASV-plus-countermeasure system targeted at a specific speaker. The failure mode is false-accept: the system believes the attacker is the legitimate speaker.

Translated into deployment terms: if a voice-biometric system authenticates a transaction, an attacker with samples of the legitimate user's voice and a commodity computation setup can make the system accept them as the user at rates of up to 93.57 percent under the paper's conditions. The user's fraud exposure is not an occasional edge case. It is the baseline state of the world when a motivated attacker targets them.
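The arithmetic behind that exposure can be sketched in a few lines. This is a minimal illustration, not a model from the paper: it assumes each spoofing attempt succeeds independently with a fixed probability and that the system allows a small number of retries. The function name and the retry counts are hypothetical, and 0.9357 is the paper's upper-bound figure rather than a guaranteed per-attempt rate.

```python
def prob_at_least_one_accept(per_attempt_success: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries is falsely accepted.

    Assumption (not from the paper): attempts are independent and share the
    same success probability.
    """
    if not 0.0 <= per_attempt_success <= 1.0:
        raise ValueError("per_attempt_success must be in [0, 1]")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - (1.0 - per_attempt_success) ** attempts


if __name__ == "__main__":
    # 0.9357 is the reported upper bound; the lower rates are hypothetical
    # values included only to show how retries compound.
    for p in (0.9357, 0.50, 0.10):
        for n in (1, 3):
            print(f"p={p:.4f} attempts={n} -> {prob_at_least_one_accept(p, n):.4f}")
```

Under these assumptions, even a hypothetical per-attempt rate of 10 percent compounds to roughly a 27 percent chance of a false accept across three retries, which is why a retry allowance cannot be treated as a mitigation.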

Voice samples are not hard to obtain. A public social media video, a recorded customer-support call, a voicemail greeting, a podcast appearance: any of these provides enough material for the voice-cloning models that have become commodity in the last two years. The attacker's bar for executing the attack is therefore low in a world where most people have public audio of themselves.

A product team deploying a voice-biometric authenticator has to reconcile this reality with the authentication guarantees the product makes. Tensormobile chooses not to attempt that reconciliation.

Voice biometrics versus voice OTP

The rest of the Tensormobile voice offering is a delivery channel rather than an identity signal. Voice OTP (a spoken six-digit code delivered by telephone call) and IVR (interactive voice response for transactions) are both parts of the TensorConnect voice surface. Neither is affected by the decision to not ship voice biometrics.

The distinction is critical. Voice OTP sends information to the subscriber and asks them to enter it back through DTMF or a second channel. It is a transmission primitive rather than an identity claim. Voice biometrics, the thing this post is about, attempts to match a live audio stream against a stored voiceprint and assert the speaker is the enrolled subscriber. The two products share a word and little else.
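A voice OTP flow of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the TensorConnect implementation: the names `OtpChallenge`, `issue_otp`, and `verify_otp`, the 120-second lifetime, and the three-attempt limit are all hypothetical, and the telephone call that speaks the code is out of scope. The point is that the check compares a code the subscriber entered against a code the service generated; nothing about the speaker's voice is evaluated.

```python
import hmac
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class OtpChallenge:
    code: str           # six-digit code to be spoken to the subscriber
    expires_at: float   # deadline on the time.monotonic() clock
    attempts_left: int  # remaining verification attempts


def issue_otp(ttl_seconds: float = 120.0, max_attempts: int = 3) -> OtpChallenge:
    """Generate a random six-digit code (hypothetical TTL and attempt limit)."""
    code = f"{secrets.randbelow(10**6):06d}"
    return OtpChallenge(code, time.monotonic() + ttl_seconds, max_attempts)


def verify_otp(challenge: OtpChallenge, entered: str, now: Optional[float] = None) -> bool:
    """Check the digits entered via DTMF or a second channel against the issued code."""
    now = time.monotonic() if now is None else now
    if challenge.attempts_left <= 0 or now > challenge.expires_at:
        return False
    challenge.attempts_left -= 1
    # Constant-time comparison avoids leaking how many digits matched.
    ok = hmac.compare_digest(challenge.code.encode(), entered.encode())
    if ok:
        challenge.attempts_left = 0  # a code is single-use once accepted
    return ok
```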

Tensormobile ships voice OTP where it serves accessibility or specific user-experience needs, with the caveat that voice OTP is a delivery channel subject to the call-forwarding fraud described in the companion post on vishing. Tensormobile does not ship voice biometrics because voice biometrics is the claim that cannot be defended against the attack class in the cited research.

Why the research forces this position

A licensed operator publishing a voice-biometric product page that claims carrier-grade authentication has to reconcile that page with the 93.57 percent black-box attack result. The claim becomes unsustainable. A vendor that does publish the page is either unaware of the research, aware and betting that customers will not check the citation, or aware and calculating that the liability is manageable.

Tensormobile decided not to ship. The commercial cost is real. Voice biometrics is a product category customers ask for, and the ask is sometimes tied to a broader RFP for an authentication suite. The alternative is publishing a page that the internal technical-review gate would reject under the citation-integrity rules.

The positioning consequence is that Tensormobile has a concrete, auditable reason to cite when a buyer asks why voice biometrics is missing from the TensorAuth product surface. The reason is the Kassis and Hengartner 2021 result, the research is in the Tensormobile library, and the paper is available to share with buyers who ask.

What Tensormobile recommends instead

Customers asking for voice biometrics usually have a specific underlying problem: an authentication step in a voice-channel flow, a caller-identity verification for a call centre, a step-up factor for a high-value transaction. The underlying problem is real. Voice biometrics is one answer; better answers exist.

For caller-identity verification in a call centre, the responsible pattern is carrier-verified caller ID plus network-based identity resolution. The caller's MSISDN is verified at the signalling layer and the caller's account is matched against the MSISDN. Additional step-up, where required, uses a factor with a defensible security posture: out-of-band confirmation through a second channel such as SMS-back, device-bound factors on the caller's device such as device fingerprinting and platform biometric attestation, or live-agent challenge questions against enrolled attributes. Each of these has known failure modes, but none of them is 93.57 percent bypassable by a commodity attacker with publicly available voice samples.

For step-up authentication on voice-channel transactions, the pattern is the same: resolve the MSISDN at the signalling layer, require a device-bound factor, and for the highest-assurance transactions add a passkey or FIDO2 verification on a trusted device. Composite factor designs consistently outperform single-factor voice-biometric designs in the published literature, and the composition avoids the special-category-data compliance overhead that biometric template storage imposes under GDPR Article 9.
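The composite pattern can be expressed as a small policy check. The sketch below is illustrative only: `Signals`, `Assurance`, and `allow_transaction` are hypothetical names, not part of any Tensormobile API, and each boolean is assumed to be produced by a separate verification step that is not shown.

```python
from dataclasses import dataclass
from enum import Enum


class Assurance(Enum):
    STANDARD = 1
    HIGH = 2  # highest-assurance transactions


@dataclass(frozen=True)
class Signals:
    # Each flag is assumed to come from its own upstream check (not shown).
    msisdn_verified_at_signalling: bool
    device_bound_factor_ok: bool
    passkey_verified: bool = False  # passkey or FIDO2 on a trusted device


def allow_transaction(signals: Signals, required: Assurance) -> bool:
    """Require MSISDN resolution plus a device-bound factor; add a passkey for HIGH."""
    base = signals.msisdn_verified_at_signalling and signals.device_bound_factor_ok
    if required is Assurance.HIGH:
        return base and signals.passkey_verified
    return base
```

No single signal is sufficient on its own in this design, so defeating one factor does not by itself authorise the transaction.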

For customers who have already enrolled a voice print and expect to use it, the honest conversation is that the template is a data-protection obligation without a corresponding security benefit. Deleting the template reduces both compliance burden and customer risk. This recommendation cannot be made inside a product Tensormobile is shipping. It can be made clearly in a blog post that names the research and the decision.

What would change the decision

The decision is not a matter of timing. Voice biometrics is not deferred to a later release, and Tensormobile will not ship it on the current roadmap horizon. What would change the decision is a specific, measurable shift in the research picture: a peer-reviewed result that demonstrates a production-grade voice-biometric countermeasure with a false-accept rate below one percent against the attack class described in Kassis and Hengartner 2021, evaluated by independent replication, on commodity telephony audio. No such result currently exists in the literature. If one emerged, the decision would be re-evaluated. The threshold is high because the product claim the threshold supports is high.
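Written as a checklist, the re-evaluation criteria look like the sketch below. The record type and field names are hypothetical and exist only to make the conditions explicit; the one-percent ceiling is the figure stated above.

```python
from dataclasses import dataclass

MAX_FALSE_ACCEPT_RATE = 0.01  # "below one percent", as stated in the text


@dataclass(frozen=True)
class PublishedResult:
    # Hypothetical summary of a candidate research result.
    peer_reviewed: bool
    independently_replicated: bool
    evaluated_on_commodity_telephony_audio: bool
    covers_kassis_hengartner_attack_class: bool
    false_accept_rate: float  # fraction in [0, 1]


def triggers_reevaluation(result: PublishedResult) -> bool:
    """True only when every stated condition holds at once."""
    return (
        result.peer_reviewed
        and result.independently_replicated
        and result.evaluated_on_commodity_telephony_audio
        and result.covers_kassis_hengartner_attack_class
        and result.false_accept_rate < MAX_FALSE_ACCEPT_RATE
    )
```

Meeting the criteria triggers a re-evaluation, not an automatic decision to ship.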

Research-level improvements to voice-biometric robustness are happening. The gap between a research-level result and a production-deployable one is significant, and the bar for shipping the product is set at production-deployable, not at research-level promise.
