One transformer model replaced 100 language-specific models per customer
Before transformers, we had different models for different languages. Supporting 100 languages meant running 100 different models per customer, which was painful for model operations. When we switched to transformers like BART about three and a half years ago, we could use one polyglot model for all languages. You could train the bot in English and ask questions in Finnish, and it worked reasonably well. It also helped our clustering pipeline show customers all their questions across languages semantically grouped together. You could see Finnish, Spanish, German, and English examples all meaning the same thing without worrying about the language.
— How Ultimate evolved from hundreds of supervised models to UltimateGPT · The Edge