Changing the learning rule, not just the architecture, may be what unlocks continual learning.
There’s something called predictive coding which more closely resembles how the brain works. On smaller neural nets, predictive coding networks are better at continual learning: they don’t have as much catastrophic forgetting, each neuron can learn asynchronously and independently, and you can do inference and learning in the same phase. The need for hyperscaler server farms is less, because the learning is asynchronous.
Source: Rethinking the AGI Race, with Benjamin Goertzel (eCornell Keynote)