albertcardona, to Neuroscience
@albertcardona@mathstodon.xyz

Henry Markram, of spike timing dependent plasticity (STDP) fame and infamous for the Human Brain Project (HBP), just got a US patent for "Constructing and operating an artificial recurrent neural network": https://patents.google.com/patent/US20230019839A1/en

How is that not something thousands of undergrads are doing with PyTorch every week?

The goal, says the patent text, is to cover <<methods and processes for constructing and operating a recurrent artificial neural network that acts as a “neurosynaptic computer”>> – that narrower framing might be patentable, but the overreach of patenting the construction and operation of an RNN as such is ludicrous.

It seems likely that the legal office at Markram's research institution overreached and got away with it. Good luck enforcing this patent, though: Markram did not invent RNNs.
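
For a sense of scale, here is a generic illustration of constructing and operating an RNN in PyTorch – just the textbook exercise alluded to above, with arbitrary sizes, and unrelated to the specific "neurosynaptic computer" claims:

```python
# Constructing and operating a plain recurrent network in PyTorch; sizes are arbitrary.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=32, hidden_size=64, num_layers=1, batch_first=True)

x = torch.randn(8, 20, 32)    # batch of 8 sequences, 20 time steps, 32 features
h0 = torch.zeros(1, 8, 64)    # initial hidden state: (num_layers, batch, hidden)
output, hn = rnn(x, h0)       # run ("operate") the recurrent network

print(output.shape)           # torch.Size([8, 20, 64])
```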

#neuroscience #RNN #NeuralNetworks #ANN #RidiculousPatents

tero, to LLMs
@tero@rukii.net

More efficient inference for #LLMs:
#RecycleGPT: An Autoregressive Language Model with Recyclable Module

It trains a small student #RNN that takes the full #Transformer decoder hidden state and the embedding of its output token as input, and produces the next hidden state (which can be mapped through the output head and sampled to produce the next output token).

It is not trained as an RNN, which would be inefficient because of the token-wise sequential dependencies. Instead, at training time it conditions on the previous hidden states already produced by the transformer, which are available for all positions at once, so the module can be trained efficiently in parallel.

At inference time it is interleaved with the full model, so the small student network produces every other output token cheaply, without significant quality degradation.
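
A rough, hedged sketch of that scheme as I read it (module name, sizes, and the exact training target are my own stand-ins, not the paper's code): the recyclable module maps a decoder state plus the embedding of the token it emitted to a prediction of the next decoder state, which is decoded with the LM head; during training all of the transformer's states are available at once, so no sequential unrolling is needed.

```python
import torch
import torch.nn as nn

class RecyclableModule(nn.Module):
    """Maps (transformer hidden state, embedding of the token just emitted)
    to a predicted next hidden state, standing in for one full decoder pass.
    A sketch only; names and shapes are assumptions, not the paper's code."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(2 * d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, hidden: torch.Tensor, tok_emb: torch.Tensor) -> torch.Tensor:
        return self.proj(torch.cat([hidden, tok_emb], dim=-1))

d_model, vocab = 512, 32000
recycler = RecyclableModule(d_model)
embed = nn.Embedding(vocab, d_model)   # stand-ins for the LM's own embedding
lm_head = nn.Linear(d_model, vocab)    # and output head

# Parallel training: the transformer's hidden states for all positions are
# already available, so there is no sequential unrolling as in a classic RNN.
hidden = torch.randn(2, 16, d_model)        # decoder states for positions 0..15
tokens = torch.randint(0, vocab, (2, 16))   # token ids at positions 0..15

# state at step t plus the embedding of the token at t+1 -> a stand-in for
# the state at t+1, which is decoded to predict the token at t+2
pred_state = recycler(hidden[:, :-1], embed(tokens[:, 1:]))
logits = lm_head(pred_state[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab), tokens[:, 2:].reshape(-1)
)

# Interleaved inference: run the full transformer for one token, let the
# recycler produce the next state cheaply, sample from lm_head, then hand
# control back to the transformer for the token after that.
```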

Improvement suggestions from me:

This might benefit from adding routing that decides, at every token, whether to use the student model or the full model, based on another small model that predicts the expected quality degradation (see the sketch below).

The student model doesn't actually need to be small, either: it only needs to be cheaper at inference than the transformer, while remaining large enough to be competitive in quality, without suffering from quadratic complexity over the sequence length.
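
A quick sketch of that routing suggestion (names and the threshold are hypothetical; nothing here is from the paper): a tiny gate scores the current state for expected quality degradation and picks the cheap student step only when the predicted risk is low.

```python
import torch
import torch.nn as nn

class DegradationRouter(nn.Module):
    """Tiny gate that scores how risky a student-only step would be (hypothetical)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(d_model, d_model // 4),
            nn.GELU(),
            nn.Linear(d_model // 4, 1),
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # higher score = larger expected quality drop if the full model is skipped
        return self.score(hidden).squeeze(-1)

def choose_model(router: DegradationRouter, hidden: torch.Tensor,
                 threshold: float = 0.5) -> str:
    risk = torch.sigmoid(router(hidden))
    return "student" if risk.item() < threshold else "full_transformer"

router = DegradationRouter(512)
print(choose_model(router, torch.randn(512)))  # e.g. "student"
```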

https://arxiv.org/abs/2308.03421

ameo, to machinelearning
@ameo@mastodon.ameo.dev