albertcardona, to Neuroscience
@albertcardona@mathstodon.xyz

Henry Markram, of spike timing dependent plasticity (STDP) fame and infamous for the Human Brain Project (HBP), just got a US patent for "Constructing and operating an artificial recurrent neural network": https://patents.google.com/patent/US20230019839A1/en

How is that not something thousands of undergrads are doing with PyTorch every week?

The goal, says the patent text, is to cover <<methods and processes for constructing and operating a recurrent artificial neural network that acts as a “neurosynaptic computer”>> – that narrower framing might be patentable, but the overreach of patenting the construction and operation of an RNN as such is ludicrous.

It seems likely that the legal office at Markram's research institution overreached and got away with it. Good luck enforcing this patent, though: Markram did not invent RNNs.
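
For a sense of scale, here is a generic illustration of constructing and operating an RNN in PyTorch – just the textbook exercise alluded to above, with arbitrary sizes, and unrelated to the specific "neurosynaptic computer" claims:

```python
# Constructing and operating a plain recurrent network in PyTorch; sizes are arbitrary.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=32, hidden_size=64, num_layers=1, batch_first=True)

x = torch.randn(8, 20, 32)    # batch of 8 sequences, 20 time steps, 32 features
h0 = torch.zeros(1, 8, 64)    # initial hidden state: (num_layers, batch, hidden)
output, hn = rnn(x, h0)       # run ("operate") the recurrent network

print(output.shape)           # torch.Size([8, 20, 64])
```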

#neuroscience #RNN #NeuralNetworks #ANN #RidiculousPatents

tero, to LLMs
@tero@rukii.net

More efficient inference for #LLMs:
#RecycleGPT: An Autoregressive Language Model with Recyclable Module

It trains a small student #RNN that takes the full #Transformer decoder hidden state and the embedding of its output token as input, and produces the next hidden state (which can be mapped through the output head and sampled to produce the next output token).

It is not trained as an RNN, which would be inefficient because of the token-wise sequential dependencies. Instead, at training time it conditions on the previous hidden states already produced by the transformer, which are available for all positions at once, so the module can be trained efficiently in parallel.

At inference time it is interleaved with the full model, so the small student network produces every other output token cheaply, without significant quality degradation.
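
A rough, hedged sketch of that scheme as I read it (module name, sizes, and the exact training target are my own stand-ins, not the paper's code): the recyclable module maps a decoder state plus the embedding of the token it emitted to a prediction of the next decoder state, which is decoded with the LM head; during training all of the transformer's states are available at once, so no sequential unrolling is needed.

```python
import torch
import torch.nn as nn

class RecyclableModule(nn.Module):
    """Maps (transformer hidden state, embedding of the token just emitted)
    to a predicted next hidden state, standing in for one full decoder pass.
    A sketch only; names and shapes are assumptions, not the paper's code."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(2 * d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, hidden: torch.Tensor, tok_emb: torch.Tensor) -> torch.Tensor:
        return self.proj(torch.cat([hidden, tok_emb], dim=-1))

d_model, vocab = 512, 32000
recycler = RecyclableModule(d_model)
embed = nn.Embedding(vocab, d_model)   # stand-ins for the LM's own embedding
lm_head = nn.Linear(d_model, vocab)    # and output head

# Parallel training: the transformer's hidden states for all positions are
# already available, so there is no sequential unrolling as in a classic RNN.
hidden = torch.randn(2, 16, d_model)        # decoder states for positions 0..15
tokens = torch.randint(0, vocab, (2, 16))   # token ids at positions 0..15

# state at step t plus the embedding of the token at t+1 -> a stand-in for
# the state at t+1, which is decoded to predict the token at t+2
pred_state = recycler(hidden[:, :-1], embed(tokens[:, 1:]))
logits = lm_head(pred_state[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab), tokens[:, 2:].reshape(-1)
)

# Interleaved inference: run the full transformer for one token, let the
# recycler produce the next state cheaply, sample from lm_head, then hand
# control back to the transformer for the token after that.
```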

Improvement suggestions from me:

This might benefit from adding routing that decides, at every token, whether to use the student model or the full model, based on another small model that predicts the expected quality degradation (see the sketch below).

The student model doesn't actually need to be small, either: it only needs to be cheaper at inference than the transformer, while remaining large enough to be competitive in quality, without suffering from quadratic complexity over the sequence length.
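
A quick sketch of that routing suggestion (names and the threshold are hypothetical; nothing here is from the paper): a tiny gate scores the current state for expected quality degradation and picks the cheap student step only when the predicted risk is low.

```python
import torch
import torch.nn as nn

class DegradationRouter(nn.Module):
    """Tiny gate that scores how risky a student-only step would be (hypothetical)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(d_model, d_model // 4),
            nn.GELU(),
            nn.Linear(d_model // 4, 1),
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # higher score = larger expected quality drop if the full model is skipped
        return self.score(hidden).squeeze(-1)

def choose_model(router: DegradationRouter, hidden: torch.Tensor,
                 threshold: float = 0.5) -> str:
    risk = torch.sigmoid(router(hidden))
    return "student" if risk.item() < threshold else "full_transformer"

router = DegradationRouter(512)
print(choose_model(router, torch.randn(512)))  # e.g. "student"
```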

https://arxiv.org/abs/2308.03421

ameo, to machinelearning
@ameo@mastodon.ameo.dev