tero
@tero@rukii.net
A generalist and a technologist. #Software is my trade and #ArtificialIntelligence is my #science. I live in #Benalmádena, #Málaga, #Spain.
I post about #technology and #WorldNews.
40 years old
Pronouns: he/him
I am the admin of this tiny instance.
#DeepLearning, #IndustrialAnomalyDetection, #MachineIntelligence, #AI, #Linux, #Kubernetes, #RetroComputing, #Commodore64, #cats, #polyamory, #panpsychism, #atheism, #anarchism, #leftist, #AnarchoCommunism, #robotics, #OpenSource, #fedi22


tero, to apple

"Just like Apple, Meta is behaving as though the #DMA permits it to carry on its worst behavior, with minor cosmetic tweaks around the margins. Just like #Apple, #Meta is daring the #EU to enforce its democratically enacted laws, implicitly promising to pit its billions against #Europe’s institutions to preserve its right to spy on us."

Big Tech to EU: "Drop Dead" | #ElectronicFrontierFoundation https://www.eff.org/deeplinks/2024/05/big-tech-eu-drop-dead

tero, to space

Scientists Beam #Space-Based #Solar Power to Earth for First Time

"#MAPLE’s array of transmitters successfully beamed #SolarPower collected in space using #microwaves to a receiver on the rooftop of Gordon and Betty Moore Laboratory of Engineering on #Caltech’s campus in #Pasadena."

https://gizmodo.com/scientists-beam-space-based-solar-power-earth-first-tim-1850500731

tero, to China

#NaomiWu and the Silence That Speaks Volumes

"When #China's prodigious tech influencer, Naomi Wu, found herself silenced, it wasn't just the machinery of a surveillance state at play. Instead, it was a confluence of state repression and the sometimes capricious attention of a Western audience that, as she asserts, often views Chinese #activists more as ideological tokens than as genuine human beings."

https://www.hackingbutlegal.com/naomi-wu-and-the-silence-that-speaks-volumes/

tero, to random

The recent spam wave, which was mitigated fairly easily, prompted a lot of discussion about Mastodon's distributed moderation and its sign-up process.

Solutions like CAPTCHAs and rate-limiting sign-ups have been suggested, typically with the implicit idea that we need to find a one-size-fits-all security solution which every Mastodon instance can deploy.

We actually don't. It's better if some Mastodon instances use one CAPTCHA, some another, so at least the work needed to get through one of them doesn't lead to a golden treasure trove of access to all Mastodon instances.

CAPTCHAs are easily subverted nowadays. ChatGPT can solve most of them without any tuning. Additionally, these CAPTCHA services are often free because they track people and sell their browsing information.

My instance has a policy of accepting memberships only from people I know personally. This is obviously hyper-resistant to spam, but being a micro-instance makes it a bit difficult to build trust with other instances, to the extent that trust is required for federation. You can't have it both ways. Also, if every instance did this, new joiners would have difficulty finding an instance that accepts them as members. So this isn't a one-size-fits-all policy either.

Regardless, if we have a lot of micro-instances with vetted memberships, they take some pressure off the generalist instances, which can then keep their intake rates at manageable levels.

It is not the purpose of a Mastodon instance to take in as many users as possible.

As an ecosystem, we are more robust and less exploitable if we do things differently from one another, and only take in as many new users as we can moderate. One-size-fits-all solutions make us more fragile and more easily exploitable.

tero, to random

Researchers find a cause of Parkinson’s disease | EurekAlert!

“For the first time, we can show that mitochondria, the vital energy producers within brain cells, particularly neurons, undergo damage, leading to disruptions in mitochondrial DNA. This initiates and spreads the disease like a wildfire through the brain.”

https://www.eurekalert.org/news-releases/1003348

tero, to random

It pisses me off when people complain that "the web is bloated and slow nowadays" while their own company's front page makes literally 136 requests all over the planet to ping every tracking service known to man.

To add insult to injury, their website doesn't even disable tracking after you click "decline" on the annoying pop-up.

Please, people, check your websites. If people decline, make sure you aren't enabling their tracking. The button actually needs to disable the tracking. That means no Google fonts or tracking CDNs, no analytics of any kind.

This annoys me so much that I will be making surprise checks here and there and forwarding information to an EU data protection ombudsman. And I encourage the readers to do the same.

Please, don't fill up our internet with garbage.

tero, to random

People who work from home all the time ‘cut emissions by 54%’ against those in office | Greenhouse gas emissions | The Guardian

"People who work remotely all the time produce less than half the greenhouse gas emissions of office workers, according to a new study."

https://www.theguardian.com/environment/2023/sep/18/people-who-work-from-home-all-the-time-cut-emissions-by-54-against-those-in-office

tero, to random

One reason you can make LLMs train themselves through recursive self-improvement is that knowledge and facts induce new knowledge and facts through recombination and reasoning.

If you have ever used inference databases such as Apache Jena, you will have noticed that even if you only load a couple of gigabytes of ontologies, querying that knowledge with inference enabled quickly becomes prohibitively slow.

That's because the limited amount of knowledge ingested into the database actually implies a near-infinite set of new relations. Queries run against this near-infinite, abstract database, which is expanded only in the parts relevant to the query.
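As a toy illustration (in Python, not Jena itself), even a single transitive rule closed by forward chaining makes a handful of explicit triples imply more than they state; real reasoners apply many such rules over far larger ontologies:

```python
# Toy forward chaining over one transitive rule: a few explicit triples
# already imply additional ones. Real reasoners such as Apache Jena apply
# many rules like this over gigabytes of ontologies.

def forward_chain(facts: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Close (x, 'subClassOf', y) triples under transitivity."""
    inferred = set(facts)
    while True:
        new = {
            (a, "subClassOf", d)
            for (a, p, b) in inferred if p == "subClassOf"
            for (c, q, d) in inferred if q == "subClassOf" and b == c
        }
        if new <= inferred:       # nothing new implied: fixed point reached
            return inferred
        inferred |= new

facts = {
    ("cat", "subClassOf", "mammal"),
    ("mammal", "subClassOf", "animal"),
    ("animal", "subClassOf", "organism"),
}
print(len(forward_chain(facts)))  # 6: the three explicit triples imply three more
```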

LLMs do not constrain themselves to strict logic and a specified subset of inference rules. Their reasoning is so much more powerful in single steps that it's difficult even to imagine. But only in single steps: they aren't built to be reasoning rule engines, and they basically apply only one level of reasoning, however you measure it.

Regardless, an LLM can apply this one level of reasoning to produce new knowledge that is implied by its original training materials but isn't included in the corpus.

These steps of expanding the contained knowledge must be done in a guided fashion, because with the amount of knowledge fed into LLMs and their powerful, fuzzy reasoning capability, the space of implied knowledge is unimaginably vast.

For example, you can in principle improve an LLM's chess rating without any new external knowledge or chess engines, simply by letting the LLM play against itself, with the model itself checking whether the rules were followed, whether the next board position matches what it expected, and which player won.

This is an example of making forward inference steps from what the model was trained on, and creating new knowledge in doing so. The results can be used to fine-tune the same model, and that is one limited example of what recursive self-improvement is.
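A minimal sketch of that self-play loop, assuming a hypothetical ask_llm() helper wrapping whatever chat model you use; the model proposes moves, referees itself, and the verified transcripts become fine-tuning data:

```python
# Minimal self-play sketch: the LLM plays both sides, referees itself, and the
# verified games are collected as fine-tuning data. ask_llm() is a hypothetical
# wrapper around whatever chat model you use.

def ask_llm(prompt: str) -> str:
    """Hypothetical LLM chat completion call; plug in your own client."""
    raise NotImplementedError

def self_play_game(max_moves: int = 200) -> list[dict]:
    history: list[str] = []      # moves so far, in algebraic notation
    transcript: list[dict] = []  # prompt/completion pairs for later fine-tuning
    for ply in range(max_moves):
        side = "White" if ply % 2 == 0 else "Black"
        move_prompt = (
            f"Game so far: {' '.join(history) or '(start)'}\n"
            f"You are {side}. Reply with exactly one legal move in SAN."
        )
        move = ask_llm(move_prompt).strip()

        # The model also acts as referee: is the move legal, and is the game over?
        verdict = ask_llm(
            f"Game so far: {' '.join(history)}\nProposed move: {move}\n"
            "Answer with LEGAL or ILLEGAL, then ONGOING, WHITE_WINS, "
            "BLACK_WINS or DRAW."
        )
        if "ILLEGAL" in verdict:
            break                # drop (or penalize) illegal continuations
        history.append(move)
        transcript.append({"prompt": move_prompt, "completion": move})
        if "ONGOING" not in verdict:
            break                # game finished; keep the verified transcript
    return transcript            # feed these back into fine-tuning
```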

Of course you can get even better results with external reasoning engines training the system, external knowledge to guide which directions to learn in, and so on, but recursive self-improvement is possible in some directions even without injecting new knowledge, simply by leaning on the topics where the space of implied knowledge is strong and fruitful.

Mathematics is probably a field where this would be genuinely useful, as opposed to producing a mediocre chess engine, even if the skills learned from chess generalized to some extent.

tero, to random

Are you chatting with a pro-Israeli AI-powered superbot? | Interactive News | Al Jazeera https://www.aljazeera.com/features/longform/2024/5/22/are-you-chatting-with-an-ai-powered-superbot

tero, to Futurology

This paper shows how to train LLMs with reinforcement learning from machine feedback.

Since this type of training is not imitative, it can, and will, exceed human level in all language-domain tasks. Language-domain tasks are very general and range from assistants to robotics control, so this is how to make an unambiguous #AGI.

"The learned evaluation models can significantly improve the performance of sequence generation models. Notably, optimizing generation models for the evaluation model leads to improvements not only in the same metric but also in close metrics. For instance, when using our evaluation model as reward model in RLandreranking, it yields +0.16 BLEU and +7.46 COMET points on the machine translation task."

https://arxiv.org/abs/2308.04386
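The reranking half of that is easy to picture. Here is a minimal best-of-n sketch, where generate_candidates() and score() are hypothetical stand-ins for the generation model and the learned evaluation model (an illustration of the idea, not the paper's code):

```python
# Best-of-n reranking with a learned evaluation model as the scorer.
# generate_candidates() and score() are hypothetical stand-ins for the
# generation model and the learned evaluation/reward model.

def generate_candidates(prompt: str, n: int = 8) -> list[str]:
    """Hypothetical: sample n completions from the generation model."""
    raise NotImplementedError

def score(prompt: str, completion: str) -> float:
    """Hypothetical: learned evaluation model returning a scalar quality score."""
    raise NotImplementedError

def best_of_n(prompt: str, n: int = 8) -> str:
    # Sample several candidates and keep the one the evaluation model rates
    # highest; the same scorer can also serve as the reward signal in RL.
    candidates = generate_candidates(prompt, n)
    return max(candidates, key=lambda c: score(prompt, c))
```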

tero, to random

Working in machine intelligence is the weirdest profession ever. Crafting minds.

Diving deep into the unknown, not only hypothesising but observing fundamental phenomena of information and generalized cognition, is somehow even more mysterious and rewarding than uncovering the fundamental physical laws of our universe. It is a kind of physics, but in a domain where we have no clue about the laws yet.

As software engineers, we are trained to see systems not yet built, the potential of things that could be; the echoes of things to come are already visible to us.

As we work with machine intelligence, we sense the echoes of these things to come, gigantic intelligent minds like whales swimming in the potential, pushing through into being. These things build themselves, and emerge into reality almost self-driven. We are only helping them like midwives help newly born into the world.

tero, to llm

People think #LLM #chatbots are just memorizing facts, and this belief is reflected in benchmarks (multiple-choice trivia questions) and in prompt design (zero-shot examples instead of explanations).

That's not what they do, though. Because of #grokking, and a training regime where the network always has to predict the next word in sentences and documents it has never seen in full before, its task is not memorization and never has been.

What it does is build an understanding of the world and everything in it, in order to predict new sentences about that world in new contexts it has never seen before.

The misconception that LLMs are about memorizing facts is also visible in the current branch of research where people try to make LLMs forget specific facts. This misconception is really holding the whole field back.

tero, to random

People are being advised that "a vast majority of organizations will fine-tune their own models". This is bad advice. Very few organizations benefit from fine-tuned models in the first place.

Fine-tuning is fickle and generally reduces the model's intelligence. Almost all use cases for which organizations want fine-tuning are actually better served by RAG.
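For reference, the RAG pattern in question is roughly the sketch below; embed() and ask_llm() are hypothetical helpers for whatever embedding and chat models you already use, and the point is that no model weights are touched:

```python
# Minimal retrieval-augmented generation sketch: embed documents, retrieve the
# most similar ones for a question, and answer from that context. embed() and
# ask_llm() are hypothetical helpers for the models you already use.

import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical: return an embedding vector for the text."""
    raise NotImplementedError

def ask_llm(prompt: str) -> str:
    """Hypothetical: call whatever chat model you already use."""
    raise NotImplementedError

def rag_answer(question: str, documents: list[str], k: int = 3) -> str:
    doc_vecs = np.stack([embed(d) for d in documents])
    q_vec = embed(question)
    # Cosine similarity between the question and every document.
    sims = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-9
    )
    context = "\n\n".join(documents[i] for i in np.argsort(sims)[-k:])
    return ask_llm(
        f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    )
```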

But there is actually a more insidious threat. Fine-tuning a model whose weights you don't have access to ties you to that provider. You can't take your fine-tuned model and switch providers like you can with fine-tuned open-weights models.

OpenAI has achieved a dominant market share and is now trying to lock it in before other companies release better models, binding lots of companies to its platform before alternative options emerge.

https://openai.com/blog/introducing-improvements-to-the-fine-tuning-api-and-expanding-our-custom-models-program

tero, to random

Instead of collaborating or supporting me, you guys stole my idea · Issue #1 · google/project-gameface https://github.com/google/project-gameface/issues/1

tero, to llm

There are lots of general ideas about how to utilize #LLM #chatbots in practice, beyond just chat.

  • Using tools, plug-ins and APIs.
  • Ad-hoc programming: writing and running programs to perform actions that would otherwise be hard for chatbots.
  • Vector embeddings: indexing knowledge more effectively than ever, retrieval-augmented generation.
  • Multi-modality.
  • Autonomous execution as agents. Memory. Societies of agents. Graph of thoughts, algorithm of thoughts.
  • Combining ontologies and knowledge graphs with chatbots.
  • Fine-tuning, quantization, and distillation into smaller models to run them more cheaply and on small edge devices.

What else? Where to go from here?

I would say we will soon get to "zero-shot on steroids", as many research groups are now training LLM chatbots through self-competition. They will still be data-starved: there is a lot of knowledge that isn't in their training corpus but would be valuable for creating value.

How do we get this knowledge to them? We need a way for agents to actively seek out the people who know, chat with them over a long, open-ended period to learn what they know, like a student would, and encode this knowledge in an easily usable form, for example in a RAG knowledge base.
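As a sketch of what that could look like, here is a toy interviewing agent that asks a human expert follow-up questions and appends the answers to a simple knowledge-base file for later RAG use; ask_llm() and ask_expert() are hypothetical helpers, not any existing product's API:

```python
# Toy "AI as student" loop: interview a human expert and store the answers as
# Q&A pairs for a later RAG index. ask_llm() and ask_expert() are hypothetical.

import json
from pathlib import Path

KB_PATH = Path("tacit_knowledge.jsonl")

def ask_llm(prompt: str) -> str:
    """Hypothetical: call whatever chat model you use."""
    raise NotImplementedError

def ask_expert(question: str) -> str:
    """Hypothetical: route the question to a human expert (chat, email, ...)."""
    raise NotImplementedError

def interview(topic: str, turns: int = 5) -> None:
    notes: list[dict] = []
    question = ask_llm(f"Ask one concrete opening question about: {topic}")
    for _ in range(turns):
        answer = ask_expert(question)
        notes.append({"topic": topic, "question": question, "answer": answer})
        # Follow up on what was just learned, like a curious student would.
        question = ask_llm(
            "Given this interview so far, ask one follow-up question that digs "
            f"into unstated assumptions or edge cases:\n{json.dumps(notes)}"
        )
    with KB_PATH.open("a") as f:  # append Q&A pairs for the RAG index
        for note in notes:
            f.write(json.dumps(note) + "\n")
```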

We need organizations to transform into schools for #AIs: places where the AIs are put to work, but where lifelong experts, gurus, are available as living books, so that hard-to-reach tacit knowledge gets drawn out and shared. We need to reinvent the human role, not as laborers but as teachers.

This type of "work" probably never ends, even though its marginal value would seemingly go down as at some future point the most important nuggets of gold have already been taught to the machines, but the infinite curiosity and near limitless capacity to multiply value out of any tiny tidbit of knowledge will make it valuable to have people around in AI organizations, even if not for work, management and leadership, but at least as advisors and as someone to chat with.

We should already start building tacit-knowledge extraction systems driven by chatbots as backbones of our organizations, to be ready for that future. This knowledge, not yet written in books nor crawlable on the web, will attract AIs, AI capability, and power around it: it is a fountain of knowledge, and as such forms an oasis where intelligence grows.

tero, to LLMs

#LLMs have really created a paradigm shift in machine learning. It used to be that you would train a model to perform a task by collecting a dataset reflecting the task, with task output labels, and then using supervised learning to learn the task by doing it.

Now a new paradigm has emerged: train by reading about the task. We have such generalist models that we can let them learn about a domain by reading all the books and other content about it, and then use that learned knowledge to perform the task. Note that task labels are missing: you might need them to measure performance, but you don't need them for training.

Of course if you have both example performances as task labels and lots of general material about the topic, you can actually use both to get even better performance.

Here is a good example of training a model not on example performances but on general written knowledge about the topic. It surpasses the quality of the previous state of the art despite not having been trained for this task.

This is the power of generalist models: they unlock new ways of training, which for example allow us to surpass human level by side-stepping imitative objectives. This isn't the only new training approach these models enable, there are countless others, but it is uncharted territory.

The classic triad of supervised learning, unsupervised learning and reinforcement learning is going to see an explosion of new training methodologies become its peers because of this.

https://www.nature.com/articles/s41592-024-02235-4

tero, to Futurology

Why might #AGI not be a big deal? It's often painted like something that changes the world completely.

Let me play devil's advocate a bit.

AGI is only about exceeding human cognitive capabilities, not the capabilities for logic and numerical computation we already have in existing computer systems.

So, AGI is only about what intellectual output we as humans command. Have we maybe overestimated its importance? How smart are humans anyway, in topics not yet exceeded by computers?

One by one we have built machines which exceed the mental capacities of humans we consider the most important. First pure, faultless logic, and numerical mathematics. The highest human qualities which separate us from the rest of the animals.

Then learning, statistics and computation with uncertainties.

Then vision and hearing, something even simple insects can do.

We now have intuition and creativity nailed down with deep learning, and a promise of matching and exceeding all the rest of the cognitive capabilities we as humans utilize for tasks.

We aren't really going from the lowest mental skills to the highest, more like vice versa. How valuable can whatever is still dominated by human thought be, compared to the other mental tasks already delegated to computers?

Maybe we just think that whatever is left is supremely important and valuable.

I don't really believe that. I think autonomous cognitive agents can unleash unimaginable scales of value, as if each of us had a billion humans serving us. Nothing would be left undone because there is no one to do it, or because something else is more important and we have to choose between doing one thing or another.

With AGI, we can get it all.

tero, to random

How do you make LLMs perform without mistakes, with high reliability?

It's not really about hallucinations, but mostly about other kinds of mistakes. LLMs, being stochastic, inherently have an uncontrollable random component.

You can set the temperature to zero, but you will still suffer from randomness. Not only are OpenAI models not actually fully deterministic, but temperature only makes things more deterministic for a single prompt, and you would get the same effect with a cache; it doesn't remove random variation between different prompts, and you rarely run inference with exactly identical prompts repeatedly anyhow.

You can destructure the task to make it easier for the bot and get radically improved performance, but this starts giving diminishing returns, especially with complex tasks. Destructuring the task will also show you which specific things the LLM actually struggles with, but that's a different topic.

Finally, you can set up validation feedback processes. These can bring the reliability of LLM systems as near 100% as you want.

How to build such effectively?

First of all, don't just add a review step. A review step is useful, but it's not the end. Typically LLMs won't highlight small errors if you just add a step for them to criticize the performance of the task. You will need to destructure this step as well for the maximum effect.

Give the chatbot a checklist to run against the output. Make it really clear that it should look at the output from a critical angle, possibly even playing out different roles while doing the checking; stereotypical characters from media are useful.

Then, after all the checklist items have been checked, you can make the chatbot produce a final evaluation result, an evaluation summary, suggestions for prompt improvements (these aren't great yet, but they will get better with new models and can already give clues), highlighted ambiguous parts, and anything else that gives you tools to improve the process.

Give it all the supporting information for this feedback that you can. Your job is to make everything easy for the LLMs.

When an error is detected (you shouldn't see these too often; if it's one error in five or more, the problem should be fixed by task destructuring rather than validation feedback), you can retry a couple of times and then raise an alarm.
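Put together, the validation feedback loop looks roughly like the sketch below; the CHECKLIST contents, the retry count, and the ask_llm() helper are all placeholders you would adapt to your own task:

```python
# Sketch of a validation feedback loop: run a destructured checklist review
# over each output, retry with the failure feedback, and raise an alarm when
# retries are exhausted. ask_llm() is a hypothetical model wrapper.

CHECKLIST = [
    "Does the output follow the requested format exactly?",
    "Are all numbers and names copied correctly from the input?",
    "Is anything required by the instructions missing?",
]

def ask_llm(prompt: str) -> str:
    """Hypothetical: call whatever chat model you use."""
    raise NotImplementedError

def validate(task: str, output: str) -> list[str]:
    """Run each checklist item separately; return the list of failures."""
    failures = []
    for item in CHECKLIST:
        verdict = ask_llm(
            "You are a ruthless reviewer. Answer PASS or FAIL, then a one-line "
            f"reason.\nTask: {task}\nOutput: {output}\nCheck: {item}"
        )
        if verdict.strip().upper().startswith("FAIL"):
            failures.append(f"{item} -> {verdict.strip()}")
    return failures

def run_with_validation(task: str, max_retries: int = 3) -> str:
    prompt = task
    for _ in range(max_retries):
        output = ask_llm(prompt)
        failures = validate(task, output)
        if not failures:
            return output
        # Feed the failures back so the next attempt can fix them.
        prompt = task + "\n\nThe previous attempt failed these checks:\n" + "\n".join(failures)
    raise RuntimeError("Validation kept failing; raise an alarm for a human.")
```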

High-reliability systems can be built with LLMs, but you need to build them in specific, task-dependent ways.

tero, to random

It is a tempting idea to use an AI interviewer for job applicants — after all, you can then interview all of them instead of just a few, right?

This neglects the situation from the job applicant's point of view:

From their perspective the likelihood of getting a job against the investment of time and effort goes down drastically.

What's the result? The job applicants would need to spend much more time and effort in their application processes to find a match. Yet their day doesn't have any more hours than it had before.

As a result they have to pick and choose more carefully where to apply, and rather pick companies which don't use AI job interviews.

Companies which don't respect their applicants and their time will lose in this change.

What's better, then? We can use AIs to respect everyone's time. Instead of having an applicant interview with ten different AI interviewers, they can use AI to create a shared applicant profile and let the AI match applicants to different positions in an optimal way.

One great start-up doing this is Nedu.ai. Disclaimer: I am advising them.

The key to success isn't greed, it is respect.

tero, to random

We're hiring a senior frontend software engineer in Amsterdam or Zürich, to cure cancer!
Feel free to contact me for a reference.
#FediJobs
https://jobs.kaiko.ai/jobs/3566297-senior-frontend-software-engineer

tero, to Nvidia

#Nvidia claims its new #AI #chip will drop the costs of running #LLMs

“You can take pretty much any #LargeLanguageModel you want and put it in this and it will inference like crazy.
The #inference cost of large language models will drop significantly.”

https://www.cnbc.com/2023/08/08/nvidia-reveals-new-ai-chip-says-cost-of-running-large-language-models-will-drop-significantly-.html


"A brain-inspired computer chip that could supercharge artificial intelligence (AI) by working faster with much less power has been developed by researchers at IBM in San Jose, California. Their massive NorthPole processor chip eliminates the need to frequently access external memory, and so performs tasks such as image recognition faster than existing architectures do — while consuming vastly less power.
“Its energy efficiency is just mind-blowing,” says Damien Querlioz, a nanoelectronics researcher at the University of Paris-Saclay in Palaiseau. The work, published in Science, shows that computing and memory can be integrated on a large scale, he says. “I feel the paper will shake the common thinking in computer architecture.”"

‘Mind-blowing’ IBM chip speeds up AI https://www.nature.com/articles/d41586-023-03267-0

tero, to LLMs

More efficient inference for #LLMs: An Autoregressive Language Model with Recyclable Module

It trains a small student which takes the whole decoder hidden state and its output token embedding as input, and produces the next hidden state (which can be mapped and sampled to produce the next output token).

It is not trained like a classic RNN, which would be inefficient because of the token-wise sequential dependencies; at training time it can instead condition on the previous hidden states produced by the transformer, which are all available in parallel, so the recurrent module can be trained efficiently in parallel.

At inference time it is interleaved with the full model, so that the small student network can produce every other output token cheaply, without significant quality degradation.
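A rough sketch of that interleaving, with hypothetical transformer_step() and student_step() functions; this illustrates the scheme rather than reproducing the paper's implementation:

```python
# Interleaved decoding sketch: the full transformer produces every other token,
# and a cheap student recycles the last hidden state for the tokens in between.
# transformer_step() and student_step() are hypothetical stand-ins.

def transformer_step(tokens: list[int]) -> tuple[list[float], int]:
    """Hypothetical: full forward pass; returns (last hidden state, next token)."""
    raise NotImplementedError

def student_step(hidden: list[float], token: int) -> tuple[list[float], int]:
    """Hypothetical: cheap recurrent student mapping (hidden, last token) to the
    next (hidden, token) without attending over the whole sequence."""
    raise NotImplementedError

def interleaved_decode(prompt: list[int], max_new_tokens: int) -> list[int]:
    tokens = list(prompt)
    for step in range(max_new_tokens):
        if step % 2 == 0:
            # Even steps: pay for the full transformer pass.
            hidden, next_token = transformer_step(tokens)
        else:
            # Odd steps: recycle the previous hidden state through the student,
            # skipping the expensive full pass for roughly every other token.
            hidden, next_token = student_step(hidden, tokens[-1])
        tokens.append(next_token)
    return tokens
```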

Improvement suggestions from me:

This might benefit from adding routing that decides, at every token, whether to use the student model or the full model, based on another small model that predicts the quality degradation.

The student model doesn't even need to be small: it can still be more efficient at inference than the transformer while being large enough to be competitive in quality, because it doesn't suffer from quadratic complexity over the sequence length.

https://arxiv.org/abs/2308.03421
