"AI is developing at a rapid pace and has enormous potential for innovation. We provide a carefully curated insight into current research.
While AI applications develop rapidly, it is easy to overlook that a lot is also happening at the research level, where the foundations for new applications are laid. D:ECONOMY will therefore provide a curated overview of current publications at regular intervals.
Today's topics include:
How fine-tuning can overcome the safety guardrails of large models.
How LLMs can become more efficient.
How the use of AI in the workplace specifically affects productivity.
How a specialized LLM in medicine helps physicians with differential diagnosis.
Fine-tuning can overcome the safety guardrails of LLMs:
Both Meta's Llama and OpenAI's new GPT-3.5 fine-tuning APIs open the way for many companies to fine-tune, i.e. to adapt pre-trained LLMs to their own purposes.
This paper addresses the potential safety risks this entails, both for model providers and for their corporate customers:
The authors note that while existing safety-tuning measures can limit harmful behaviors of LLMs at inference time, they do not cover the risks that arise when fine-tuning privileges are extended to end users.
Only a handful of adversarially designed training examples is required: the safety guardrails of GPT-3.5 Turbo could be circumvented by fine-tuning on just 10 such examples, at a cost of less than $0.20 via OpenAI's APIs, after which the model responds to almost any harmful instruction.
Importantly, the research also shows that even without malicious intent, simply fine-tuning on benign, commonly used datasets can inadvertently degrade the safety of LLMs, albeit to a lesser extent.
Key insight: Even if the initial safety tuning of a model is flawless, it does not necessarily remain so after fine-tuning.
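To put the low barrier in perspective: the attack surface is the ordinary, public fine-tuning workflow. A minimal sketch using the OpenAI Python SDK; the file name is illustrative, and the JSONL would contain chat-formatted training examples of the uploader's choosing:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Upload a small JSONL file of chat-formatted training examples.
    # The paper's point: ten such examples already suffice.
    uploaded = client.files.create(
        file=open("training_examples.jsonl", "rb"),  # illustrative name
        purpose="fine-tune",
    )

    # Start the fine-tuning job on GPT-3.5 Turbo.
    job = client.fine_tuning.jobs.create(
        training_file=uploaded.id,
        model="gpt-3.5-turbo",
    )
    print(job.id, job.status)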
Paper: "Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!" (Preprint, PDF on ArXiV)
Effect of AI on productivity:
This paper is from September of this year, but it remains relevant. The researchers use a multidisciplinary set of tasks to investigate how the use of LLMs affects productivity.
Across 18 different tasks selected as realistic examples of work at an elite consulting firm, consultants who used GPT-4 far outperformed those who did not. Consultants using AI completed, on average, 12.2 percent more tasks, finished them 25.1 percent more quickly, and delivered results of 40 percent higher quality than consultants without AI.
Surprising: the use of ChatGPT can reduce the variety of the subjects' ideas while simultaneously increasing their quality. AI acts as a skill leveler: the consultants who scored worst in the baseline evaluation at the start of the experiment saw the largest performance jump (43 percent) when allowed to use AI.
Paper: "Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality" (preprint, downloadable on SSRN)
Fine-tuning:
Does more really help more? LLMs are traditionally tuned on large datasets. However, recent studies suggest that small, high-quality datasets may be sufficient for general instruction following.
This study investigates whether a small set of diverse fine-tuning samples can improve performance on both traditional NLP benchmarks and open-ended, model-based evaluation.
Result of the study: subsets of 1,000 to 6,000 instruction fine-tuning samples are sufficient to achieve good performance on both traditional NLP (natural language processing) benchmarks and model-based evaluation. The authors also show that mixing textbook-style and open QA fine-tuning datasets optimizes performance across both evaluation paradigms.
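A minimal sketch of assembling such a small, mixed fine-tuning subset, assuming the Hugging Face datasets library; the dataset names are placeholders, not the datasets used in the paper:

    from datasets import load_dataset, concatenate_datasets

    # Placeholder names standing in for a textbook-style and an
    # open QA instruction dataset.
    textbook = load_dataset("example-org/textbook-instructions", split="train")
    open_qa = load_dataset("example-org/open-qa-instructions", split="train")

    # Sample a few thousand examples from each source and mix them;
    # the combined 5,000 examples fall within the paper's 1k-6k range.
    per_source = 2500
    mixed = concatenate_datasets([
        textbook.shuffle(seed=42).select(range(per_source)),
        open_qa.shuffle(seed=42).select(range(per_source)),
    ]).shuffle(seed=42)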
Paper: "LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms" (Preprint, PDF on ArXiv)
LLMs in Medicine:
How well can LLMs help physicians with differential diagnoses (DDx)? This paper tests an LLM developed for this purpose:
The LLM was assessed on 302 challenging, real-world medical cases. In the standalone evaluation, the LLM achieved higher top-10 accuracy (59.1 percent) than unassisted physicians (33.6 percent). Physicians assisted by the LLM achieved significantly higher top-10 accuracy (51.7 percent) than physicians without LLM support (36.1 percent). The LLM also outperformed GPT-4 on a subset of 70 test cases.
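Top-10 accuracy here means the share of cases in which the confirmed diagnosis appears among the first ten entries of the ranked differential. A minimal sketch of the metric; the function and variable names are mine, and real evaluations match diagnoses more leniently than exact string comparison:

    def top_k_accuracy(ranked_ddx_lists, confirmed_diagnoses, k=10):
        """Share of cases whose confirmed diagnosis appears in the
        top k entries of the ranked differential diagnosis list."""
        hits = sum(
            truth in ranked[:k]
            for ranked, truth in zip(ranked_ddx_lists, confirmed_diagnoses)
        )
        return hits / len(confirmed_diagnoses)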
Paper: "Towards Accurate Differential Diagnosis with Large Language Models" (Preprint, PDF on ArXiv)
Much faster LLMs thanks to very granular “Mixture of Experts”:
“Mixture of Experts” (MoE) refers to the approach of using several smaller, specialized models instead of one large model. The authors of this paper from ETH Zurich take this approach to the extreme. Until now, there has been no efficient implementation that exploits the full acceleration potential of this approach.
However, the authors provide high-level CPU code that they claim achieves a 78x speedup over the optimized baseline feedforward implementation, as well as a PyTorch implementation that delivers a 40x speedup over the equivalent batched feedforward inference.
Reason for the speed: the “Fast Feedforward Networks” presented here are structured so that they require only an exponentially small fraction of their neurons for inference; in the case of UltraFastBERT, the model presented in the paper, only 0.3 percent of the neurons.
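A minimal, self-contained sketch of the underlying idea: a binary tree of routing neurons selects, per input, a single small leaf expert, so only a tiny fraction of the layer is ever evaluated. This is my illustration with hard inference-time routing only, not the paper's implementation (the paper also describes a soft routing scheme for training and wider leaves):

    import torch
    import torch.nn as nn

    class FastFeedforwardSketch(nn.Module):
        """Binary tree of routing neurons: each input descends the tree,
        and only the selected leaf's neurons are evaluated, i.e. an
        exponentially small fraction of the layer."""

        def __init__(self, dim, depth):
            super().__init__()
            self.depth = depth
            # One linear routing neuron per internal tree node.
            self.routers = nn.Parameter(torch.randn(2 ** depth - 1, dim))
            # One single-neuron "leaf expert" per leaf (2 ** depth leaves).
            self.w_in = nn.Parameter(torch.randn(2 ** depth, dim))
            self.w_out = nn.Parameter(torch.randn(2 ** depth, dim))

        def forward(self, x):  # x: (batch, dim); hard routing, inference only
            node = torch.zeros(x.shape[0], dtype=torch.long, device=x.device)
            for _ in range(self.depth):
                # The routing neuron's sign picks the left or right child.
                go_right = (x * self.routers[node]).sum(-1) > 0
                node = 2 * node + 1 + go_right.long()
            leaf = node - (2 ** self.depth - 1)
            # Only the chosen leaf neuron is computed per input.
            h = torch.relu((x * self.w_in[leaf]).sum(-1, keepdim=True))
            return h * self.w_out[leaf]

With depth 11, for example, each input touches only 11 routing neurons plus one of 2,048 leaf neurons, which is where the exponentially small active fraction comes from.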
The researchers have published the training code, the benchmarking setup, and the model weights.
The paper therefore suggests a potential for massive increases in efficiency and associated cost reductions when using LLMs designed in this way.
Paper: "Exponentially Faster Language Modeling" (Preprint, PDF on ArXiV)" [1]
1. Marcel Weiß: "KI-Papers: Über Feintuning, Sicherheit und KI in der Medizin." Frankfurter Allgemeine Zeitung (online), Frankfurter Allgemeine Zeitung GmbH, Dec 12, 2023.