Challenge: Compare the Performance of Two Different LLMs
Evaluate text generation by using multiple LLMs and determine the best performer.
Challenge
In this challenge, we’ll explore the capabilities of two LLMs: google/flan-t5-small and bigscience/mt0-small. The task is to use these models for a specific text-generation task and evaluate their performance using ROUGE metrics.
Task
Translate the German proverb “Anfangen ist leicht, beharren eine Kunst” into English using both LLMs with the Transformers pipeline. Then, evaluate each model’s performance using ROUGE metrics and determine which one performs better.
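ROUGE scores the n-gram overlap between a generated text and a reference. Here is a minimal, dependency-free sketch of ROUGE-1 F1 to show what the metric measures; in practice you would use a metrics library such as Hugging Face’s `evaluate`, and the reference translation and model outputs below are illustrative assumptions, not actual model results:

```python
import re
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: clipped unigram overlap between candidate and reference."""
    cand = Counter(re.findall(r"[a-z]+", candidate.lower()))
    ref = Counter(re.findall(r"[a-z]+", reference.lower()))
    overlap = sum((cand & ref).values())  # each word counted at most min(cand, ref) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# One possible reference translation of the proverb (an assumption):
reference = "Starting is easy, persistence is an art"
# Hypothetical model outputs, for illustration only:
output_a = "Beginning is easy, persisting is an art"
output_b = "To start is easy"

print(rouge1_f1(output_a, reference))  # more unigram overlap -> higher score
print(rouge1_f1(output_b, reference))
```

Whichever model’s translation yields the higher score against the reference is the better performer for this task under ROUGE-1.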
Using the Transformers pipeline
Note: Google’s FLAN-T5-Small is a refined version of the T5 model, developed for a diverse range of tasks without the need for additional fine-tuning. Released with the “Scaling Instruction-Finetuned Language Models” research paper, this open-source, sequence-to-sequence large language model has been fine-tuned on multiple tasks across multiple languages.
For google/flan-t5-small:
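A sketch of how the translation can be run with the Transformers pipeline; the prompt phrasing and generation settings below are assumptions, not the lesson’s official solution:

```python
# Sketch only: the T5-style task prefix and max_new_tokens are assumptions.
PROVERB = "Anfangen ist leicht, beharren eine Kunst"
PROMPT = f"translate German to English: {PROVERB}"

def translate(model_name: str = "google/flan-t5-small") -> str:
    """Translate the proverb with a text2text-generation pipeline."""
    # Imported lazily so the prompt helpers above stay dependency-free.
    from transformers import pipeline  # pip install transformers sentencepiece

    translator = pipeline("text2text-generation", model=model_name)
    out = translator(PROMPT, max_new_tokens=40)
    return out[0]["generated_text"]

# Usage (downloads model weights on first run):
#   english_a = translate()                        # google/flan-t5-small
#   english_b = translate("bigscience/mt0-small")  # same prompt, second model
```

Running the same prompt through both checkpoints gives the two candidate translations to score with ROUGE.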