Generative AI: LLMs: Getting Started 1.0

Posted on July 5, 2023 by Aritra Sen

With all the hype around ChatGPT and LLMs, I thought of writing a blog post series on LLMs. This series will focus on fine-tuning LLMs and on how we can leverage them to perform NLP tasks. We will start by discussing the different types of LLMs, then gradually move on to fine-tuning LLMs for specific downstream tasks and performing those tasks using Python/PyTorch (Tutorial series: Deep Learning with Pytorch (aritrasen.com)).

Large Language Models (LLMs) are foundational machine learning models that use deep learning to process and understand natural language. These models are trained on massive amounts of text data to learn patterns and contextual relationships in a language. LLMs can perform different types of language tasks, such as language translation, sentiment prediction, acting as a chatbot, text summarization, and more. They can generate text that fits the given context and is grammatically and syntactically correct. These models are usually too large to fit on a typical personal computer, so pretrained models are mostly distributed through model hubs like HuggingFace. In this blog post series, I assume you know the concepts of Transformers / Self-Attention and have basic hands-on experience with HuggingFace models.
In case you want a refresher on the workings of HuggingFace models, below are some of my previous posts that can help.

  • 1.0 – Getting started with Transformers for NLP (aritrasen.com)
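Before diving in, here is a minimal sketch of pulling a pretrained checkpoint from the HuggingFace Hub with the transformers library and running it with PyTorch tensors. The checkpoint bert-base-uncased is just an illustrative choice on my part, not one prescribed by this series.

```python
# Minimal sketch (illustrative, not from the original post): load a pretrained
# checkpoint and tokenizer from the HuggingFace Hub and run a forward pass.
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # example checkpoint, swap in any Hub model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

inputs = tokenizer("LLMs learn contextual relationships in text.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```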

Broadly we can classify LLMs into three categories:

  1. Encoder-only Models:
    These models are bidirectional in nature and are mainly used for downstream tasks like sentiment analysis, classification, and named entity recognition. They are trained mainly using masked language modeling (MLM), which usually revolves around masking parts of a given sentence and asking the model to find or reconstruct the original sentence.
  2. Decoder-only Models:
    Pre-training of decoder models usually revolves around predicting the next word (token) in the sentence. During prediction the model only accesses the words (tokens) positioned before the current one in the sentence, which is why these models are also called auto-regressive models. Text generation is one of the main tasks you can perform with decoder-only models.

  3. Encoder-Decoder Models (Seq2seq):
    Encoder-decoder models (sequence-to-sequence models) use both sections of the Transformer architecture. The attention layers of the encoder can access all the words in the given sentence, whereas the attention layers of the decoder can only access the words positioned before a given word in the input. Generally, encoder-decoder models are used for tasks like text summarization, machine translation, question answering, etc.
    These models can be pretrained using the objectives of encoder or decoder models; however, a more complex scheme can also be involved. For example, T5 is pretrained by replacing random spans of text with a single mask token, and the objective is then to predict the masked text. A short code sketch after this list shows a representative checkpoint for each of the three families.
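
To make the three families concrete, here is a hedged sketch using the HuggingFace pipeline API, with one commonly used example checkpoint per family. The checkpoints bert-base-uncased, gpt2, and t5-small are my illustrative choices, not models mandated by this series; each pipeline downloads the corresponding pretrained weights from the Hub on first use.

```python
# Hedged sketch: one illustrative checkpoint per LLM family via the
# HuggingFace pipeline API. The checkpoints are example choices only.
from transformers import pipeline

# 1. Encoder-only (e.g. BERT): bidirectional, trained with masked language modeling
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("The movie was absolutely [MASK]."))

# 2. Decoder-only (e.g. GPT-2): auto-regressive next-token prediction
generator = pipeline("text-generation", model="gpt2")
print(generator("Large language models can", max_new_tokens=20))

# 3. Encoder-decoder (e.g. T5): sequence-to-sequence tasks such as summarization
summarizer = pipeline("summarization", model="t5-small")
print(summarizer(
    "Large Language Models are trained on massive amounts of text data "
    "to learn patterns and contextual relationships in a language."
))
```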

Now that we have gone through the basics of the different types of LLMs, in the next few posts we will look at how to fine-tune or use these models for different NLP tasks.
Stay tuned for more content on LLMs.
Thanks for your time; do share in case you liked the content.
Category: Aritra Sen, Machine Learning, Python

