2. Computer Vision: Pre-trained Models
Using a pre-trained model to perform image classification tasks.
Some pre-trained models available in torchvision.models:
- AlexNet: The first CNN-based model to win the ImageNet challenge.
- ResNet (Residual Networks): Great for deep architectures.
- VGG (Visual Geometry Group): Simple but computationally expensive.
- EfficientNet: Balances accuracy and efficiency.
- DenseNet: Encourages feature reuse through dense connections between layers.
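As a quick illustration, each of these architectures can be loaded with a single constructor call. A minimal sketch, assuming torchvision >= 0.13 (where the weights argument is available):

import torchvision.models as models

# Each architecture is exposed as a constructor in torchvision.models;
# weights='DEFAULT' loads the best available pre-trained weights.
alexnet = models.alexnet(weights='DEFAULT')
resnet = models.resnet50(weights='DEFAULT')
vgg = models.vgg16(weights='DEFAULT')
efficientnet = models.efficientnet_b0(weights='DEFAULT')
densenet = models.densenet121(weights='DEFAULT')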
Steps for using pre-trained models
1. Load a Pre-trained Model
import torch
import torchvision.models as models
model = models.resnet50(pretrained=True)
model.eval()
- pretrained=True: the model comes with pre-learned weights from ImageNet.
- model.eval(): disables dropout and batch normalization updates. This ensures that the model behaves consistently during inference (prediction).
- If the model is to be trained instead, call model.train().
- model = models.resnet50(weights='DEFAULT') is the equivalent syntax for newer PyTorch versions; a fuller sketch follows below.
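For completeness, a minimal sketch of the newer weights API (assuming torchvision >= 0.13, where the weights enum replaces the pretrained flag):

import torchvision.models as models

# 'DEFAULT' resolves to the best available pre-trained weights.
weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights)
model.eval()

# The weights enum also carries the matching preprocessing pipeline and labels.
preprocess = weights.transforms()
categories = weights.meta["categories"]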
2. Preprocess the Input Image
from torchvision import transforms
from PIL import Image
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
- PIL.Image: used to open and convert images into a format that PyTorch can process.
- Resize(256): resizes the image so its shorter side is 256 pixels, maintaining the aspect ratio.
- CenterCrop(224): crops the image to 224×224 pixels about the center.
- ToTensor(): converts the image from PIL format to a PyTorch tensor.
- Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]): normalizes the image with the ImageNet mean and std, making sure values are in a similar range as the training images. It is common to use the ImageNet statistics with models pre-trained on ImageNet; a worked example follows below.
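Normalize computes (x - mean) / std per channel. A minimal numeric sketch (the pixel values here are hypothetical, chosen only for illustration):

import torch

# (x - mean) / std, applied independently per channel
x = torch.tensor([0.5, 0.5, 0.5])            # hypothetical (R, G, B) pixel values in [0, 1]
mean = torch.tensor([0.485, 0.456, 0.406])
std = torch.tensor([0.229, 0.224, 0.225])
print((x - mean) / std)                      # tensor([0.0655, 0.1964, 0.4178])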
image_path = "example.jpg"
image = Image.open(image_path).convert("RGB")
input_tensor = transform(image).unsqueeze(0)
- .unsqueeze(0): adds an extra dimension (batch size), since the model expects input of shape (batch_size, C, H, W). A shape check is sketched below.
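A minimal standalone shape check (the tensor here is random, used only to illustrate the dimensions):

import torch

x = torch.rand(3, 224, 224)    # a single image tensor: (C, H, W)
batch = x.unsqueeze(0)         # adds a batch dimension: (1, 3, 224, 224)
print(x.shape, batch.shape)    # torch.Size([3, 224, 224]) torch.Size([1, 3, 224, 224])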
3. Forward Pass for Prediction
with torch.no_grad():
    output = model(input_tensor)  # perform forward pass
    probabilities = torch.nn.functional.softmax(output[0], dim=0)
    top_class = torch.argmax(probabilities).item()
- with torch.no_grad(): there is no need to store gradients during inference, which saves memory and improves speed.
- output = model(input_tensor): performs the forward pass and outputs raw scores (logits) for each of the 1000 ImageNet classes.
- softmax(output[0], dim=0): the shape of output is (1, 1000). output[0] selects the first (and only) row, so the shape becomes (1000,). Since only one axis remains, dim=0 tells softmax to apply across all 1000 values, making them sum to 1. A small numeric demo follows this list.
- torch.argmax(probabilities).item(): finds the index of the class with the highest probability. .item() extracts the number from the PyTorch tensor.
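A minimal numeric demo of softmax and argmax (the logits here are made up for illustration):

import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, 1.0, 0.1])   # made-up raw scores for 3 classes
probs = F.softmax(logits, dim=0)         # converts logits to probabilities
print(probs)                             # tensor([0.6590, 0.2424, 0.0986])
print(probs.sum())                       # tensor(1.)
print(torch.argmax(probs).item())        # 0, index of the most likely class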
4. Decode Probabilities
import requests
url = "https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt"
response = requests.get(url)
labels = response.text.splitlines()
class_name = labels[top_class]
print(f"Predicted class: {class_name}")
- The 1000 ImageNet class labels are fetched from the URL, one label per line.
- The predicted class index is mapped to its human-readable name. A top-5 variant is sketched below.
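It is often useful to report the top-5 predictions as well. A minimal sketch, reusing the probabilities and labels defined above:

# Top-5 most likely classes and their probabilities
top5_prob, top5_idx = torch.topk(probabilities, 5)
for p, i in zip(top5_prob.tolist(), top5_idx.tolist()):
    print(f"{labels[i]}: {p:.4f}")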
The Colab notebook of the above code can be accessed here.