How to load only 10 classes out of 100 class folders

The ImageNet dataset consists of 1000 classes, each stored in its own subdirectory. We can load all 1000 classes at once with tf.keras.utils.image_dataset_from_directory(), which loads them into a single tf.data.Dataset.
Now suppose that, instead of loading all 1000 classes, I only need to load the first 100. What should I do?

Hello,

I’d be glad to help you load a specific subset of classes from the ImageNet dataset using TensorFlow’s tf.keras.utils.image_dataset_from_directory function. Here’s an approach based on selecting the classes you want up front:

  1. Import Necessary Libraries:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

  2. Define Class Subset:

Create a list containing the names of the 100 classes you want to load.
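A minimal sketch of building that list, assuming the usual ImageNet layout of one subdirectory per class (WNID names such as n01440764). It sorts the names alphanumerically, which is the order image_dataset_from_directory uses when it infers classes, and skips stray files. The helper name first_n_class_names is hypothetical.

```python
import os
from typing import List


def first_n_class_names(directory: str, n: int) -> List[str]:
    """Return the first n class-folder names under directory, sorted alphanumerically.

    Only subdirectories count as classes; plain files (e.g. a README) are ignored.
    """
    class_dirs = sorted(
        name
        for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return class_dirs[:n]
```

For example, first_n_class_names('/path/to/imagenet', 100) would return the 100 alphabetically first class folders.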

Hi @M_Akrm,

import os

import tensorflow as tf

# Assuming 'directory' is the path to the ImageNet dataset directory
directory = '/path/to/imagenet'

# Get the list of subdirectories (classes), sorted the same way Keras sorts them
subdirs = sorted(
    d for d in os.listdir(directory) if os.path.isdir(os.path.join(directory, d))
)

# Select the first 100 subdirectories, or list your desired ImageNet class names instead,
# e.g. selected_subdirs = ['n01440764', 'n01443537']
selected_subdirs = subdirs[:100]

# Load the first 100 classes into a tf.data.Dataset
dataset = tf.keras.preprocessing.image_dataset_from_directory(
    directory,
    labels='inferred',
    label_mode='int',
    class_names=selected_subdirs,
    color_mode='rgb',
    batch_size=32,
    image_size=(256, 256),
    shuffle=True,
    seed=None,
    validation_split=None,
    subset=None,
    interpolation='bilinear',
    follow_links=False,
    crop_to_aspect_ratio=False,
    pad_to_aspect_ratio=False,
    data_format=None,
    verbose=True
)

For more information, see the TensorFlow documentation for tf.keras.preprocessing.image_dataset_from_directory.

Thanks.

You can filter the class labels when using image_dataset_from_directory, or create a list of the 10 classes you want to load.

I tried this method; however, ImageDataGenerator doesn’t drop the last batch if it is incomplete, which gives me an error in my custom training loop.
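If you switch from ImageDataGenerator to a tf.data pipeline (a different tool from the one being described here), dropping the incomplete final batch is a one-liner: unbatch the dataset and re-batch it with drop_remainder=True. A sketch, assuming dataset is a batched tf.data.Dataset such as the one returned by image_dataset_from_directory; the helper name rebatch_drop_remainder is hypothetical, and 32 is simply that loader's default batch size.

```python
import tensorflow as tf


def rebatch_drop_remainder(dataset: tf.data.Dataset, batch_size: int = 32) -> tf.data.Dataset:
    """Re-batch a batched dataset so every batch has exactly batch_size elements.

    unbatch() splits the existing batches back into single samples, and
    batch(..., drop_remainder=True) discards the final partial batch.
    """
    return dataset.unbatch().batch(batch_size, drop_remainder=True)


# Example: 70 samples in batches of 32 gives batches of 32, 32 and 6.
demo = tf.data.Dataset.range(70).batch(32)
fixed = rebatch_drop_remainder(demo, batch_size=32)
batch_sizes = [int(b.shape[0]) for b in fixed]
print(batch_sizes)  # [32, 32]
```

With drop_remainder=True the batch dimension in fixed.element_spec is static (32 instead of None), which suits training code that assumes a fixed batch size.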

ValueError: The class_names passed did not match the names of the subdirectories of the target directory. Expected: ['n01440764', 'n01443537', 'n01484850', 'n01491361', 'n01494475', 'n01496331', 'n01498041', 'n01514668', 'n01514859', 'n01518878', 'n01530575', 'n01531178', …]
but received: ['n01440764', 'n01443537', 'n01484850', 'n01491361', 'n01494475', 'n01496331', 'n01498041', 'n01514668', 'n01514859', 'n01518878']

It looks like class_names has to list all 1000 folders.

I tried the code above, which I believe does the same thing, but it raises a ValueError because the class_names argument must contain the names of all 1000 classes.
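Since class_names has to match every subdirectory, one workaround (a different technique from the replies in this thread) is to point the loader at a separate directory that contains only the selected classes. The sketch below creates such a directory out of symbolic links rather than copies, assuming a POSIX filesystem where os.symlink works without extra privileges; the helper name make_subset_dir and the paths are hypothetical.

```python
import os
from typing import Iterable


def make_subset_dir(source_dir: str, subset_dir: str, class_names: Iterable[str]) -> str:
    """Create subset_dir holding one symlink per selected class folder.

    No images are copied: each link points at the original class directory,
    so a loader reading subset_dir only sees the selected classes.
    """
    os.makedirs(subset_dir, exist_ok=True)
    for name in class_names:
        target = os.path.abspath(os.path.join(source_dir, name))
        if not os.path.isdir(target):
            raise FileNotFoundError(f"class folder not found: {target}")
        link = os.path.join(subset_dir, name)
        # Skip links that already exist so the function can be re-run safely
        if not os.path.lexists(link):
            os.symlink(target, link, target_is_directory=True)
    return subset_dir
```

The subset directory could then be passed to tf.keras.utils.image_dataset_from_directory with follow_links=True (the documented option for visiting subdirectories pointed to by symlinks) and without class_names, so the inferred classes are exactly the linked folders. Unlike filtering after loading, this never reads images from the other classes.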

You’re right: class_names in image_dataset_from_directory must list every class subdirectory in the target directory. To load only the first 100 classes, you can filter after creating the dataset:

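import tensorflow as tf

data_dir = '/path/to/imagenet'  # Path to your image directory

# Load the full dataset unbatched so filter() sees individual (image, label) pairs
dataset = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    label_mode='int',  # each label is an integer index into dataset.class_names
    batch_size=None,   # don't batch yet
    shuffle=False      # Keep order for easy filtering
)

# Get a list of the first 100 class names
first_100_classes = dataset.class_names[:100]

# Labels are integer indices, not names, so the first 100 classes are labels 0..99
filtered_dataset = dataset.filter(lambda x, y: y < len(first_100_classes))

# Batch again, dropping the incomplete last batch
filtered_dataset = filtered_dataset.batch(32, drop_remainder=True)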

This approach avoids the ValueError and lets you work with just the desired classes. Note that the filter runs after each image has been read and decoded, so images from the other 900 classes are still loaded (and then discarded) on every pass over the data.
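A small self-contained check of filtering by label index with Dataset.filter, using a synthetic dataset of tiny zero "images" in place of real ImageNet files (an assumption made so the example runs anywhere). With label_mode='int', a label is the position of its class in the sorted class_names, so keeping the first k classes is the predicate y < k; an arbitrary list of classes can be matched against a constant tensor of their indices instead of comparing y with name strings.

```python
import tensorflow as tf

# Synthetic stand-in for an unbatched (image, label) dataset: 10 tiny "images"
# whose integer labels cycle through 5 classes (0..4).
images = tf.zeros([10, 2, 2, 3])
labels = tf.constant([0, 1, 2, 3, 4, 0, 1, 2, 3, 4], dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices((images, labels))

# Keep the first 2 classes, i.e. labels 0 and 1.
num_keep = 2
first_k = dataset.filter(lambda x, y: y < num_keep)
kept_labels = [int(y) for _, y in first_k]
print(kept_labels)  # [0, 1, 0, 1]

# Keep an arbitrary set of class indices by comparing against a constant tensor;
# tf.reduce_any turns the elementwise comparison into one scalar bool.
keep_ids = tf.constant([1, 3], dtype=tf.int32)
picked = dataset.filter(lambda x, y: tf.reduce_any(tf.equal(y, keep_ids)))
picked_labels = [int(y) for _, y in picked]
print(picked_labels)  # [1, 3, 1, 3]
```

To go from class names to indices for the second form, take each name's position in dataset.class_names.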