Best way to resize images for object recognition

Several models on TF-Hub require images of a specific size, say 128x128 pixels.

I assume this is to some extent a limitation because the nets are trained at a fixed size, and the shapes of the weight tensors depend on it.

Cropping the image isn’t good for object recognition. And if it is resized to a square, that alters the aspect ratio.

Is there a generally accepted better way to do this?

For example, is padding an image with black pixels to get the right aspect ratio, and then resizing, a good idea?

Or do you just resize with some method and feed the image to the NN?

It’s not necessary that your image dimensions match the dimensions the model was trained on; you can actually declare your own input shape. Most pre-trained models go as low as 32 pixels, and I think this is in general a better approach than resizing your image.

However, you also have to be careful about how small your image dimensions actually are, as it’s possible that your image volume becomes too small for the model.

Hope it helps


They explicitly state 128x128 inputs, for example. I am talking about the JS ones, at least. Otherwise, could you expand please? What do you mean by declaring your own input shape?

Hi, I forgot to mention that to declare your own input shape you need to not include the fully connected/dense layers of the model. You can do so by setting the model’s include_top parameter to False.

So, let’s say I want to use the ResNet50 pre-trained model:

from tensorflow.keras.models import Sequential
from tensorflow.keras.applications import ResNet50

model = Sequential()
model.add(ResNet50(include_top=False, weights='imagenet', input_shape=(128, 128, 3)))

Then, if you want, you can further add your dense layers manually.
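
For example, continuing the snippet above, a minimal sketch of a new head; the pooling layer and the 10-class softmax are just illustrative assumptions, not something the pre-trained model dictates:

from tensorflow.keras.layers import GlobalAveragePooling2D, Dense

model.add(GlobalAveragePooling2D())         # collapse the spatial feature map to a vector
model.add(Dense(10, activation='softmax'))  # new head with its own weights, trained from scratch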

I find this strange because you’d need a different weight matrix…

Yes, your own dense layers will have their own set of weights, which will now be trained if you do add other layers.

Then, if you want, you can further add your dense layers manually.

In practice, MLEs don’t do this because of the level of effort involved; I haven’t seen folks handle image size differences this way in practice. It sounds like it requires retraining, which is not ideal. I think the original poster’s question is worth discussing: how to handle image size issues at inference time.

I’ve used fixed-size resizing while preserving the aspect ratio; this ends up making some images smaller, which degrades performance. I’ve also divided large images into sections, doing something like obss/sahi (GitHub - obss/sahi: framework-agnostic sliced/tiled inference + interactive UI + error analysis plots). This is expensive, since you run inference many times and then post-process the results, but it is more accurate in some cases since you don’t change the image resolution.
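
To make the tiling idea concrete, here is a rough sketch (not the sahi API, just an illustration with made-up tile and overlap sizes) of cutting a large image into overlapping crops; you’d run the detector on each crop, shift the boxes back by the offsets, and merge the results, e.g. with NMS:

def make_tiles(image, tile=512, overlap=64):
    # image is an HxWxC NumPy array; yields (crop, x_offset, y_offset).
    # Edge tiles may be smaller than tile x tile and might need padding.
    h, w = image.shape[:2]
    step = tile - overlap
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            yield image[y:y + tile, x:x + tile], x, y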


Same here, I resize keeping the aspect ratio at the moment, padding with some suitable value (sometimes 0, sometimes another value).
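
Roughly something like this sketch, assuming TensorFlow (tf.image.resize_with_pad pads with zeros, so the shift below is just one way to pad with a different constant instead):

import tensorflow as tf

def letterbox(image, size=128, pad_value=0.0):
    # Resize keeping the aspect ratio, then pad to size x size.
    # resize_with_pad pads with zeros; because resizing is linear,
    # shifting by pad_value turns the padded border into pad_value
    # while leaving the image content unchanged.
    image = tf.cast(image, tf.float32)
    return tf.image.resize_with_pad(image - pad_value, size, size) + pad_value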

I do not think splitting the image is a good idea, especially because you may cut through an object and end up reducing the detector’s confidence, etc.

It does not seem to be a recommended approach, and it may depend on the use case. Still, I’d be happy to discuss.