
Add concurrent processing for sv.InferenceSlicer #361

Merged

7 commits merged into develop on Oct 4, 2023

Conversation

capjamesg
Collaborator

Description

This PR adds concurrent processing to sv.InferenceSlicer using a concurrent.futures thread pool (part of the Python standard library).

This PR drastically reduces inference times with the slicer.
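The idea is to run the per-slice callback across a thread pool and merge the per-slice results back into one sv.Detections. A minimal sketch of that approach, assuming a ThreadPoolExecutor (helper names here are illustrative, not the exact PR code):

from concurrent.futures import ThreadPoolExecutor

import numpy as np
import supervision as sv

def run_slices_concurrently(image, offsets, callback, thread_workers=1):
    # offsets: list of (x1, y1, x2, y2) slice coordinates in the full image.
    def process(offset):
        x1, y1, x2, y2 = offset
        detections = callback(image[y1:y2, x1:x2])
        # Shift boxes from slice coordinates back to full-image coordinates.
        detections.xyxy = detections.xyxy + np.array([x1, y1, x1, y1])
        return detections

    with ThreadPoolExecutor(max_workers=thread_workers) as executor:
        results = list(executor.map(process, offsets))
    return sv.Detections.merge(results)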

Without concurrent processing:

Detections w/o SAHI: 127
--- 1.460580825805664 seconds ---
Detections w/ SAHI: 146
--- 13.294042110443115 seconds ---

With concurrent processing slicer:

Detections w/o SAHI: 127
--- 1.4807708263397217 seconds ---
Detections w/ SAHI: 146
--- 2.305016040802002 seconds ---

Type of change


  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this change been tested? Please provide a test case or an example of how you tested the change.

This change has been tested using the following code to validate that the number of predictions with and without concurrent processing is the same.

import supervision as sv
import numpy as np
import cv2
import time
import roboflow

roboflow.login()

rf = roboflow.Roboflow()
workspace = rf.workspace()
project = workspace.project("vehicle-count-in-drone-video")
version = project.version(6)
model = version.model

def callback(x: np.ndarray) -> sv.Detections:
    result = model.predict(x).json()
    return sv.Detections.from_roboflow(result, class_list=list(project.classes.keys()))

image = cv2.imread("./original.jpg")
image_width = image.shape[1]
image_height = image.shape[0]

print("starting test...")

start_time = time.time()

detections = callback(image)  # full-image inference, no slicing

print("Detections w/o SAHI:", len(detections.xyxy))
print("--- %s seconds ---" % (time.time() - start_time))

start_time = time.time()

slicer = sv.InferenceSlicer(callback=callback)
sliced_detections = slicer(image=image)

prediction_num = len(sliced_detections.xyxy)

# sv.plot_image(sliced_image)
print("Detections w/ SAHI:", prediction_num)

print("--- %s seconds ---" % (time.time() - start_time))

Any specific deployment considerations

N/A

Docs

N/A

@capjamesg capjamesg self-assigned this Sep 8, 2023
onuralpszr and others added 3 commits September 8, 2023 15:03
@capjamesg
Copy link
Collaborator Author

Good call changing the default number of workers to 1. The use case for which this is built is web inference; on-device inference is unlikely to benefit from the concurrent processing, so it makes sense to default to 1.
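In other words, the threads mainly overlap network waits, so they pay off when the callback is I/O-bound (e.g., a hosted inference API) rather than local and GPU-bound. A hypothetical usage sketch (the callback names are illustrative, not from this PR):

# Remote, I/O-bound callback: more threads overlap the HTTP round-trips.
slicer = sv.InferenceSlicer(callback=hosted_api_callback, thread_workers=8)

# Local, GPU-bound callback: keep the default single worker.
slicer = sv.InferenceSlicer(callback=local_model_callback)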

@hardikdava
Collaborator

Hey @capjamesg, I took a look at the PR. I am not getting the same results as the old implementation when thread_workers is more than 1. See the logs:

Image Shape: (1174, 1920, 3)
Testing Old Implementation:
Numbers of detections: 365
Taken time: 23.60136890411377
***********************************************
Testing New Implementation:
------------------ Number of Threads 1 -----------------------
Numbers of detections: 365
Taken time: 17.404013872146606
Result is same as SAHI v1
------------------ Number of Threads 2 -----------------------
Numbers of detections: 365
Taken time: 14.097673654556274
Result is not same as SAHI v1
------------------ Number of Threads 3 -----------------------
Numbers of detections: 376
Taken time: 14.455281257629395
Result is not same as SAHI v1

Testing code:

import supervision as sv
import numpy as np
import cv2
import time
from ultralytics import YOLO

model = YOLO(model="yolov8m.pt")


def callback(x: np.ndarray) -> sv.Detections:
    result = model.predict(x, verbose=False)[0]
    return sv.Detections.from_ultralytics(result)


image = cv2.imread("../data/bird.jpg")
print("Image Shape:", image.shape)

print("Testing Old Implementation:")
start_time = time.time()
slicer = sv.InferenceSlicerOld(callback=callback)
sliced_detections_v1 = slicer(image=image.copy())

print("Numbers of detections:", len(sliced_detections_v1))
print("Taken time:", (time.time() - start_time))

print("***********************************************")

print("Testing New Implementation:")

for n in range(1, 4):
    print(f"------------------ Number of Threads {n} -----------------------")
    start_time = time.time()
    slicer = sv.InferenceSlicer(callback=callback, thread_workers=n)
    sliced_detections_v2 = slicer(image=image.copy())
    print("Numbers of detections:", len(sliced_detections_v2))
    print("Taken time:", (time.time() - start_time))
    if sliced_detections_v1 == sliced_detections_v2:
        print("Result is same as SAHI v1")
    else:
        print("Result is not same as SAHI v1")

P.S.: sv.InferenceSlicerOld is a copy of the pre-PR sv.InferenceSlicer, renamed so both implementations can be compared side by side.

@capjamesg
Collaborator Author

@hardikdava Can you try with 8 threads?

@SkalskiP
Collaborator
SkalskiP commented Oct 4, 2023

@hardikdava, shouldn't we use batch inference rather than multithreading?
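For context, a batch-inference variant would stack the slices and run them through the model in one forward pass instead of one call per thread. A hedged sketch, assuming an Ultralytics-style model that accepts a list of images (this is not code from the PR):

def batch_callback(crops: list) -> list:
    # One forward pass over all slice crops at once.
    results = model.predict(crops, verbose=False)
    return [sv.Detections.from_ultralytics(r) for r in results]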

@capjamesg
Collaborator Author

@SkalskiP Can you say more?

@onuralpszr
Collaborator

Quoting @capjamesg: “@hardikdava Can you try with 8 threads?”

@hardikdava, based on this code, try it with multiple images.

@capjamesg
Collaborator Author

@hardikdava Is this the right way to compare the detections?

    if sliced_detections_v1 == sliced_detections_v2:
        print("Result is same as SAHI v1")
    else:
        print("Result is not same as SAHI v1")

Are two inference runs guaranteed to be identical in terms of confidence, etc.?
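One hedged way to make this comparison order-insensitive (an illustrative helper, not from the PR): sort both detection sets by box coordinates before comparing, since thread completion order is not deterministic.

import numpy as np

def detections_equivalent(a, b, atol=1e-6):
    # Compare two sv.Detections regardless of row order.
    if len(a) != len(b):
        return False
    order_a = np.lexsort(a.xyxy.T)  # canonical ordering by box coordinates
    order_b = np.lexsort(b.xyxy.T)
    return np.allclose(a.xyxy[order_a], b.xyxy[order_b], atol=atol)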

@hardikdava
Collaborator
hardikdava commented Oct 4, 2023

@capjamesg We are not changing any parameters, so technically the results should be the same. The result is the same for thread_workers=1, but the numbers of detections differ for higher thread counts.

@capjamesg
Collaborator Author

The number of detections should be different, right? SAHI leads to more predictions in aggregate.

I slightly modified my test:

import supervision as sv
import numpy as np
import cv2
import time
import roboflow

roboflow.login()

rf = roboflow.Roboflow()
workspace = rf.workspace()
project = workspace.project("vehicle-count-in-drone-video")
version = project.version(6)
model = version.model

def callback(x: np.ndarray) -> sv.Detections:
    result = model.predict(x).json()
    return sv.Detections.from_roboflow(result, class_list=list(project.classes.keys()))

image = cv2.imread("./example.jpg")
image_width = image.shape[1]
image_height = image.shape[0]

print("starting test...")

start_time = time.time()

detections = callback(image)  # full-image inference, no slicing

print("Detections w/o SAHI:", len(detections.xyxy))
print("--- %s seconds ---" % (time.time() - start_time))

start_time = time.time()

slicer = sv.InferenceSlicer(callback=callback, thread_workers=8)
sliced_detections = slicer(image=image)

prediction_num = len(sliced_detections.xyxy)

# sv.plot_image(sliced_image)
print("Detections w/ SAHI:", prediction_num)

print("--- %s seconds ---" % (time.time() - start_time))

Here were the results:

You are already logged into Roboflow. To make a different login, run roboflow.login(force=True).
loading Roboflow workspace...
loading Roboflow project...
starting test...
Detections w/o SAHI: 126
--- 1.898620367050171 seconds ---
Detections w/ SAHI: 145
--- 3.1943469047546387 seconds ---

@capjamesg
Collaborator Author

Disregard my last message 🙃

I was switching contexts from something else and misunderstood which prediction counts were supposed to match.

I have this updated code:

import supervision as sv
import numpy as np
import cv2
import time
import roboflow

roboflow.login()

rf = roboflow.Roboflow()
workspace = rf.workspace()
project = workspace.project("vehicle-count-in-drone-video")
version = project.version(6)
model = version.model

def callback(x: np.ndarray) -> sv.Detections:
    result = model.predict(x).json()
    return sv.Detections.from_roboflow(result, class_list=list(project.classes.keys()))

image = cv2.imread("./example.jpg")
image_width = image.shape[1]
image_height = image.shape[0]

print("starting test...")

start_time = time.time()

# Both runs use the slicer; only thread_workers changes.
slicer = sv.InferenceSlicer(callback=callback, thread_workers=1)
sliced_detections = slicer(image=image)

print("Detections (1 thread):", len(sliced_detections.xyxy))
print("--- %s seconds ---" % (time.time() - start_time))

start_time = time.time()

slicer = sv.InferenceSlicer(callback=callback, thread_workers=8)
sliced_detections = slicer(image=image)

prediction_num = len(sliced_detections.xyxy)

# sv.plot_image(sliced_image)
print("Detections (8 threads):", prediction_num)

print("--- %s seconds ---" % (time.time() - start_time))

Changing the number of threads from 1 to 8 results in the same number of predictions:

loading Roboflow workspace...
loading Roboflow project...
starting test...
Detections (1 thread): 145
--- 19.341336011886597 seconds ---
Detections (8 threads): 145
--- 2.5259499549865723 seconds ---

With that said, this count is higher than with the current SAHI implementation, even with the number of workers set to 1, which is not expected.

@capjamesg
Collaborator Author

One potential reason for the higher prediction counts is that the concurrent predictions are not ordered, unlike when the predictions are processed linearly as they are now. Do we need a re-ordering post-processing step?
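For what it's worth, concurrent.futures.Executor.map already yields results in submission order, so an explicit re-ordering step may be unnecessary if the slices are dispatched through map. A small illustration (not the PR code):

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=8) as executor:
    # Results come back in the same order as `crops`,
    # regardless of which thread finishes first.
    results = list(executor.map(callback, crops))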

@SkalskiP
Collaborator
SkalskiP commented Oct 4, 2023

@hardikdava, any chance you could review it once again?

@SkalskiP
Collaborator
SkalskiP commented Oct 4, 2023

@capjamesg I think there is a small chance that this is the reason. After inference, all detections go into one bag anyway.

@hardikdava
Collaborator

@capjamesg Technically, when you optimize an algorithm for speed, the results should not differ.

@capjamesg
Collaborator Author

@hardikdava Mind taking a look at the code to see what could be the problem?

@SkalskiP
Collaborator
SkalskiP commented Oct 4, 2023

@capjamesg, have you made any visualizations? I would love to look at a Google Colab where we compare performance and results.

@hardikdava
Collaborator

@capjamesg, just for clarity: we cannot directly compare results with and without SAHI. We can compare the results of SAHI with different numbers of threads.

@capjamesg
Collaborator Author

One-thread results: [image: example_one_thread]

Multi-thread results: [image: example_multi_thread]

@SkalskiP SkalskiP self-requested a review October 4, 2023 16:15
@SkalskiP SkalskiP merged commit 6110804 into develop Oct 4, 2023
6 checks passed
@SkalskiP SkalskiP deleted the add-concurrent-slicer branch January 2, 2024 18:51