NVDLA stuck during inference #17

Closed
suryacharanp opened this issue Mar 7, 2022 · 5 comments

suryacharanp commented Mar 7, 2022

Hi @LeiWang1999

I am running the same implementation on a ZCU104 with your code files, and I loaded the .ko file generated for MPSoC with insmod. I verified the device in "/dev/drm" and the interrupt in "/proc/interrupts". Both look correct.

The problem is that it gets stuck in the middle of model execution. I am attaching the KMD debug log:

submitting tasks...
[ 319.138013] Enter: dla_initiate_processors
[ 319.146859] Enter: dla_submit_operation
[ 319.150684] Prepare Convolution operation index 0 ROI 0 dep_count 1
[ 319.156939] Enter: dla_prepare_operation
[ 319.160857] processor:Convolution group:0, rdma_group:0 available
[ 319.166939] Enter: dla_read_config
[ 319.170342] Exit: dla_read_config
[ 319.173649] Exit: dla_prepare_operation status=0
[ 319.178259] Enter: dla_program_operation
[ 319.182173] Program Convolution operation index 0 ROI 0 Group[0]
[ 319.188205] no desc get due to index==-1
[ 319.192122] no desc get due to index==-1
[ 319.196035] no desc get due to index==-1
[ 319.199948] no desc get due to index==-1
[ 319.203864] no desc get due to index==-1
[ 319.207778] Enter: dla_op_programmed
[ 319.211348] Update dependency operation index 3 ROI 0 DEP_COUNT=3
[ 319.217431] Update dependency operation index 1 ROI 0 DEP_COUNT=1
[ 319.223516] enable SDP in dla_update_dependency as depdency are resolved
[ 319.230207] Enter: dla_enable_operation
[ 319.234036] exit dla_enable_operation without actual enable due to processor hasn't been programmed
[ 319.243071] Exit: dla_enable_operation status=0
[ 319.247594] Exit: dla_op_programmed
[ 319.251074] Exit: dla_program_operation status=0
[ 319.255684] Exit: dla_submit_operation
[ 319.259424] Enter: dla_dequeue_operation
[ 319.263341] Dequeue op from Convolution processor, index=3 ROI=0
[ 319.269337] Enter: dla_submit_operation
[ 319.273166] Prepare Convolution operation index 3 ROI 0 dep_count 2
[ 319.279423] Enter: dla_prepare_operation
[ 319.283340] processor:Convolution group:1, rdma_group:0 available
[ 319.289422] Enter: dla_read_config
[ 319.292824] Exit: dla_read_config
[ 319.296132] Exit: dla_prepare_operation status=0
[ 319.300742] Enter: dla_program_operation
[ 319.304656] Program Convolution operation index 3 ROI 0 Group[1]
[ 319.310685] no desc get due to index==-1
[ 319.314601] no desc get due to index==-1
[ 319.318519] no desc get due to index==-1
[ 319.322432] no desc get due to index==-1
[ 319.326347] no desc get due to index==-1
[ 319.330261] Enter: dla_op_programmed
[ 319.333831] Update dependency operation index 6 ROI 0 DEP_COUNT=3
[ 319.339914] Update dependency operation index 4 ROI 0 DEP_COUNT=2
[ 319.346000] Exit: dla_op_programmed
[ 319.349474] Exit: dla_program_operation status=0
[ 319.354080] Exit: dla_submit_operation
[ 319.357820] Exit: dla_dequeue_operation
[ 319.361649] Enter: dla_submit_operation
[ 319.365472] Prepare SDP operation index 1 ROI 0 dep_count 0
[ 319.371033] Enter: dla_prepare_operation
[ 319.374951] processor:SDP group:0, rdma_group:0 available
[ 319.380338] Enter: dla_read_config
[ 319.383738] Exit: dla_read_config
[ 319.387048] Exit: dla_prepare_operation status=0
[ 319.391655] Enter: dla_program_operation
[ 319.395571] Program SDP operation index 1 ROI 0 Group[0]
[ 319.400888] no desc get due to index==-1
[ 319.404806] no desc get due to index==-1
[ 319.408722] no desc get due to index==-1
[ 319.412635] no desc get due to index==-1
[ 319.416549] Enter: dla_op_programmed
[ 319.420119] Update dependency operation index 4 ROI 0 DEP_COUNT=1
[ 319.426202] enable SDP in dla_update_dependency as depdency are resolved
[ 319.432895] Enter: dla_enable_operation
[ 319.436722] exit dla_enable_operation without actual enable due to processor hasn't been programmed
[ 319.445759] Exit: dla_enable_operation status=0
[ 319.450280] Exit: dla_op_programmed
[ 319.453762] Exit: dla_program_operation status=0
[ 319.458369] Enter: dla_enable_operation
[ 319.462199] Enable SDP operation index 1 ROI 0
[ 319.466634] Enter: dla_op_enabled
[ 319.469942] Update dependency operation index 0 ROI 0 DEP_COUNT=1
[ 319.476025] enable Convolution in dla_update_dependency as depdency are resolved
[ 319.483412] Enter: dla_enable_operation
[ 319.487240] Enable Convolution operation index 0 ROI 0
[ 319.492376] Enter: dla_op_enabled
[ 319.495685] Exit: dla_op_enabled
[ 319.498906] Exit: dla_enable_operation status=0
[ 319.503427] Exit: dla_op_enabled
[ 319.506649] Exit: dla_enable_operation status=0
[ 319.511170] Exit: dla_submit_operation
[ 319.514912] Enter: dla_dequeue_operation
[ 319.518826] Dequeue op from SDP processor, index=4 ROI=0
[ 319.524130] Enter: dla_submit_operation
[ 319.527958] Prepare SDP operation index 4 ROI 0 dep_count 0
[ 319.533522] Enter: dla_prepare_operation
[ 319.537437] processor:SDP group:1, rdma_group:1 available
[ 319.542827] Enter: dla_read_config
[ 319.546227] Exit: dla_read_config
[ 319.549531] Exit: dla_prepare_operation status=0
[ 319.554137] Enter: dla_program_operation
[ 319.558051] Program SDP operation index 4 ROI 0 Group[1]
[ 319.563370] no desc get due to index==-1
[ 319.567286] no desc get due to index==-1
[ 319.571207] no desc get due to index==-1
[ 319.575124] no desc get due to index==-1
[ 319.579040] Enter: dla_op_programmed
[ 319.582607] Update dependency operation index 7 ROI 0 DEP_COUNT=2
[ 319.588692] Exit: dla_op_programmed
[ 319.592172] Exit: dla_program_operation status=0
[ 319.596782] Enter: dla_enable_operation
[ 319.600609] Enable SDP operation index 4 ROI 0
[ 319.605046] Enter: dla_op_enabled
[ 319.608352] Update dependency operation index 3 ROI 0 DEP_COUNT=2
[ 319.614437] Exit: dla_op_enabled
[ 319.617656] Exit: dla_enable_operation status=0
[ 319.622179] Exit: dla_submit_operation
[ 319.625919] Exit: dla_dequeue_operation
[ 319.629754] Enter: dla_submit_operation
[ 319.633585] Prepare PDP operation index 2 ROI 0 dep_count 1
[ 319.639149] Enter: dla_prepare_operation
[ 319.643065] processor:PDP group:0, rdma_group:0 available
[ 319.648454] Enter: dla_read_config
[ 319.651854] Exit: dla_read_config
[ 319.655164] Exit: dla_prepare_operation status=0
[ 319.659771] Enter: dla_program_operation
[ 319.663688] Program PDP operation index 2 ROI 0 Group[0]
[ 319.668991] group id 0 rdma id 0
[ 319.672229] no desc get due to index==-1
[ 319.676142] no desc get due to index==-1
[ 319.680058] no desc get due to index==-1
[ 319.683971] no desc get due to index==-1
[ 319.687887] no desc get due to index==-1
[ 319.691800] Enter: dla_op_programmed
[ 319.695370] Update dependency operation index 5 ROI 0 DEP_COUNT=2
[ 319.701453] Exit: dla_op_programmed
[ 319.704935] Exit: dla_program_operation status=0
[ 319.709543] Exit: dla_submit_operation
[ 319.713285] Enter: dla_dequeue_operation
[ 319.717199] Dequeue op from PDP processor, index=5 ROI=0
[ 319.722503] Enter: dla_submit_operation
[ 319.726331] Prepare PDP operation index 5 ROI 0 dep_count 1
[ 319.731895] Enter: dla_prepare_operation
[ 319.735810] processor:PDP group:1, rdma_group:1 available
[ 319.741200] Enter: dla_read_config
[ 319.744600] Exit: dla_read_config
[ 319.747910] Exit: dla_prepare_operation status=0
[ 319.752517] Enter: dla_program_operation
[ 319.756434] Program PDP operation index 5 ROI 0 Group[1]
[ 319.761736] group id 1 rdma id 1
[ 319.764968] no desc get due to index==-1
[ 319.768880] no desc get due to index==-1
[ 319.772793] no desc get due to index==-1
[ 319.776709] no desc get due to index==-1
[ 319.780623] no desc get due to index==-1
[ 319.784538] no desc get due to index==-1
[ 319.788452] Enter: dla_op_programmed
[ 319.792021] Exit: dla_op_programmed
[ 319.795501] Exit: dla_program_operation status=0
[ 319.800110] Exit: dla_submit_operation
[ 319.803851] Exit: dla_dequeue_operation
[ 319.807680] Exit: dla_initiate_processors status=0

After this, PetaLinux also stops responding, and I had to reboot it.
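
One way to see where a log like the one above stops making progress is to list the operations that were programmed but never enabled. The Python sketch below does this by matching the "Program ... operation index" and "Enable ... operation index" lines of the KMD debug output. The function name `pending_operations` and the short sample are illustrative assumptions, not part of the KMD, and an operation that is programmed but not yet enabled is not necessarily faulty: it may simply be waiting on an earlier operation.

```python
import re

# Matches KMD debug lines such as
#   "[ 319.182173] Program Convolution operation index 0 ROI 0 Group[0]"
#   "[ 319.462199] Enable SDP operation index 1 ROI 0"
# The match is case-sensitive, so "enable SDP in dla_update_dependency ..."
# and "Enter: dla_program_operation" are ignored.
EVENT_RE = re.compile(r"\b(Program|Enable) (\w+) operation index (\d+)")


def pending_operations(log_text):
    """Return (processor, index) pairs that were programmed but never enabled."""
    programmed = []  # keeps the order in which operations were programmed
    enabled = set()
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if match is None:
            continue
        event, processor, index = match.group(1), match.group(2), int(match.group(3))
        key = (processor, index)
        if event == "Program":
            if key not in programmed:
                programmed.append(key)
        else:
            enabled.add(key)
    return [key for key in programmed if key not in enabled]


# A few lines copied from the log above, used only as sample input.
SAMPLE_LOG = """\
[ 319.182173] Program Convolution operation index 0 ROI 0 Group[0]
[ 319.304656] Program Convolution operation index 3 ROI 0 Group[1]
[ 319.395571] Program SDP operation index 1 ROI 0 Group[0]
[ 319.426202] enable SDP in dla_update_dependency as depdency are resolved
[ 319.462199] Enable SDP operation index 1 ROI 0
[ 319.487240] Enable Convolution operation index 0 ROI 0
"""

if __name__ == "__main__":
    for processor, index in pending_operations(SAMPLE_LOG):
        print(f"programmed but not enabled: {processor} index {index}")
```

Run against the full log above, the same matching leaves Convolution index 3, PDP index 2 and PDP index 5 as programmed but not enabled. That is consistent with the driver waiting for a hardware completion event, though it does not prove it.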

I think the OS is waiting for the NVDLA interrupt and never receives it. I am also attaching the output of cat /proc/interrupts here:

root@dlampsoc:/usr/nvdla# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
3: 16164 2068 2317 4121 GICv2 30 Level arch_timer
6: 0 0 0 0 GICv2 67 Level zynqmp_ipi
7: 0 0 0 0 GICv2 175 Level arm-pmu
8: 0 0 0 0 GICv2 176 Level arm-pmu
9: 0 0 0 0 GICv2 177 Level arm-pmu
10: 0 0 0 0 GICv2 178 Level arm-pmu
12: 0 0 0 0 GICv2 156 Level zynqmp-dma
13: 0 0 0 0 GICv2 157 Level zynqmp-dma
14: 0 0 0 0 GICv2 158 Level zynqmp-dma
15: 0 0 0 0 GICv2 159 Level zynqmp-dma
16: 0 0 0 0 GICv2 160 Level zynqmp-dma
17: 0 0 0 0 GICv2 161 Level zynqmp-dma
18: 0 0 0 0 GICv2 162 Level zynqmp-dma
19: 0 0 0 0 GICv2 163 Level zynqmp-dma
21: 0 0 0 0 GICv2 109 Level zynqmp-dma
22: 0 0 0 0 GICv2 110 Level zynqmp-dma
23: 0 0 0 0 GICv2 111 Level zynqmp-dma
24: 0 0 0 0 GICv2 112 Level zynqmp-dma
25: 0 0 0 0 GICv2 113 Level zynqmp-dma
26: 0 0 0 0 GICv2 114 Level zynqmp-dma
27: 0 0 0 0 GICv2 115 Level zynqmp-dma
28: 0 0 0 0 GICv2 116 Level zynqmp-dma
30: 0 0 0 0 GICv2 95 Level eth0, eth0
32: 15 0 0 0 GICv2 50 Level cdns-i2c
33: 0 0 0 0 GICv2 42 Level ff960000.memory-controller
34: 0 0 0 0 GICv2 57 Level axi-pmon, axi-pmon
35: 0 0 0 0 GICv2 155 Level axi-pmon, axi-pmon
36: 45 0 0 0 GICv2 47 Level ff0f0000.spi
37: 0 0 0 0 GICv2 58 Level ffa60000.rtc
38: 0 0 0 0 GICv2 59 Level ffa60000.rtc
39: 0 0 0 0 GICv2 165 Level ahci-ceva[fd0c0000.ahci]
40: 1319 0 0 0 GICv2 81 Level mmc0
41: 146 0 0 0 GICv2 53 Level xuartps
44: 0 0 0 0 GICv2 84 Edge ff150000.watchdog
45: 0 0 0 0 GICv2 88 Level ams-irq
46: 0 0 0 0 GICv2 154 Level fd4c0000.dma
47: 0 0 0 0 GICv2 151 Level fd4a0000.zynqmp-display
48: 0 0 0 0 GICv2 121 Level a0000000.dla_small
49: 0 0 0 0 GICv2 97 Level xhci-hcd:usb1
IPI0: 1454 1082 1707 1483 Rescheduling interrupts
IPI1: 17 98 102 136 Function call interrupts
IPI2: 0 0 0 0 CPU stop interrupts
IPI3: 0 0 0 0 CPU stop (for crash dump) interrupts
IPI4: 665 2481 2467 2177 Timer broadcast interrupts
IPI5: 0 0 0 0 IRQ work interrupts
IPI6: 0 0 0 0 CPU wake-up interrupts

What do you think the issue is?
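
To check the interrupt hypothesis, it can help to compare the per-CPU counts for the NVDLA line in /proc/interrupts before and after a run, since a single reading of zeros does not say much on its own. The Python sketch below parses text in the /proc/interrupts layout and returns the counts for a named device. The helper name `irq_counts` is a hypothetical choice for illustration; the device name `a0000000.dla_small` is taken from the output above.

```python
def irq_counts(interrupts_text, name):
    """Return the per-CPU interrupt counts for the first line mentioning `name`.

    `interrupts_text` is expected in the /proc/interrupts layout: a header
    line listing the CPUs, then one line per interrupt source.
    Returns None if no line mentions `name`.
    """
    lines = [line for line in interrupts_text.splitlines() if line.strip()]
    if not lines:
        return None
    n_cpus = len(lines[0].split())  # header: CPU0 CPU1 ...
    for line in lines[1:]:
        if name in line:
            fields = line.split()
            # fields[0] is the IRQ label ("48:"); the counts follow it.
            return [int(value) for value in fields[1:1 + n_cpus]]
    return None


# Sample input copied from the output above.
SAMPLE = """\
CPU0 CPU1 CPU2 CPU3
3: 16164 2068 2317 4121 GICv2 30 Level arch_timer
48: 0 0 0 0 GICv2 121 Level a0000000.dla_small
"""

if __name__ == "__main__":
    # On the target board, read the live file instead of SAMPLE:
    #   with open("/proc/interrupts") as f:
    #       text = f.read()
    counts = irq_counts(SAMPLE, "a0000000.dla_small")
    print("dla_small interrupts per CPU:", counts)
```

If the counts stay at zero after an inference has started, the driver has not seen any NVDLA interrupt. That would fit the suspicion above, but it does not say whether the cause is the hardware, the interrupt wiring, or the loadable.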

@LeiWang1999
Owner

How about the memory copy sanity test in Vivado SDK? Did it work correctly?

@suryacharanp
Author

Yes, it worked, and I have just solved the problem. It was caused by my LeNet loadable; I may have made a mistake when generating it. The other loadables work. Anyway, thank you for responding, @LeiWang1999.

@AlanWalker159

@suryacharanp
Hi, friend. I hit the same issue when using the Caffe loadable from LeiWang1999. Did LeiWang1999's loadable work on your platform?

@suryacharanp
Author
suryacharanp commented Apr 2, 2022

No, I generated my own loadables. Be careful when doing so: depending on the configuration of the DLA, you might need to tweak the compiler code. Follow this link: https://github.com/powderluv/nvdla-notes

@AlanWalker159

No, I generated my own loadables. Be careful when doing so: depending on the configuration of the DLA, you might need to tweak the compiler code. Follow this link: https://github.com/powderluv/nvdla-notes

Thank you so so much, dear friend. I am going to try again, and I won't give up, :)
