
Alveo u200 "Packet Length Mismatch" during DPDK pktgen and testpmd application execution #16

Open
attdone opened this issue Jan 9, 2024 · 22 comments

Comments

@attdone
attdone commented Jan 9, 2024

Hi Team
I am working on OpenNIC design for au200, and I have diligently followed the steps provided in https://github.com/Xilinx/open-nic-dpdk.

However, I am encountering an issue while executing the pktgen application, specifically a "Packet Length Mismatch" error. This discrepancy results in lower-than-expected Tx/Rx values and packet drops. Furthermore, the RX functionality appears to stop altogether, and throughput reaches only 9 Gbps, significantly lower than the target of 100 Gbps.

Here is the command I am using to run pktgen:

./pktgen-dpdk-pktgen-20.11.3/usr/local/bin/pktgen -a 08:00.0 -a 08:00.1 -d librte_net_qdma.so -l 4-10 -n 4 -a 03:00.0 -a 03:00.0 -- -m [6:7].0 -m [8:9].1

Error Message
"Timeout on request to dma internal csr register", "Packet length mismatch error" and "Detected Fatal length mismatch"

Error generated,
C2H_STAT_S_AXIS_C2H_ACCEPTED 0xa88 0x110cb 69835
C2H_STAT_S_AXIS_WRB_ACCEPTED 0xa8c 0x10ed0 69328
C2H_STAT_DESC_RSP_PKT_ACCEPTED 0xa90 0x10ed1 69329
C2H_STAT_AXIS_PKG_CMP 0xa94 0x10ed1 69329
C2H_STAT_DBG_DMA_ENG_0 0xb1c 0x48e00304 1222640388
C2H_STAT_DBG_DMA_ENG_1 0xb20 0xe7e40000 -404488192
C2H_STAT_DBG_DMA_ENG_2 0xb24 0x80000000 -2147483648
C2H_STAT_DBG_DMA_ENG_3 0xb28 0x80020813 -2147350509
C2H_STAT_DESC_RSP_DROP_ACCEPTED 0xb10 0x1c8e1 116961
C2H_STAT_DESC_RSP_ERR_ACCEPTED 0xb14 0 0
eqdma_hw_error_process detected Fatal Len mismatch error

I have raised a query on GitHub under the link below, but there has been no reply yet:
Xilinx/open-nic-dpdk#2 (comment)

Please let me know how to resolve this issue.

@cneely-amd
Collaborator

Hi @attdone,

Can you say which version of Vivado you used to build the open-nic-shell, and which version of the QDMA IP was generated?

Best regards,
--Chris

@attdone
Author
attdone commented Jan 11, 2024

Hi @cneely-amd,
Thanks for the reply.

Vivado : v2021.2 (64-bit)
QDMA : v4.0

Below is the CPU configuration of the PowerEdge R520.
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Model name: Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz
Stepping: 7
CPU MHz: 2400.000
CPU max MHz: 2400.0000
CPU min MHz: 1200.0000
BogoMIPS: 3799.77
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 15360K
NUMA node0 CPU(s): 0-11
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d

@cneely-amd
Collaborator

Hi @attdone,

My first guess is that your machine might not achieve good performance: at first glance the processor architecture appears to be old (from over 10 years ago), and it appears to be a single processor with only 6 physical cores.

The example machine configurations that @aneesullah and I were discussing in open-nic-dpdk issue 2, above your comment (Xilinx/open-nic-dpdk#2 (comment)), contained, e.g., 16 or 32 cores on much more recent processor architectures.

The pktgen-dpdk command example within the open-nic-dpdk instructions includes a number of parameters, and some of these relate to the number of logical cores and their mapping.

• sudo <pktgen_dpdk_path>/usr/local/bin/pktgen -l 4-10 -n 4 -a ${DEVICE_1_BDF} -a ${DEVICE_2_BDF} -d librte_net_qdma.so -- -m [6:7].0 -m [8:9].1

• Note: you'll need to adjust the lcores given to the -l option (and the memory channel count given to -n), as well as the lcores referenced in the -m options, to match your system's capabilities.

	○ I would guess that with your machine's capabilities you might have to use fewer lcores than the example above, but this would also lower performance; see the sketch after this list.
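
For example, on a 6-core / 12-thread single-socket machine like yours, a reduced mapping might look like the following. This is only a sketch (not tuned or verified on that hardware); the BDFs are the ones from your command, and the first lcore in -l is used by pktgen itself for display and timers:

sudo ./pktgen-dpdk-pktgen-20.11.3/usr/local/bin/pktgen \
    -l 1-5 -n 4 \
    -a 08:00.0 -a 08:00.1 \
    -d librte_net_qdma.so \
    -- -m [2:3].0 -m [4:5].1    # [rx:tx] lcore pairs for ports 0 and 1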

Another suggestion: in @aneesullah's message within that open-nic-dpdk issue 2 post, she described initial performance of 10 Gbps, and she discovered that this improved to the full 100 Gbps after disabling NUMA (making sure NUMA nodes are not enabled in the BIOS) on one of her machines (I know, though, that you have a different architecture).
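
If it helps, one way to put numa=off (along with the hugepage settings) on the kernel command line under RHEL 8 is via grubby. This is just a sketch (BIOS NUMA settings vary by vendor, and the hugepage values below mirror the open-nic-dpdk instructions):

sudo grubby --update-kernel=ALL \
    --args="default_hugepagesz=1G hugepagesz=1G hugepages=4 iommu=pt intel_iommu=on numa=off"
sudo reboot
# after rebooting, confirm the arguments took effect
cat /proc/cmdline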

One more suggestion: check the width (number of lanes) being provided to the card; for example, run:

sudo lspci -d 10ee: -vv


Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75W
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 512 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x16

And check that the "Width" for the card is "x16".
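
A convenience one-liner to pull out just those lines (same Xilinx vendor-ID filter as above):

sudo lspci -d 10ee: -vv | grep -E 'LnkCap:|LnkSta:'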

Also, to get more info on your setup, what versions of: Linux, DPDK, pktgen-dpdk, and AMD-Xilinx QDMA drivers are you trying in your setup?

Best regards,
--Chris

@attdone
Author
attdone commented Jan 12, 2024

Hi @cneely-amd,
Thanks for the reply.

I am using RHEL 8, DPDK v20.11.0, Pktgen-DPDK v21.03.0, and the DMA IP drivers at commit id 7859957 (which has QDMA-DPDK v2020.2.1).

I have tried the following:
1) Varying the lcores while executing the pktgen application.

Pktgen-DPDK/usr/local/bin/pktgen -a 08:00.0 -a 08:00.1 -d librte_net_qdma.so -l 1-11 -n 6 -a 03:00.0 -a 03:00.0 -- -P -m [2:8].0 -m [4:10].1

Pktgen-DPDK/usr/local/bin/pktgen -a 08:00.0 -a 08:00.1 -d librte_net_qdma.so -l 2-11 -n 8 -a 03:00.0 -a 03:00.0 -- -P -m [3:9].0 -m [4:10].1

Sharing the CPU Layout,

dpdk-20.11/usertools/cpu_layout.py 
======================================================================
Core and Socket Information (as reported by '/sys/devices/system/cpu')
======================================================================
cores =  [0, 1, 2, 3, 4, 5]
sockets =  [0]
       Socket 0       
       --------       
Core 0 [0, 6]         
Core 1 [1, 7]         
Core 2 [2, 8]         
Core 3 [3, 9]         
Core 4 [4, 10]        
Core 5 [5, 11]        

2) Disabling NUMA under grub.

default_hugepagesz=1G hugepagesz=1G hugepages=4 iommu=pt intel_iommu=on numa=off

There is still no increase in throughput, and the packet length mismatch still occurs.

3) The PCIe link Width is x8 instead of x16.

[admin@b ~]$ sudo lspci -s 08:00.0 -vvv 
08:00.0 Network controller: Xilinx Corporation Device 903f
	Subsystem: Xilinx Corporation Device 0007
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 49
	IOMMU group: 21
	Region 0: Memory at dfb80000 (64-bit, non-prefetchable) [size=256K]
	Region 2: Memory at df000000 (64-bit, non-prefetchable) [size=4M]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [60] MSI-X: Enable- Count=10 Masked-
		Vector table: BAR=0 offset=00030000
		PBA: BAR=0 offset=00034000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 1024 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
		DevCtl:	CorrErr- NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x16, ASPM not supported
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s (ok), Width x8 (downgraded)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR-
			 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS- TPHComp- ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- LTR- OBFF Disabled,
			 AtomicOpsCtl: ReqEn-
		LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
			 EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [1c0 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: LaneErr at lane: 0 1 2 4 5 6 7
	Capabilities: [200 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Kernel driver in use: vfio-pci

@cneely-amd
Collaborator
cneely-amd commented Jan 12, 2024

Hi @attdone ,

Also, I want to check: was your open-nic-shell bitfile meeting timing?

Hopefully something like the following, from within Vivado?
[image: Vivado timing summary screenshot]

Best regards,
--Chris

@attdone
Author
attdone commented Jan 16, 2024

Hi @cneely-amd,
Thanks for the response.
The timing information is shown in the image below.
[image: timing summary screenshot]

@cneely-amd
Collaborator

Hi @attdone ,

Thanks.

I also want to ask whether you are using an unmodified version of the open-nic-shell (i.e., git status and git diff report no changes)?

(The reason I'm asking is that I haven't experienced a packet length mismatch with DPDK before, but, thinking about a possible hardware-related explanation, I could imagine something like that occurring if the TUSER_MTY signal were not correct for some reason on the packets leaving box 250 and entering the QDMA subsystem.)

I have U250s and use Ubuntu in my test setup; however, what you described of your test setup seems reasonable. Also, why are you using a newer version of pktgen-dpdk than your DPDK? (In writing up the instructions I had intended for matching versions of DPDK and pktgen-dpdk. I don't know if changing this would make a difference; I'm just trying to think about differences on the software side.)

Best regards,
--Chris

@attdone
Author
attdone commented Jan 17, 2024

Hi @cneely-amd,
Thanks for the reply.
I used the open-nic-shell from git and compiled it with the commands below. I have not made any modifications to open-nic-shell.

$ git clone https://github.com/Xilinx/open-nic-shell.git
$ cd open-nic-shell 
$ git tag
1.0
$ git status
On branch main
Your branch is up to date with 'origin/main'.
nothing to commit, working tree clean
$ source /tools/Xilinx/Vitis/2021.2/settings64.sh
$ cd scripts
$ vivado -mode tcl -source build.tcl -tclargs -board au200 -num_cmac_port 2 -num_phys_func 2

At first, I conducted tests using Pktgen v20.11.3. Subsequently, I came across two GitHub issues (Xilinx/open-nic-dpdk#2 and Xilinx/open-nic-dpdk#3) which highlighted the use of a higher version of the Pktgen application. Regardless of whether it was Pktgen v20.11.3 or the higher version (v21.03.0) used in the GitHub issues, I consistently encountered the 'Packet Length Mismatch' error.

I enabled RTE_LIBRTE_QDMA_DEBUG_DRIVER in the dpdk-20.11/config/rte_config.h file to debug the Packet Length Mismatch error:

#define RTE_LIBRTE_QDMA_DEBUG_DRIVER 1 
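
For completeness, the define only takes effect after DPDK and the QDMA PMD are rebuilt and reinstalled; that is roughly the standard meson/ninja flow for DPDK 20.11 (a sketch):

cd dpdk-20.11
meson build            # skip if the build directory is already configured
ninja -C build
sudo ninja -C build install
sudo ldconfig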

@cneely-amd
Collaborator

Hi @attdone,

I was attempting today to see if I could quickly create a similar test environment to see whether I could reproduce the issue you were having.

I installed RedHat v8.9 on an old computer, and I built dpdk and pktgen for the QDMA. However, I admit that I have very limited development experience with RedHat. I also realize that there are some small gaps when trying to directly translate the Ubuntu steps from the current open-nic-dpdk instructions over to RedHat.

I'm at the point right now where, when I attempt to load pktgen, it complains about not finding librte_timer.so.21:

[cneely@localhost ~]$ sudo /home/cneely/pktgen-dpdk-pktgen-20.11.3/usr/local/bin/pktgen -a 01:00.0 -a 01:00.1 -d librte_net_qdma.so -l 1-7 -n 4 -a 00:01.0 -- -m [3:4].0 -m [5:6].1

/home/cneely/pktgen-dpdk-pktgen-20.11.3/usr/local/bin/pktgen: error while loading shared libraries: librte_timer.so.21: cannot open shared object file: No such file or directory

There seems to be some basic difference here because I didn't encounter this sort of issue with Ubuntu. I wanted to ask you if you encountered this same issue along the way, and if so, how did you get past it? I'm hoping that you might know this part already. Did you have to add anything to your /etc/ld.so.conf.d/ for loading additional libraries?

Best regards,
--Chris

@attdone
Author
attdone commented Jan 18, 2024

Hi @cneely-amd,
Thanks for the update.
Please add -rpath=/usr/local/lib64/ next to -Wl in the pkgconfig files /usr/local/lib64/pkgconfig/libdpdk-libs.pc and /usr/local/lib64/pkgconfig/libdpdk.pc.

The file contents should then look like this:

$ sudo cat /usr/local/lib64/pkgconfig/libdpdk.pc 
.....
Libs.private: -Wl,-rpath=/usr/local/lib64/,--whole-archive ...
....
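
If you prefer, here is a one-liner form of that edit. It is only a sketch: it assumes the original Libs / Libs.private entries begin with -Wl,--whole-archive before the edit, so please verify the resulting .pc files by hand:

for f in /usr/local/lib64/pkgconfig/libdpdk-libs.pc /usr/local/lib64/pkgconfig/libdpdk.pc; do
    sudo sed -i 's|-Wl,--whole-archive|-Wl,-rpath=/usr/local/lib64/,--whole-archive|g' "$f"
done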

Then execute ldconfig and recompile the Pktgen application:

sudo ldconfig
cd pktgen-dpdk-pktgen-20.11.3
export PKG_CONFIG_PATH=/usr/local/lib64/pkgconfig/
pkg-config --static --libs libdpdk
make clean
make RTE_SDK=../dpdk-20.11 RTE_TARGET=build

@cneely-amd
Collaborator

Hi @attdone,

Thank you for the advice; it helped me tremendously.

I'm able to run pktgen on this old test machine with RedHat v8.9. The initial test seems to be working.

[cneely@localhost ~]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 60
Model name: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Stepping: 3
CPU MHz: 4000.000
CPU max MHz: 4000.0000
CPU min MHz: 800.0000
BogoMIPS: 7183.45
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7

This is with PCIe width: x16

This is what I'm getting in terms of send and receive stats:

[image: pktgen send/receive statistics screenshot]

[cneely@localhost ~]$ uname -a
Linux localhost.localdomain 4.18.0-513.11.1.el8_9.x86_64 #1 SMP Thu Dec 7 03:06:13 EST 2023 x86_64 x86_64 x86_64 GNU/Linux

less /proc/cmdline
BOOT_IMAGE=(hd1,gpt2)/vmlinuz-4.18.0-513.11.1.el8_9.x86_64 root=/dev/mapper/rhel-root ro crashkernel=auto resume=/dev/mapper/rhel-swap rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet default_hugepagesz=1G hugepagesz=1G hugepages=4 intel_iommu=on numa=off
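
And a quick sanity check that the 1G hugepages were actually reserved after boot:

grep Huge /proc/meminfo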

@cneely-amd
Collaborator
cneely-amd commented Jan 18, 2024

@attdone
Also, this is the example script that I run prior to running dpdk-devbind and pktgen:

#!/bin/bash

sudo setpci -s 01:00.0 COMMAND=0x02;
sudo setpci -s 01:00.1 COMMAND=0x02;

#setup the QDMA registers
sudo ~cneely/pcimem/pcimem /sys/bus/pci/devices/0000:00:01.0/0000:01:00.0/resource2 0x1000 w 0x1;
sudo ~cneely/pcimem/pcimem /sys/bus/pci/devices/0000:00:01.0/0000:01:00.0/resource2 0x2000 w 0x00010001;

#init CMAC0, with serdes loopback (third command)
sudo ~cneely/pcimem/pcimem /sys/bus/pci/devices/0000:00:01.0/0000:01:00.0/resource2 0x8014 w 0x1;
sudo ~cneely/pcimem/pcimem /sys/bus/pci/devices/0000:00:01.0/0000:01:00.0/resource2 0x800c w 0x1;
sudo ~cneely/pcimem/pcimem /sys/bus/pci/devices/0000:00:01.0/0000:01:00.0/resource2 0x8090 w 0x1;

#init CMAC1, with serdes loopback (third command)
sudo ~cneely/pcimem/pcimem /sys/bus/pci/devices/0000:00:01.0/0000:01:00.0/resource2 0xC014 w 0x1;
sudo ~cneely/pcimem/pcimem /sys/bus/pci/devices/0000:00:01.0/0000:01:00.0/resource2 0xC00c w 0x1;
sudo ~cneely/pcimem/pcimem /sys/bus/pci/devices/0000:00:01.0/0000:01:00.0/resource2 0xC090 w 0x1;

#read the rx_status register: expecting 0x03 on second readback if working
sudo ~cneely/pcimem/pcimem /sys/bus/pci/devices/0000:00:01.0/0000:01:00.0/resource2 0x8204;
sudo ~cneely/pcimem/pcimem /sys/bus/pci/devices/0000:00:01.0/0000:01:00.0/resource2 0x8204;
sudo ~cneely/pcimem/pcimem /sys/bus/pci/devices/0000:00:01.0/0000:01:00.0/resource2 0xC204;
sudo ~cneely/pcimem/pcimem /sys/bus/pci/devices/0000:00:01.0/0000:01:00.0/resource2 0xC204;

@attdone
Author
attdone commented Jan 18, 2024

Hi @cneely-amd,
I had used similar commands to initialize the QDMA, CMAC0, and CMAC1. Before binding, I insert the vfio-pci driver with sudo modprobe vfio-pci iommu=pt intel_iommu=on. While executing the pktgen application, I can see the Mbits/s RX/TX values in the Pktgen application rapidly drop to 0 after 2 seconds. Are you facing a similar issue?
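
(For reference, the bind step looks roughly like the sketch below, using the BDFs from earlier in this thread; note that iommu=pt and intel_iommu=on are normally kernel boot parameters rather than vfio-pci module options.)

sudo modprobe vfio-pci
sudo dpdk-20.11/usertools/dpdk-devbind.py --bind=vfio-pci 08:00.0 08:00.1
dpdk-20.11/usertools/dpdk-devbind.py --status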

When I enable #define RTE_LIBRTE_QDMA_DEBUG_DRIVER 1, I can see the "Packet Length Mismatch" error under /var/log/messages.

Could you please enable RTE_LIBRTE_QDMA_DEBUG_DRIVER to confirm whether you are also receiving the Packet Length Mismatch?

@cneely-amd
Collaborator

Hi @attdone,

I'm about to pause for today, and I'll try testing the debug flag tomorrow.

One quick suggestion is that if you see the rapid change to 0 after a couple of seconds, try doing a reboot, and run the test again. (I don't know what causes that but I know that rebooting helps.)

I should also say that within pktgen, I'm specifically running:

range 0 size 64 64 1518 3
range 1 size 1500 64 1518 5
enable 0-1 range
start 0-1

@attdone
Author
attdone commented Jan 18, 2024

Hi @cneely-amd,
Thanks for the immediate response.
We can check tomorrow.

As an update, I have rebooted and restarted the Pktgen application multiple times, and the transmission still rapidly drops to 0.
I have also tried testing on a different system where the PCIe width is x16; even on that system the "Packet Length Mismatch" occurs.

When I enable the RTE_LIBRTE_QDMA_DEBUG_DRIVER macro, I can see the transmission drop to 0 at the point where the QDMA driver prints "Packet Length Mismatch" during Pktgen application execution.

@cneely-amd
Collaborator
cneely-amd commented Jan 18, 2024

Hi @attdone

I tried enabling RTE_LIBRTE_QDMA_DEBUG_DRIVER by adding the define to dpdk-20.11/config/rte_config.h, as you described above, and I don't get any errors.

My screenshot has one stray PMD message now, but otherwise looks about the same:
[image: pktgen screenshot]

I noticed yesterday that in one of your messages you gave a path for using Vitis (rather than Vivado) to build your open-nic-shell. I wanted to ask whether you installed XRT on the same machine? The reason is that I vaguely remember there is the potential for XRT to install a driver that gets loaded at each boot and can interfere with OpenNIC's drivers. If there is another driver for XRT, you might need to blacklist it so that it doesn't get loaded at boot.

On this old test machine, I only installed Vivado_lab edition for the sake of loading the bitfile over JTAG.

Also, I want to confirm whether you are trying the CMAC serdes loopback, too?

Best regards,
--Chris

@attdone
Author
attdone commented Jan 19, 2024

Hi @cneely-amd,
Thanks for your response.
In #16 (comment), all the compilation commands for open-nic-shell were run on an Ubuntu system where Vivado 2021.2 is installed. I am loading the bitfile over JTAG from this Ubuntu system. Could this be a potential cause of the 'Packet Length Mismatch' issue?
[image: alveo]

Regarding serdes loopback: yes, I am enabling it on those CMACs.

@cneely-amd
Collaborator

Hi @attdone,

I checked with some others and they said that JTAG programming from a second system is typical and is recommended by some. That shouldn't cause any issues.

Best regards,
--Chris

@attdone
Author
attdone commented Jan 23, 2024

Hi @cneely-amd,
Thanks for the reply.
I set max_pkt_size to 9600 during open-nic-shell compilation, as specified in Xilinx/open-nic-dpdk#2 (comment), and increased the maximum length in the RTE library as below:
"dpdk-20.11/lib/librte_net/rte_ether.h:37: #define RTE_ETHER_MAX_LEN 9600"
With this I could reach a throughput of around 40 Gbps, but the Packet Length Mismatch error still occurs and the throughput then drops to zero.

The PCIe width is x8 and the speed is 8GT/s on the PowerEdge R520. Does the width have any relation to the "Packet Length Mismatch" error?
Are there any changes to be made in the driver/design/system settings to overcome the "Packet Length Mismatch" error?

@cneely-amd
Collaborator

Hi @attdone ,

I would have expected experimentally modifying max_pkt_size to 9600 not to work properly, because the QDMA IP is limited to physical-page-size data units (4 KB) for transmitting and receiving. That is why OpenNIC shell's top level (https://github.com/Xilinx/open-nic-shell/blob/main/src/open_nic_shell.sv) contains a setting that limits packet sizes to 1518 bytes.

I personally haven't encountered the Packet Length Mismatch error before, and I'm not sure how to best advise for that.

You could maybe experiment with removing some other PCI devices from your test system (e.g., by using integrated graphics) to free up some lanes and see if it helps, but I don't know whether that could be a cause.

Do you have only a single Alveo card, or do you have access to any other Alveo cards that could be tried? You had mentioned using a second test system in an earlier message, and I didn't know whether that was with the same card or another card.

--Chris

@attdone
Author
attdone commented Jan 25, 2024

Hi @cneely-amd,
Thanks for your reply.
I will try to remove the other PCI devices and update you.
I have used the same Alveo U200 card on the second system. In the second setup, I used a different PC (a Dell Precision 3460) instead of the PowerEdge R520.

@attdone
Author
attdone commented Jan 30, 2024

Hi @cneely-amd,

I am not able to detach the other PCIe devices, so is there any other method to resolve the "Packet Length Mismatch" error?
Since this error is listed as common in the FAQ at https://xilinx.github.io/pcie-debug-kmap/pciedebug/build/html/docs/QDMA_Subsystem_for_PCIExpress_IP_Driver/debug_faq.html, is there any other way to debug it?

Is it possible to share the PCIe capabilities obtained on your PC (sudo lspci -s 08:00.0 -vvv)?
