BUG: LayoutLMv3 finetuning on FUNSD Notebook; data preprocessing features #369

Davo00 opened this issue Nov 13, 2023 · 4 comments


Davo00 commented Nov 13, 2023


ValueError: Arrow type extension<arrow.py_extension_type<pyarrow.lib.UnknownExtensionType>> does not have a datasets dtype equivalent.

Caused by:

from datasets import Array2D, Array3D, Features, Sequence, Value

# we need to define custom features for `set_format` (used later on) to work properly
features = Features({
    'pixel_values': Array3D(dtype="float32", shape=(3, 224, 224)),
    'input_ids': Sequence(feature=Value(dtype='int64')),
    'attention_mask': Sequence(Value(dtype='int64')),
    'bbox': Array2D(dtype="int64", shape=(512, 4)),
    'labels': Sequence(feature=Value(dtype='int64')),
})

NielsRogge commented Nov 13, 2023

I just ran the notebook, and it's working for me. Maybe you need to update the Datasets version?

Davo00 commented Nov 13, 2023

Hey @NielsRogge, thanks for the quick response. I started a fresh env and installed everything with pip. I tried Python 3.9 and 3.10. Which version do you use? Here is some info about my installed versions:

pip show datasets

Name: datasets
Version: 2.14.6
Summary: HuggingFace community-driven open-source library of datasets
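
To compare this setup with another environment such as Colab, a fuller version listing may help. The following is a minimal sketch, not something from the notebook: `collect_versions` is a hypothetical helper, and the package list is an assumption (pyarrow is included only because the error message mentions an Arrow type).

```python
from importlib.metadata import PackageNotFoundError, version
import platform


def collect_versions(packages=("datasets", "pyarrow", "transformers", "torch")):
    """Return {package name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Package is absent from this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Interpreter version and CPU architecture (e.g. arm64 on Apple Silicon).
    print("python", platform.python_version(), platform.machine())
    for name, ver in collect_versions().items():
        print(name, ver or "not installed")
```

Running the same script in both environments and diffing the output would show which versions differ.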

NielsRogge commented Nov 13, 2023

I just used Google Colab :)

Davo00 commented Nov 13, 2023

> I just used Google Colab :)

You are right, it works in Colab. I really can't understand why the same code wouldn't work on my setup. Could it be an M1 issue? Does it try to use "MPS" as the device?
Anyway, I guess I should rather move this issue to the datasets repo, right?
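
One way to answer the M1/MPS question is to check what the local environment reports. This is a rough sketch, assuming PyTorch may or may not be installed; `describe_device` is a hypothetical helper name. It only shows what is available and does not establish whether the device has anything to do with the error.

```python
import platform


def describe_device():
    """Report the CPU architecture and whether PyTorch sees an MPS device."""
    info = {
        "machine": platform.machine(),  # e.g. "arm64" on Apple Silicon
        "torch_installed": False,
        "mps_available": False,
    }
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no MPS backend to query.
        return info
    info["torch_installed"] = True
    # torch.backends.mps exists in PyTorch 1.12+; older versions lack it.
    mps = getattr(torch.backends, "mps", None)
    info["mps_available"] = bool(mps is not None and mps.is_available())
    return info


print(describe_device())
```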

