BUG: LayoutLMv3 finetuning on FUNSD Notebook; data preprocessing features #369

Closed
Davo00 opened this issue Nov 13, 2023 · 4 comments

Davo00 commented Nov 13, 2023

https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLMv3/Fine_tune_LayoutLMv3_on_FUNSD_(HuggingFace_Trainer).ipynb

ValueError: Arrow type extension<arrow.py_extension_type<pyarrow.lib.UnknownExtensionType>> does not have a datasets dtype equivalent.

Caused by:

from datasets import Array2D, Array3D, Features, Sequence, Value

# we need to define custom features for `set_format` (used later on) to work properly
features = Features({
    'pixel_values': Array3D(dtype="float32", shape=(3, 224, 224)),
    'input_ids': Sequence(feature=Value(dtype='int64')),
    'attention_mask': Sequence(feature=Value(dtype='int64')),
    'bbox': Array2D(dtype="int64", shape=(512, 4)),
    'labels': Sequence(feature=Value(dtype='int64')),
})

@NielsRogge (Owner)

Hi,

I just ran the notebook, and it works for me. Maybe you need to update your Datasets version?
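
To compare two environments, a small sketch like the one below reports the installed versions of a few packages. The package selection is an assumption made for illustration: the thread itself only names Datasets, and pyarrow is included because the error message mentions an Arrow extension type. `installed_versions` is a hypothetical helper name, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages to report; this selection is an assumption for illustration.
PACKAGES = ("datasets", "pyarrow", "transformers", "torch")


def installed_versions(packages=PACKAGES):
    """Return a dict mapping each package name to its version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Package is not installed in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this both locally and in Colab and comparing the output line by line would show whether the two setups actually differ in any of these packages.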

Davo00 commented Nov 13, 2023

Hey @NielsRogge, thanks for the quick response. I started a fresh environment and installed everything with pip, and I tried both Python 3.9 and 3.10. Which version do you use? Here is some info about my installed versions:

pip show datasets

Name: datasets
Version: 2.14.6
Summary: HuggingFace community-driven open-source library of datasets

pip list

Package Version


accelerate 0.24.1
aiohttp 3.8.6
aiosignal 1.3.1
anyio 3.5.0
appnope 0.1.2
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
asttokens 2.0.5
async-timeout 4.0.3
attrs 23.1.0
backcall 0.2.0
beautifulsoup4 4.12.2
bleach 4.1.0
certifi 2023.7.22
cffi 1.15.1
charset-normalizer 3.3.2
comm 0.1.2
datasets 2.14.6
debugpy 1.6.7
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.7
entrypoints 0.4
exceptiongroup 1.0.4
executing 0.8.3
fastjsonschema 2.16.2
filelock 3.13.1
frozenlist 1.4.0
fsspec 2023.10.0
huggingface-hub 0.17.3
idna 3.4
importlib-metadata 6.0.0
IProgress 0.4
ipykernel 6.25.0
ipython 8.15.0
ipython-genutils 0.2.0
jedi 0.18.1
Jinja2 3.1.2
joblib 1.3.2
jsonschema 4.19.2
jsonschema-specifications 2023.7.1
jupyter_client 7.4.9
jupyter_core 5.5.0
jupyter-server 1.23.4
jupyterlab-pygments 0.1.2
MarkupSafe 2.1.1
matplotlib-inline 0.1.6
mistune 2.0.4
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.15
nbclassic 1.0.0
nbclient 0.8.0
nbconvert 7.10.0
nbformat 5.9.2
nest-asyncio 1.5.6
networkx 3.2.1
notebook 6.5.4
notebook_shim 0.2.3
numpy 1.26.1
packaging 23.1
pandas 2.1.3
pandocfilters 1.5.0
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
Pillow 10.1.0
pip 23.3
platformdirs 3.10.0
prometheus-client 0.14.1
prompt-toolkit 3.0.36
psutil 5.9.0
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 14.0.1
pycparser 2.21
Pygments 2.15.1
python-dateutil 2.8.2
pytz 2023.3.post1
PyYAML 6.0.1
pyzmq 23.2.0
referencing 0.30.2
regex 2023.10.3
requests 2.31.0
rpds-py 0.10.6
safetensors 0.4.0
scikit-learn 1.3.2
scipy 1.11.3
Send2Trash 1.8.2
seqeval 1.2.2
setuptools 68.0.0
six 1.16.0
sniffio 1.2.0
soupsieve 2.5
stack-data 0.2.0
sympy 1.12
terminado 0.17.1
threadpoolctl 3.2.0
tinycss2 1.2.1
tokenizers 0.14.1
torch 2.1.0
tornado 6.3.3
tqdm 4.66.1
traitlets 5.7.1
transformers 4.36.0.dev0
typing_extensions 4.7.1
tzdata 2023.3
urllib3 2.0.7
wcwidth 0.2.5
webencodings 0.5.1
websocket-client 0.58.0
wheel 0.41.2
xxhash 3.4.1
yarl 1.9.2
zipp 3.11.0

@NielsRogge (Owner)

I just used Google Colab :)

Davo00 commented Nov 13, 2023

> I just used Google Colab :)

You are right, it works in Colab. I really can't understand why the same code wouldn't work on my setup. Could it be an M1 issue? Does it try to use "MPS" as the device?
Anyway, I guess I should move this issue to the datasets repo instead, right?
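
On the MPS question, a quick diagnostic is sketched below. It only reports which device PyTorch would most likely pick, and it assumes `torch` may or may not be installed (it returns a message if it is missing). `pick_device` is a hypothetical helper name. Since the error message refers to an Arrow type and a datasets dtype, it appears to come from the data preprocessing step rather than from device placement, so this check mainly helps rule the device out.

```python
def pick_device():
    """Return the name of the device PyTorch would most likely use."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is only present in newer PyTorch releases,
    # so look it up defensively instead of assuming it exists.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```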

Davo00 closed this as completed Mar 22, 2024