File:RLHF diagram.svg

Size of this PNG preview of this SVG file: 512 × 366 pixels. Other resolutions: 320 × 229 pixels | 640 × 458 pixels | 1,024 × 732 pixels | 1,280 × 915 pixels | 2,560 × 1,830 pixels.

Original file (SVG file, nominally 512 × 366 pixels, file size: 177 KB)

This is a file from the Wikimedia Commons. Information from its description page there is shown below.
Commons is a freely licensed media file repository. You can help.

Summary

DescriptionRLHF diagram.svg	English: This is a high-level overview of reinforcement learning from human feedback, including training an initial supervised model, collecting human feedback, training a reward model, and using it to align the initial model.
Date	14 March 2024
Source	Own work
Author	PopoDameron

Licensing

I, the copyright holder of this work, hereby publish it under the following license:

This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International license.

You are free:

to share – to copy, distribute and transmit the work
to remix – to adapt the work

Under the following conditions:

attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.

File history

Click on a date/time to view the file as it appeared at that time.

	Date/Time	Thumbnail	Dimensions	User	Comment
current	20:20, 1 April 2024		512 × 366 (177 KB)	PopoDameron	Clarified relationship between RM and aligned model & added description to the aligned model
	04:13, 14 March 2024		512 × 366 (160 KB)	PopoDameron	Uploaded own work with UploadWizard

File usage

The following page uses this file:

Reinforcement learning from human feedback

Global file usage

The following other wikis use this file:

Usage on ar.wikipedia.org
- التعلم المعزز من ردود الفعل البشرية
Usage on fa.wikipedia.org
- یادگیری تقویتی از بازخورد انسانی
Usage on ru.wikipedia.org
- Обучение с подкреплением на основе отзывов людей
Usage on sr.wikipedia.org
- Podržano učenje iz ljudskih povratnih informacija

Metadata

This file contains additional information, probably added from the digital camera or scanner used to create or digitize it.

If the file has been modified from its original state, some details may not fully reflect the modified file.

Width	100%
Height	100%

File:RLHF diagram.svg

Summary

Licensing

Captions

Items portrayed in this file

depicts

Reinforcement Learning from Human Feedback

creator

some value

copyright status

copyrighted

copyright license

Creative Commons Attribution-ShareAlike 4.0 International

inception

14 March 2024

media type

image/svg+xml

source of file

original creation by uploader

File history

File usage

Global file usage

Metadata