Add recipe for audio/speech LLM (ltu-as with llama3) #2550

Open
wants to merge 16 commits into
base: develop
from


BenoitWang
Collaborator
@BenoitWang BenoitWang commented May 16, 2024

Hi @mravanelli, here's the LTU-AS PR as discussed. I am collecting several new datasets and will start a new round of training, but this may take time, so in the meantime I am opening this PR and will continue working on it little by little. @poonehmousavi, you are welcome to review the PR as well 😊.

What does this PR do?

  1. Add a recipe for training the LTU-AS model (an LLM that jointly understands audio and speech).
  2. Slight modifications to the LinearWarmupScheduler class.
  3. Adapt the multiwoz llama2 recipe to the latest changes.
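
For readers unfamiliar with the scheduler mentioned in item 2, a linear warmup schedule can be sketched as below. This is only an illustrative sketch, not the SpeechBrain class and not the change made in this PR: the function name `linear_warmup_lr` and its parameters are hypothetical, and it assumes the common variant in which the learning rate rises linearly from 0 to a base value and then decays linearly back to 0.

```python
def linear_warmup_lr(step, base_lr, num_warmup_steps, num_training_steps):
    """Hypothetical helper (not part of SpeechBrain).

    Returns the learning rate at `step`, assuming a linear ramp from 0 to
    `base_lr` over `num_warmup_steps`, followed by a linear decay to 0 at
    `num_training_steps`.
    """
    if step < num_warmup_steps:
        # Warmup phase: scale base_lr by the fraction of warmup completed.
        return base_lr * step / max(1, num_warmup_steps)
    # Decay phase: scale base_lr by the fraction of post-warmup steps left.
    remaining = num_training_steps - step
    decay_span = max(1, num_training_steps - num_warmup_steps)
    return base_lr * max(0.0, remaining / decay_span)


if __name__ == "__main__":
    # Print the schedule at a few steps for a 100-step run with 10 warmup steps.
    for s in (0, 5, 10, 55, 100):
        print(s, linear_warmup_lr(s, 1.0, 10, 100))
```

The `max(0.0, ...)` clamp keeps the rate at zero if training runs past `num_training_steps`, which is a design choice of this sketch rather than a documented behavior of the recipe.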

To be done

  • For now, the model is trained with only half of the data used in the paper. Although access to certain datasets seems limited, a new training round needs to be carried out once more datasets have been collected.
  • Prepare downloadable json files that facilitate the data preparation stage.
  • Add a tiny validation set for stages 1 and 2.
  • An evaluation needs to be implemented at the end of stage 3 and the evaluation data needs to be prepared.
  • Upload training logs and prepare a Hugging Face interface.
  • Recipe tests.
  • Update results and training details in the README.

@mravanelli
Collaborator

Thank you @BenoitWang for this contribution. It looks like some tests are failing. Could you please take a look?

@mravanelli mravanelli self-requested a review June 17, 2024 16:24
@mravanelli mravanelli added the enhancement New feature or request label Jun 17, 2024