
Conversation

@sarathc-cerebras commented Dec 7, 2025

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Rocketknight1 (Member)

Hi @sarathc-cerebras, thank you for the PR! The main thing missing is a conversion to modular format. You can look at the modular files for other models to see how it works, but it reduces the size of the PR a lot by importing duplicated code from other models.
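
For context, here is a minimal sketch of what a modular file can look like, assuming Jais2 is close enough to Llama to reuse its blocks; the parent classes and overrides below are illustrative, not the PR's actual code:

# modular_jais2.py -- hypothetical sketch of the modular format; the real
# parents and overrides depend on Jais2's architecture.
from transformers.models.llama.modeling_llama import (
    LlamaAttention,
    LlamaForCausalLM,
    LlamaModel,
)

class Jais2Attention(LlamaAttention):
    pass  # reuse Llama's attention unchanged

class Jais2Model(LlamaModel):
    pass

class Jais2ForCausalLM(LlamaForCausalLM):
    pass

The conversion script then expands this into a full modeling file, so the PR itself only carries the deltas.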

@sarathc-cerebras (Author)

@Rocketknight1 thanks for bringing this up; I have updated it to use the modular format.

@sarathc-cerebras force-pushed the add-jais2-model branch 4 times, most recently from 2ae7204 to 672e38a on December 9, 2025 at 14:13
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Rocketknight1 (Member) left a comment:

Yes, this looks good! I made a few comments but they're small.

@vasqu (Contributor) left a comment:

Left some comments; I think we can still simplify a bit and update a few things to be in line with our current standards. Overall, it's looking really good already, though.

@ArthurZucker (Collaborator) left a comment:

LGTM, good review @vasqu. Small nits, but let's go!

generated_text = self.tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Static cache generated text: {generated_text}")

self.assertGreater(generated_ids.shape[1], input_ids.shape[1])
Collaborator comment on the snippet above:

It would be better to have explicit expected outputs here!
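
A sketch of what that could look like, assuming the usual pinned-output pattern for slow tests; the expected string below is a placeholder, not real Jais-2 output:

# Hypothetical sketch: compare against an explicit expected string instead of
# only checking that new tokens were generated. EXPECTED_TEXT is a placeholder.
EXPECTED_TEXT = "The capital of France is Paris."  # placeholder, not real output

generated_text = self.tokenizer.decode(generated_ids[0], skip_special_tokens=True)
self.assertEqual(generated_text, EXPECTED_TEXT)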

@vasqu (Contributor) left a comment:

Please check out the comments from the last review; mostly nits otherwise. Let's make the tests more explicit (I've linked an example in one of the review comments).

sarathc-cerebras and others added 14 commits on December 12, 2025 at 23:13
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> (trailer repeated on several of these commits)
@github-actions (bot)

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, jais2

@github-actions (bot)

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=42684&sha=9398dd

@vasqu (Contributor) left a comment:

Last comments from my side (I hope): small fixes and finishing touches.

Comment on lines +69 to +71
End of stream token id.
pretraining_tp (`int`, *optional*, defaults to 1):
Tensor parallelism rank used during pretraining.
Contributor comment:

Suggested change:

- End of stream token id.
- pretraining_tp (`int`, *optional*, defaults to 1):
-     Tensor parallelism rank used during pretraining.
+ End of stream token id.

TP is no longer handled that way.

Comment on lines +81 to +83
The attention head dimension.
rope_theta (`float`, *optional*, defaults to 500000.0):
The base period of the RoPE embeddings.
Contributor comment:

Suggested change:

- The attention head dimension.
- rope_theta (`float`, *optional*, defaults to 500000.0):
-     The base period of the RoPE embeddings.
+ The attention head dimension.

Let's move this to default_theta.
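
A guess at what that might look like, assuming default_theta is a class-level attribute that the shared RoPE handling falls back to when rope_parameters omits a theta; the exact hook is transformers-internal and may be wired differently:

# Hypothetical sketch -- default_theta as a class attribute the RoPE
# machinery reads; the real mechanism in transformers may differ.
from transformers import PretrainedConfig

class Jais2Config(PretrainedConfig):
    model_type = "jais2"
    default_theta = 500000.0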

pad_token_id: Optional[int] = None,
bos_token_id: Optional[int] = 0,
eos_token_id: Optional[int] = 150024,
pretraining_tp: Optional[int] = 1,
Contributor comment:

Suggested change:

- pretraining_tp: Optional[int] = 1,

Comment on lines +116 to +118
# If rope_parameters not provided, create default with rope_theta
if rope_parameters is None:
rope_parameters = RopeParameters(rope_theta=rope_theta)
Contributor comment:

We should not need this; we have a mixin in the config that should handle it for us.
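
In other words (a sketch, assuming the config's mixin fills in RoPE defaults during init):

# Hypothetical sketch of the simplification: pass rope_parameters through and
# let the config's RoPE mixin supply defaults, instead of building
# RopeParameters(rope_theta=...) by hand here.
self.rope_parameters = rope_parameters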

The RoPE parameters.
"""

model_type = "jais2"
Contributor comment:

Sorry, it seems I was wrong about the TP plan; I didn't notice that we have a different MLP. Can you re-add the correct version?
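
For reference, a sketch of a config-level plan using the standard base_model_tp_plan convention; the module names and sharding styles here are placeholders and would have to match Jais2's actual layers, in particular its different MLP:

# Hypothetical sketch of a tensor parallel plan; entries are placeholders and
# must mirror Jais2's real submodule names (especially the MLP projections).
base_model_tp_plan = {
    "layers.*.self_attn.q_proj": "colwise",
    "layers.*.self_attn.k_proj": "colwise",
    "layers.*.self_attn.v_proj": "colwise",
    "layers.*.self_attn.o_proj": "rowwise",
}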

model = Jais2ForCausalLM.from_pretrained(
"inceptionai/Jais-2-8B-Chat", torch_dtype=torch.float16, device_map="auto"
)
input_text = "The capital of France is"
Contributor comment:

Can we find something that generates more tokens, e.g. 32? This is a bit too few tokens, so let's make the test a bit more sensitive to changes.
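
A sketch of the suggested tweak, assuming greedy decoding; the prompt is a placeholder and nothing here is real Jais-2 output:

# Hypothetical sketch: generate enough tokens (e.g. 32) that regressions show
# up in the decoded text. Prompt and expectations are placeholders.
input_text = "Give me a short introduction to large language models."
inputs = self.tokenizer(input_text, return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
generated_text = self.tokenizer.decode(generated_ids[0], skip_special_tokens=True)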
