
Conversation

@sarathc-cerebras commented Dec 7, 2025

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Rocketknight1 (Member)

Hi @sarathc-cerebras, thank you for the PR! The main thing missing is a conversion to modular format. You can look at the modular files for other models to see how it works, but it reduces the size of the PR a lot by importing duplicated code from other models.
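
For context, here is a minimal sketch of what a modular file can look like, assuming Jais2 is close enough to Llama to reuse its blocks; the parent classes and overrides below are illustrative, not the PR's actual code:

# modular_jais2.py -- hypothetical sketch of the modular format; the real
# parents and overrides depend on Jais2's architecture.
from transformers.models.llama.modeling_llama import (
    LlamaAttention,
    LlamaForCausalLM,
    LlamaModel,
)

class Jais2Attention(LlamaAttention):
    pass  # reuse Llama's attention unchanged

class Jais2Model(LlamaModel):
    pass

class Jais2ForCausalLM(LlamaForCausalLM):
    pass

The conversion script then expands this into a full modeling file, so the PR itself only carries the deltas.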

@sarathc-cerebras (Author)

@Rocketknight1 thanks for bringing this up; I have updated it to use the modular format.

@sarathc-cerebras force-pushed the add-jais2-model branch 4 times, most recently from 2ae7204 to 672e38a on December 9, 2025 at 14:13
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Rocketknight1 (Member) left a comment:

Yes, this looks good! I made a few comments but they're small.

@vasqu (Contributor) left a comment:

Left some comments; I think we can still simplify a bit and update a few things to be in line with our current standards. Overall, it's looking really good already, though.

@ArthurZucker (Collaborator) left a comment:

LGTM, good review @vasqu. Small nits, but let's go!

generated_text = self.tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Static cache generated text: {generated_text}")

self.assertGreater(generated_ids.shape[1], input_ids.shape[1])
Collaborator comment on the snippet above:

It would be better to have explicit expected outputs here!
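
A sketch of what that could look like, assuming the usual pinned-output pattern for slow tests; the expected string below is a placeholder, not real Jais-2 output:

# Hypothetical sketch: compare against an explicit expected string instead of
# only checking that new tokens were generated. EXPECTED_TEXT is a placeholder.
EXPECTED_TEXT = "The capital of France is Paris."  # placeholder, not real output

generated_text = self.tokenizer.decode(generated_ids[0], skip_special_tokens=True)
self.assertEqual(generated_text, EXPECTED_TEXT)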

@vasqu (Contributor) left a comment:

Please check out the comments from the last review; mostly nits otherwise. Let's make the tests more explicit (I've linked an example in one of the review comments).

sarathc-cerebras and others added 14 commits on December 12, 2025 at 23:13
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> (trailer repeated on several of these commits)
@github-actions (bot)

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, jais2

@github-actions (bot)

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=42684&sha=9398dd

@vasqu (Contributor) left a comment:

Last comments from my side (I hope): small fixes and finishing touches.

Comment on lines +69 to +71
End of stream token id.
pretraining_tp (`int`, *optional*, defaults to 1):
Tensor parallelism rank used during pretraining.
Contributor comment:

Suggested change:

- End of stream token id.
- pretraining_tp (`int`, *optional*, defaults to 1):
-     Tensor parallelism rank used during pretraining.
+ End of stream token id.

TP is no longer handled that way.

Comment on lines +81 to +83
The attention head dimension.
rope_theta (`float`, *optional*, defaults to 500000.0):
The base period of the RoPE embeddings.
Contributor comment:

Suggested change:

- The attention head dimension.
- rope_theta (`float`, *optional*, defaults to 500000.0):
-     The base period of the RoPE embeddings.
+ The attention head dimension.

Let's move this to default_theta.
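
A guess at what that might look like, assuming default_theta is a class-level attribute that the shared RoPE handling falls back to when rope_parameters omits a theta; the exact hook is transformers-internal and may be wired differently:

# Hypothetical sketch -- default_theta as a class attribute the RoPE
# machinery reads; the real mechanism in transformers may differ.
from transformers import PretrainedConfig

class Jais2Config(PretrainedConfig):
    model_type = "jais2"
    default_theta = 500000.0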

pad_token_id: Optional[int] = None,
bos_token_id: Optional[int] = 0,
eos_token_id: Optional[int] = 150024,
pretraining_tp: Optional[int] = 1,
Contributor comment:

Suggested change:

- pretraining_tp: Optional[int] = 1,

Comment on lines +116 to +118
# If rope_parameters not provided, create default with rope_theta
if rope_parameters is None:
rope_parameters = RopeParameters(rope_theta=rope_theta)
Contributor comment:

We should not need this; we have a mixin in the config that should handle it for us.
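
In other words (a sketch, assuming the config's mixin fills in RoPE defaults during init):

# Hypothetical sketch of the simplification: pass rope_parameters through and
# let the config's RoPE mixin supply defaults, instead of building
# RopeParameters(rope_theta=...) by hand here.
self.rope_parameters = rope_parameters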

The RoPE parameters.
"""

model_type = "jais2"
Contributor comment:

Sorry, it seems I was wrong about the TP plan; I didn't notice that we have a different MLP. Can you re-add the correct version?
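
For reference, a sketch of a config-level plan using the standard base_model_tp_plan convention; the module names and sharding styles here are placeholders and would have to match Jais2's actual layers, in particular its different MLP:

# Hypothetical sketch of a tensor parallel plan; entries are placeholders and
# must mirror Jais2's real submodule names (especially the MLP projections).
base_model_tp_plan = {
    "layers.*.self_attn.q_proj": "colwise",
    "layers.*.self_attn.k_proj": "colwise",
    "layers.*.self_attn.v_proj": "colwise",
    "layers.*.self_attn.o_proj": "rowwise",
}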

model = Jais2ForCausalLM.from_pretrained(
"inceptionai/Jais-2-8B-Chat", torch_dtype=torch.float16, device_map="auto"
)
input_text = "The capital of France is"
Contributor comment:

Can we find something that generates more tokens, e.g. 32? This is a bit too few tokens, so let's make the test a bit more sensitive to changes.
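
A sketch of the suggested tweak, assuming greedy decoding; the prompt is a placeholder and nothing here is real Jais-2 output:

# Hypothetical sketch: generate enough tokens (e.g. 32) that regressions show
# up in the decoded text. Prompt and expectations are placeholders.
input_text = "Give me a short introduction to large language models."
inputs = self.tokenizer(input_text, return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
generated_text = self.tokenizer.decode(generated_ids[0], skip_special_tokens=True)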
