Skip to content

Conversation

@iabyn
Copy link
Contributor

@iabyn iabyn commented Oct 2, 2025

Rewrite perlxs.pod

This branch completely rewrites and modernises the XS reference manual,
perlxs.pod.

The new file is about twice the size of the old one.

This branch:

  • deletes some obsolete sections;

  • reorders the existing sections into a more logical order;

  • adds a large new introductory/overview part, which explains
    all the background needed to understand what XSUBs do, including
    SVs, the stack, reference counts, magic etc.

  • includes a BNF syntax section

  • modernises: e.g. it uses "ANSI" parameter syntax throughout

  • has a fully-worked example using T_PTROBJ

Note that although each commit in this branch may have a complex-looking
diff for the updating of a particular section, in reality most sections
haver been rewritten from scratch, and the diff output is showing
paragraph breaks as fixed unchanging points, so that it appears as lots of
individual paragraph changes rather than "delete all this text, add new
text". If reviewing, it may be easier to just read the final perlxs.pod
file instead of looking at the diffs.

  • This set of changes requires a perldelta entry, and I will write one later

Copy link
Contributor

@johannessen johannessen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments, mostly about minor things.

Overall, I found the document to be well structured and the prose to be easily readable, in spite of the fairly complex topic.

@iabyn
Copy link
Contributor Author

iabyn commented Oct 9, 2025 via email

@tonycoz
Copy link
Contributor

tonycoz commented Oct 9, 2025

Yes, the -std flag is set by the cflags script and not in $Config{ccflags}.

Some systems (I think non-x86 BSDs) can use old gccs that don't default to c99 or later.

@iabyn
Copy link
Contributor Author

iabyn commented Oct 12, 2025 via email

@iabyn
Copy link
Contributor Author

iabyn commented Oct 17, 2025 via email

@tonycoz
Copy link
Contributor

tonycoz commented Oct 21, 2025

I'm done for now.

@iabyn
Copy link
Contributor Author

iabyn commented Oct 21, 2025 via email

@iabyn
Copy link
Contributor Author

iabyn commented Oct 21, 2025 via email

Copy link
Contributor

@wolfsage wolfsage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(more to come, don't want this to get lost)

Copy link
Contributor

@wolfsage wolfsage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright that's enough for one night, more later

@iabyn
Copy link
Contributor Author

iabyn commented Oct 27, 2025 via email

@tonycoz
Copy link
Contributor

tonycoz commented Oct 27, 2025

I don't understand. Aren't we both saying the same thing?

You and I are, wolfsage seemed to think otherwise.

Copy link
Contributor

@wolfsage wolfsage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final review done! Few more small things found.

@iabyn
Copy link
Contributor Author

iabyn commented Nov 9, 2025

Thanks for all the feedback so far.

The two commits I've just appended / pushed to this branch contain fixups for every comment from your reviews. As a general rule, if I didn't reply to a comment, it means I implicitly accepted the point.

I made the standardisation of xsubpp version numbers into a separate commit because all the changes were related. The second commit is just a big pile of random minor fixes and rewordings.

@Leont
Copy link
Contributor

Leont commented Nov 17, 2025

As a general rule, if I didn't reply to a comment, it means I implicitly accepted the point.

Please mark comments as resolved when you're done with them, that cleans up this thread immensely for anyone reading it. Now it isn't even rendering everything because there are so many of them.

@iabyn
Copy link
Contributor Author

iabyn commented Nov 19, 2025

Does anyone still intend to do any further review? Or is this PR approaching being mergeable?

The XS parser supported an extremely obscure bit of functionality which
made use of the %v package variable to maintain state between different
bits of typemap processing. This was accidentally broken in 5.10.0:
refactoring removed the 'use vars "%v"' line, and no one seemed to
notice or care.

Also, the sole example of its use in the docs seemed to be obscure,
confusing and probably wrong.

There was a consensus in the discussion at

    http://nntp.perl.org/group/perl.perl5.porters/267667

that we should stop documenting this feature rather than trying to fix
it.
iabyn added 25 commits December 2, 2025 11:48
Rewrite (and retitle) these three subsections:

    =head3 Default Parameter Values
    =head3 The C<length(NAME)> Keyword
    =head3 Variable-length Parameter Lists
Add text for the new '=head2 The XSUB Input Part' section, and rewrite
the existing entry for the PREINIT keyword.
This commit completely rewrites this section and subsections:

    =head3 The INPUT: Keyword
        =head4 The NO_INIT Keyword
        =head4 Initializing Function Parameters
        =head4 The & Unary Operator

It de-emphasises the INPUT keyword and suggests using ANSI XS signatures
etc instead.
Add text for the new '=head2 The XSUB Init Part' section, and rewrite
the existing entry for the INIT keyword.
Add text to the new

    =head2 The XSUB Code Part
    =head3 Auto-calling a C function

sections, and rewrite the existing

    =head4 The C_ARGS: Keyword

section
Rewrite these sections:

    =head3 The CODE: Keyword
    =head3 The PPCODE: Keyword
This keyword formerly wasn't documented. The docs now say "this is what
it is, but don't use it".
Add text to the new

    =head2 The XSUB Output Part

section, and rewrite the text in these existing sections:

    =head3 The POSTCALL: Keyword
    =head3 The OUTPUT: Keyword
Add text to the new

    =head2 The XSUB Cleanup Part

section, and rewrite the text in this existing section:

 =head3 The CLEANUP: Keyword
Add text to the new

    =head2 XSUB Generic Keywords

section, and rewrite the text in this existing section:

    =head3 The PROTOTYPE: Keyword
Explain that a 'C' parameter type in an XSUB declaration can actually
be a Perl package name or similar, e.g.

    Foo::Bar
    f(Foo::Bar obj, char *s)
First, add a new subsection

    =head3 T_PTROBJ and opaque handles

to the TYPEMAPs section explaining how this typemap can be used to
map between Perl objects and C library handles. It provides a
fully-worked example of wrapping a simple arithmetic library.

Then completely rewrite the

    =head3 The OVERLOAD: Keyword

section. In particular, it now refers to the new T_PTROBJ example and
shows how it can be extended to use overloading.
This keyword was undocumented, even though it had been added 25 years
ago.
Populate the introduction to this new section.
Rewrite this section:

    =head3 The ALIAS: Keyword
Rewrite these sections:

 =head3 The INTERFACE: Keyword
 =head3 The INTERFACE_MACRO: Keyword

also demote the second to be a head4 child of the first. Then expand
the T_PTROBJ example to use INTERFACE as an alternative to ALIAS.
Rewrite this section:

    =head3 The CASE: Keyword
Populate this new section (except for the T_PTROBJ subsection, which had
already been added by an earlier commit within this branch).

Note that the "Common typemaps" subsection could probably benefit
from some further expansion by someone familiar with which built-in
T_FOO entries are useful.
Rewrite this section:

 =head2 Using XS With C++

Disclaimer: I've never written a proper C++ program. I had to
(literally) dust off my 34-year old copy of Stroustrup(*) and also do
some Googling. Hopefully what I've written is sane.

(*) This was bought back in the days when people used to to learn things
by buying books, and when I thought that I ought to know something about
this newfangled C++ thing. I never got round to reading all of it: I
discovered Perl around the same time, which looked to be a lot more fun.
Revise the text in this section:

    =head2 Safely Storing Static Data in XS
Rewrite this section:

     =head1 EXAMPLES

Basically, delete the one big example in this section and instead
provide links to various other examples already present in this document
instead.
Tweak the final few sections of perlxs.pod.
After the big rewrite, various bits of text which describe when a
particular feature was introduced or changed have ended up using a
random mixture of Perl, ParseXS and xsubpp version numbers.

This commit standardises on xsubpp version numbers. These are mostly
the same as ParseXS, but this handles the cases before ParseXS was
split off from xsubpp. Perl versions suffer from not exactly matching
when an xsubpp version number was incremented, and not matching what is
used by the "REQUIRE:" keyword.

This commit also adds a short new section which tries to explain how the
three sets of version numbers are related.
The many commits in this branch have completely rewritten perlxs.pod.
This commit applies all the minor tweaks suggested by reviewers.

(It was far too much like hard work to try and update each individual
commit with the various changes.)
@iabyn
Copy link
Contributor Author

iabyn commented Dec 2, 2025

I will merge this PR within a few days unless anyone has any last-minute objections.

Copy link
Contributor

@leonerd leonerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, a much bigger (and better) document than previous. So much so I've only had time to review about the top 20% so far but I wanted to commit these comments before they get lost, and to say I will review the rest in the next day or so.

Overall so far looking very much better, here's just a few tiny nits.

Comment on lines +427 to +428
which often requires extensive knowledge of the Perl interpreter's
internals.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you calling me out here? I feel called out ;)


(This next paragraph is definitely only for background on debugging.)

An C<OP> is is a data structure within the perl interpreter. It is used to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate "is is"

Suggested change
An C<OP> is is a data structure within the perl interpreter. It is used to
An C<OP> is a data structure within the perl interpreter. It is used to

The various OPs executed by the Perl interpreter up until the function is
called will: push a mark indicating the start of a new argument stack
frame; push an SV containing the integer value 1; push the SV currently
bound to the variable C<$x>; and push the C<*foo> typeglob. Then the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "bound to" is likely to confuse folks not from a Lisp/Scheme background. Maybe "the SV currently representing the variable C<$x>" ?

Comment on lines +694 to +697
For a normal Perl subroutine call, C<pp_entersub()> will then: pop the
topmost mark off the mark stack; pop the SV pointers between that mark and
the top of the stack and store them in C<@_>; then set C<PL_op> to the
first OP pointed to by the C<&foo> CV. Those OPs will then be run by the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double space "first OP".

Also, I see some documentation I'm going to have to alter if I get the CVf_NOSNAIL flag working :)


As mentioned above, almost all runtime data within the perl interpreter is
stored in an SV (scalar value) structure. The head of an SV structure
consists of three or four words: a reference count; a type and flags; a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "words" is possibly not a word [pardon the phrasing] that would make sense to folks outside of a machine-level programming background. I'd say "fields", just relating to the members of the C struct.

STRLEN len;
char *pv = SvPV(sv, len);

which both retrieves a string pointer and sets C<len> to its length. Note
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might help casual readers to explicitly point out that since SvPV is a macro, it can do the evil trickery of modifying len directly, so you don't have to pass in &len.

@leonerd leonerd self-assigned this Dec 3, 2025
Copy link
Contributor

@leonerd leonerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got around to reading and reviewing the entire rest.
I admit I somewhat skimmed over the C++ related parts as I don't know the technical details here, though I have read over all the wording and style at least.

Overall a few small comments but generally this is looking much better indeed.

OUTPUT:
RETVAL

Finally, note that some very old (pre-1996) XS documentation suggested
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another double-space on this line

PPCODE:
{
int i;
int max_ix = AvFILL(av);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we intending these code examples still run on (much) older perls? For a while now we've had av_count() which returns a count, so you don't have to do the more awkward + 1 logic elsewhere. But that wouldn't work before it was added - 87306e0 in time for 5.34.

limited use and are typically only used to mimic the behaviour of Perl
builtins. For example there is no way to implement a C<push @a, ...;>
style function without a way of telling the Perl interpreter not to
flatten C<@a>. Outside of these narrow uses, it is generally a mistake to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double-space


Note however that, due to a quirk in parsing, it is possible for a
C<TYPEMAP:> block which comes I<immediately after> an XSUB to affect any
entries used by that XSUB, as if the block had appeared just before the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double-space

Also: Oh gosh I can't imagine the crazy amount of parsing logic that results in this particular quirk. ;)

An XSUB definition consists of a declaration (typically two lines),
followed by an optional body. The declaration specifies the XSUB's name,
parameters and return type. The body consists of sections started by
keywords, which may specify how its parameters and any any return value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated "any any"

Comment on lines +3585 to +3587
Note that C<OVERLOAD> shouldn't be mixed with the L<ALIAS|/The ALIAS:
Keyword> keyword; the value of C<ix> will be undefined for any overload
method call.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: Is this an implementation bug that we should consider fixing?

information stored in the CV indicates which C library function should be
autocalled.

Finally, there is the C<CASE> keyword, which allows the whole body of an
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double-space

and the function pointer.
The alias name can be either a simple function name or can include a
package name. The alias value to the right of the C<=> may be either a
literal positive integer or a word (which is expected to be a CPP define).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... or enum constant. I've done that several times.

my $color = Foo::Bar->new(0x10, 0x20, 0xff);
printf "blue=%d\n", $color->blue(); # prints 255
$color->set_blue(0x80);
printf "blue=%d\n", $color->blue(); # prints 128

=head2 Safely Storing Static Data in XS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this entire section is talking about mutable static data, yes? I've seen (and written) many cases of static const data structures being used all over .xs files. As they're const and set up purely at compiletime, there's no issues with mutations across threads.

Should there be perhaps some wording on that topic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, though it would also include a static SV* const and similar.


=head1 AUTHOR

Originally written by Dean Roehrich <F<roehrich@cray.com>>.
Completely rewritten in 2025.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to add your name here of course :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants