@@ -172,10 +172,22 @@ defmodule Kernel.SpecialForms do
172172 iex> << 1, 2, 3 >>
173173 << 1, 2, 3 >>
174174
175- ## Bitstring types
175+ ## Types
176176
177- A bitstring is made of many segments. Each segment has a
178- type, which defaults to integer:
177+ A bitstring is made of many segments and each segment has a
178+ type. There are 9 types used in bitstrings:
179+
180+ - `integer`
181+ - `float`
182+ - `bits` (alias for `bitstring`)
183+ - `bitstring`
184+ - `binary`
185+ - `bytes` (alias for `binary`)
186+ - `utf8`
187+ - `utf16`
188+ - `utf32`
189+
190+ When no type is specified, the default is `integer`:
179191
180192 iex> <<1, 2, 3>>
181193 <<1, 2, 3>>
@@ -186,118 +198,169 @@ defmodule Kernel.SpecialForms do
186198 iex> <<0, "foo">>
187199 <<0, 102, 111, 111>>
188200
189- Any other type needs to be explicitly tagged. For example,
190- in order to store a float type in the binary, one has to do:
191-
192- iex> <<3.14 :: float>>
193- <<64, 9, 30, 184, 81, 235, 133, 31>>
194-
195- This also means that variables need to be explicitly tagged,
196- otherwise Elixir defaults to integer:
201+ Any other type, including a variable that holds a binary, must be explicitly tagged; otherwise Elixir treats the value as an integer:
197202
198203 iex> rest = "oo"
199204 iex> <<102, rest>>
200205 ** (ArgumentError) argument error
201206
202207 We can solve this by explicitly tagging it as a binary:
203208
209+ iex> rest = "oo"
210+ iex> <<102, rest :: binary>>
211+ "foo"
212+
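Here is a minimal sketch of the tagging rule. `TaggingDemo` and its functions are hypothetical names used only for illustration. The sketch tags a binary variable with `::binary` and a float with `::float`, and it rescues the `ArgumentError` raised when a binary variable is left untagged:

```elixir
defmodule TaggingDemo do
  # `rest` holds a binary, so it must be tagged ::binary.
  def prepend_f(rest) when is_binary(rest), do: <<102, rest::binary>>

  # Floats must be tagged ::float; with no size given they take 64 bits.
  def encode_float(value) when is_float(value), do: <<value::float>>

  # Left untagged, `rest` is encoded as an integer, which fails at runtime
  # for a binary; the ArgumentError is turned into :error here.
  def prepend_f_untagged(rest) do
    <<102, rest>>
  rescue
    ArgumentError -> :error
  end
end
```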
213+ The `utf8`, `utf16`, and `utf32` types are for Unicode codepoints. They
214+ can also be applied to literal strings and char lists:
215+
216+ iex> <<"foo" :: utf16>>
217+ <<0, 102, 0, 111, 0, 111>>
218+ iex> <<"foo" :: utf32>>
219+ <<0, 0, 0, 102, 0, 0, 0, 111, 0, 0, 0, 111>>
220+
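The sketch below applies the UTF types to a single codepoint rather than a literal string. It uses the hypothetical `UtfDemo` module and the codepoint U+00E9 (233). It also shows that a `utf8` segment in a pattern consumes as many bytes as the first codepoint needs:

```elixir
defmodule UtfDemo do
  # Encodes one codepoint as UTF-8, big-endian UTF-16 (the default),
  # and little-endian UTF-16.
  def encode(codepoint) do
    {<<codepoint::utf8>>, <<codepoint::utf16>>, <<codepoint::utf16-little>>}
  end

  # In a pattern, ::utf8 reads one whole codepoint, whatever its byte length.
  def first_codepoint(<<codepoint::utf8, _rest::binary>>), do: codepoint
end
```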
221+ ## Options
222+
223+ Many options can be given by using `-` as a separator. Order is arbitrary and
224+ omitted options take their defaults, so the following are all equivalent:
225+
226+ <<102 :: integer-native, rest :: binary>>
227+ <<102 :: native-integer, rest :: binary>>
228+ <<102 :: unsigned-big-integer, rest :: binary>>
229+ <<102 :: unsigned-big-integer-size(8), rest :: binary>>
230+ <<102 :: unsigned-big-integer-8, rest :: binary>>
231+ <<102 :: 8-integer-big-unsigned, rest :: binary>>
204232 <<102, rest :: binary>>
205233
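One way to check this equivalence is to build several of the variants and compare the results. This is a sketch with a hypothetical `OptionsDemo` module. It relies on the defaults described in this section (unsigned, big-endian, 8 bits). Endianness cannot reorder a single byte, so `native` makes no difference here:

```elixir
defmodule OptionsDemo do
  # Every spec resolves to an unsigned 8-bit integer followed by a binary.
  def variants(rest) do
    [
      <<102::integer-native, rest::binary>>,
      <<102::native-integer, rest::binary>>,
      <<102::unsigned-big-integer, rest::binary>>,
      <<102::unsigned-big-integer-size(8), rest::binary>>,
      <<102, rest::binary>>
    ]
  end
end
```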
206- The type can be integer, float, bitstring/bits, binary/bytes,
207- utf8, utf16 or utf32, e.g.:
234+ ### Unit and Size
208235
209- <<102 :: float, rest :: binary>>
236+ The length of the match is equal to the `unit` (a number of bits) times the
237+ `size` (the number of repeated segments of length `unit`). Size and unit do not apply to `utf8`, `utf16`, and `utf32`.
210238
211- An integer can be any arbitrary precision integer. A float is an
212- IEEE 754 binary32 or binary64 floating point number. A bitstring
213- is an arbitrary series of bits. A binary is a special case of
214- bitstring that has a total size divisible by 8.
239+ Type | Default Unit
240+ --------- | ------------
241+ `integer` | 1 bit
242+ `float` | 1 bit
243+ `binary` | 8 bits
215244
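The following sketch shows `size` times `unit` in a pattern. `UnitSizeDemo` is a hypothetical module. An integer segment with `size(2)-unit(8)` covers 16 bits. A `binary` segment already has a unit of 8, so `size(2)` alone means two bytes:

```elixir
defmodule UnitSizeDemo do
  # size(2) * unit(8) = 16 bits, read as an unsigned big-endian integer.
  def split(<<first, rest::size(2)-unit(8)>>), do: {first, rest}
  def split(_other), do: :no_match

  # The binary type has unit 8, so size(2) means two bytes.
  def split_bytes(<<first, rest::binary-size(2)>>), do: {first, rest}
end
```

For `"foo"`, the 16-bit integer is `"oo"` read big-endian, that is `111 * 256 + 111`. A four-byte input does not fit the 24-bit pattern and falls through to the second clause.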
216- The utf8, utf16, and utf32 types are for unicode codepoints. They
217- can also be applied to literal strings and char lists:
245+ Sizes for types are a bit more nuanced. The default size for integers is 8.
218246
219- iex> <<"foo" :: utf16>>
220- <<0,102,0,111,0,111>>
247+ For floats, it is 64, and `size * unit` must be either 32 or 64,
248+ corresponding to [IEEE 754](http://en.wikipedia.org/wiki/IEEE_floating_point)
249+ binary32 and binary64, respectively.
221250
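Here is a short sketch of the two float widths, using a hypothetical `FloatDemo` module. It picks the value 1.5 because it is exactly representable in both binary32 and binary64:

```elixir
defmodule FloatDemo do
  # No size given: a float segment defaults to 64 bits (binary64).
  def encode64(value), do: <<value::float>>

  # size(32) with the float unit of 1 bit gives 32 bits (binary32).
  def encode32(value), do: <<value::float-size(32)>>

  def decode32(<<value::float-size(32)>>), do: value
end
```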
222- The bits type is an alias for bitstring. The bytes type is an
223- alias for binary.
251+ For binaries, the default is the size of the binary. Only the last binary in a
252+ match can use the default size. All others must have their size specified
253+ explicitly, even if the match is unambiguous. For example:
224254
225- The signedness can also be given as signed or unsigned. The
226- signedness only matters for matching and relevant only for
227- integers. If unspecified, it defaults to unsigned. Example:
255+ iex> <<name::binary-size(5), " the ", species::binary>> = <<"Frank the Walrus">>
256+ "Frank the Walrus"
257+ iex> {name, species}
258+ {"Frank", "Walrus"}
228259
229- iex> <<-100 :: signed, _rest :: binary>> = <<-100, "foo">>
230- <<156,102,111,111>>
260+ Leaving out the size of any binary segment other than the last causes compilation to fail:
231261
232- This match would have failed if we did not specify that the
233- value -100 is signed. If we're matching into a variable instead
234- of a value, the signedness won't be checked; rather, the number
235- will simply be interpreted as having the given (or implied)
236- signedness, e.g.:
262+ <<name::binary, " the ", species::binary>> = <<"Frank the Walrus">>
263+ ** (CompileError): a binary field without size is only allowed at the end of a binary pattern
237264
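A common way to satisfy the size rule is to read the size from the data itself. The sketch below assumes a hypothetical length-prefixed format in which one leading byte gives the length of the name. A variable bound earlier in the pattern can size a later binary segment, so only the final segment is left unsized:

```elixir
defmodule LengthPrefixed do
  # `len` is matched first and then used as the size of `name`.
  def parse(<<len, name::binary-size(len), rest::binary>>), do: {name, rest}
  def parse(_other), do: :error
end
```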
238- iex> <<val, _rest :: binary>> = <<-100, "foo">>
239- iex> val
240- 156
265+ #### Shortcut Syntax
266+
267+ Size and unit can also be specified using a syntax shortcut
268+ when passing integer values:
241269
242- Here, `val` is interpreted as unsigned.
270+ iex> x = 1
271+ iex> << x :: 8 >> == << x :: size(8) >>
272+ true
273+ iex> << x :: 8 * 4 >> == << x :: size(8)-unit(4) >>
274+ true
243275
244- The endianness of a segment can be big, little or native (the
245- latter meaning it will be resolved at VM load time). Many options
246- can be given by using `-` as separator:
276+ This syntax reflects the fact that the effective size is given by
277+ multiplying the size by the unit.
247278
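The shortcut is also handy for fields narrower than a byte. This sketch uses a hypothetical `NibbleDemo` module. It splits one byte into two 4-bit integers and writes a value with the `size * unit` form:

```elixir
defmodule NibbleDemo do
  # hi::4 is shorthand for hi::size(4); with the integer unit of 1 bit,
  # each segment is 4 bits wide.
  def nibbles(<<hi::4, lo::4>>), do: {hi, lo}

  # 8 * 2 is size(8)-unit(2): 16 bits in total, big-endian by default.
  def widen(value), do: <<value::8*2>>
end
```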
248- <<102 :: integer-native, rest :: binary>>
279+ ### Modifiers
249280
250- Or:
281+ Some types have associated modifiers to clear up ambiguity in byte
282+ representation.
251283
252- <<102 :: unsigned-big-integer, rest :: binary>>
284+ Modifier | Relevant Type(s)
285+ -------------------- | ----------------
286+ `signed` | `integer`
287+ `unsigned` (default) | `integer`
288+ `little` | `integer`, `float`, `utf16`, `utf32`
289+ `big` (default) | `integer`, `float`, `utf16`, `utf32`
290+ `native` | `integer`, `float`, `utf16`, `utf32`
253291
254- And so on.
292+ ### Sign
255293
256- Endianness only makes sense for integers and some UTF code
257- point types (utf16 and utf32).
294+ Integers can be `signed` or `unsigned`, defaulting to `unsigned`.
258295
259- Finally, we can also specify size and unit for each segment. The
260- unit is multiplied by the size to give the effective size of
261- the segment in bits. The default unit for integers, floats,
262- and bitstrings is 1. For binaries, it is 8.
296+ iex> <<int::integer>> = <<-100>>
297+ <<156>>
298+ iex> int
299+ 156
300+ iex> <<int::integer-signed>> = <<-100>>
301+ <<156>>
302+ iex> int
303+ -100
263304
264- Since integers are default, the default unit is 1. The example below
265- matches because the string "foo" takes 24 bits and we match it
266- against a segment of 24 bits, 8 of which are taken by the integer
267- 102 and the remaining 16 bits are specified on the rest.
305+ `signed` and `unsigned` only affect matching and only apply to integers. A literal
306+ `-100` in a pattern, for example, matches the byte 156 only when the segment is `signed`:
268307
269- iex> <<102 , _rest :: size(16) >> = "foo"
270- "foo"
308+ iex> <<-100 :: signed, _rest :: binary>> = <<-100, "foo">>
309+ <<156, 102, 111, 111>>
271310
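The following sketch uses 16-bit values to show that signedness changes how bytes are read, not how they are written. `SignDemo` is a hypothetical module. Construction stores the same two's-complement bits whether or not `signed` is given:

```elixir
defmodule SignDemo do
  # Construction: -2 is written as the bytes 255, 254 either way.
  def encode16(value), do: <<value::size(16)>>

  # Matching: the same two bytes read as unsigned (default) or signed.
  def read16(<<value::unsigned-size(16)>>, :unsigned), do: value
  def read16(<<value::signed-size(16)>>, :signed), do: value
end
```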
272- We can also match by specifying size and unit explicitly:
311+ ### Endianness
273312
274- iex> <<102, _rest :: size(2)-unit(8)>> = "foo"
275- "foo"
313+ Elixir has three options for endianness: `big`, `little`, and `native`.
314+ The default is `big`. `native` is determined by the VM at startup, so the `native` example below assumes a little-endian machine such as x86.
276315
277- However, if we expect a size of 32, it won't match:
316+ iex> <<number::little-integer-size(16)>> = <<0, 1>>
317+ <<0, 1>>
318+ iex> number
319+ 256
320+ iex> <<number::big-integer-size(16)>> = <<0, 1>>
321+ <<0, 1>>
322+ iex> number
323+ 1
324+ iex> <<number::native-integer-size(16)>> = <<0, 1>>
325+ <<0, 1>>
326+ iex> number
327+ 256
278328
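Here is a sketch of a little-endian 32-bit field, the kind found in many file headers. `EndianDemo` is a hypothetical module. It avoids `native`, so its results do not depend on the host machine:

```elixir
defmodule EndianDemo do
  # Little-endian: least significant byte first.
  def encode_u32_le(n), do: <<n::little-unsigned-size(32)>>

  # Big-endian is the default, so ::32 alone puts the most significant byte first.
  def encode_u32_be(n), do: <<n::32>>

  def decode_u32_le(<<n::little-unsigned-size(32), rest::binary>>), do: {n, rest}
end
```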
279- iex> <<102, _rest :: size(32)>> = "foo"
280- ** (MatchError) no match of right hand side value: "foo"
329+ ## Binary/Bitstring Matching
281330
282- Size and unit are not applicable to utf8, utf16, and utf32.
331+ Binary matching is a powerful feature in Elixir that is useful for extracting
332+ information from binaries as well as pattern matching.
283333
284- The default size for integers is 8. For floats, it is 64. For
285- binaries, it is the size of the binary. Only the last binary
286- in a binary match can use the default size (all others must
287- have their size specified explicitly).
334+ Binary matching can be used by itself to extract information from binaries:
288335
289- Size and unit can also be specified using a syntax shortcut
290- when passing integer values:
336+ iex> <<"Hello, ", place::binary>> = "Hello, World"
337+ "Hello, World"
338+ iex> place
339+ "World"
291340
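Binary matching also combines well with recursion, taking one piece off the front of a binary on each call. The following is a minimal sketch with a hypothetical `ByteCounter` module that counts occurrences of a byte:

```elixir
defmodule ByteCounter do
  def count(bin, byte) when is_binary(bin), do: count(bin, byte, 0)

  # Each clause matches the first byte and recurses on the rest.
  defp count(<<b, rest::binary>>, byte, acc) when b == byte, do: count(rest, byte, acc + 1)
  defp count(<<_b, rest::binary>>, byte, acc), do: count(rest, byte, acc)
  defp count(<<>>, _byte, acc), do: acc
end
```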
292- << x :: 8 >> == << x :: size(8) >>
293- << x :: 8 * 4 >> == << x :: size(8)-unit(4) >>
294- << x :: _ * 4 >> == << x :: unit(4) >>
341+ Or as part of function definitions, to pattern match on arguments:
295342
296- This syntax reflects the fact the effective size is given by
297- multiplying the size by the unit.
343+ defmodule ImageTyper do
344+ @png_signature <<137::size(8), 80::size(8), 78::size(8), 71::size(8),
345+ 13::size(8), 10::size(8), 26::size(8), 10::size(8)>>
346+ @jpg_signature <<255::size(8), 216::size(8)>>
347+
348+ def type(<<@png_signature, _rest::binary>>), do: :png
349+ def type(<<@jpg_signature, _rest::binary>>), do: :jpg
350+ def type(_), do: :unknown
351+ end
352+
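The same technique can pull data out of a matched header as well as classify it. The sketch below assumes the standard PNG layout. After the 8-byte signature, the first chunk is `IHDR`: a 4-byte length, the ASCII type `"IHDR"`, and then the width and height as big-endian 32-bit integers. `PngDimensions` is a hypothetical module name:

```elixir
defmodule PngDimensions do
  @png_signature <<137, 80, 78, 71, 13, 10, 26, 10>>

  # Width and height follow the IHDR chunk header; big-endian is the default.
  def dimensions(<<@png_signature, _length::32, "IHDR", width::32, height::32, _rest::binary>>),
    do: {:ok, width, height}

  def dimensions(_other), do: :error
end
```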
353+ ### Performance & Optimizations
354+
355+ The Erlang compiler can provide a number of optimizations on binary creation
356+ and matching. To see optimization output, set the `bin_opt_info` compiler
357+ option:
358+
359+ ERL_COMPILER_OPTIONS=bin_opt_info mix compile
298360
299- For floats, `size * unit` must result in 32 or 64, corresponding
300- to binary32 and binary64, respectively.
361+ To learn more about specific optimizations and performance considerations,
362+ check out
363+ [Erlang's Efficiency Guide on handling binaries](http://www.erlang.org/doc/efficiency_guide/binaryhandling.html).
301364 """
302365 defmacro unquote ( :<<>> ) ( args )
303366