Biber::Utils(3pm)

User Contributed Perl Documentation


NAME

Biber::Utils - Various utility subs used in Biber

EXPORT

All functions are exported by default.

FUNCTIONS

globU1

Like glob, but takes a Unicode string as its argument.

globU

Like glob, but (1) takes a Unicode string as its argument, and (2) tries
the NFC and NFD variants of the pattern. This gives a useful approximation
to a normalization-insensitive glob, which works when filenames are known
to be pure NFC or pure NFD. Among other cases, this covers:

 * Apple's HFS+ file system, where filenames are coerced to NFD while the
   filenames given in \addbibresource[glob] are NFC (which is natural for
   keyboard entry with typical keyboard layouts).
 * A similar situation on APFS, where some (but not all) programs create
   files with NFD names even when the user types NFC.
 * Related issues when a transfer between operating systems changes the
   normalization form of filenames but not of file contents, e.g., .tex
   files.
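
The NFC/NFD fallback described above can be sketched as follows. This is only an illustration under stated assumptions, not Biber's implementation: the helper name nf_glob_patterns is hypothetical, and it only builds the candidate patterns. A caller would still have to encode each pattern for the file system and pass it to glob.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

# Hypothetical helper (not part of Biber::Utils): return the distinct
# NFC and NFD forms of a glob pattern, NFC first.
sub nf_glob_patterns {
    my ($pattern) = @_;
    my %seen;
    return grep { !$seen{$_}++ } (NFC($pattern), NFD($pattern));
}

# U+00E9 is a single code point in NFC and "e" + U+0301 in NFD,
# so this pattern yields two candidates.
my @patterns = nf_glob_patterns("caf\x{e9}*.bib");
```

A pure-ASCII pattern is identical in both forms, so only one candidate is returned and no extra glob call is needed.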

glob_data_file

Expands a data file glob to a list of filenames

slurp_switchr

Use different read encoding/slurp interfaces for Windows due to its
horrible legacy codepage system

slurp_switchw

Use different write encoding/slurp interfaces for Windows due to its
horrible legacy codepage system

locate_data_file

Searches for a data file in the following places:

 * The exact path, if the filename is absolute
 * In the input_directory, if defined
 * In the output_directory, if defined
 * Relative to the current directory
 * In the same directory as the control file
 * Using kpsewhich, if available

Checks for the existence of NFC/NFD file variants and returns the correct one.
Accounts for Windows file encodings.

check_empty

Wrapper around empty check to deal with Win32 Unicode filenames

check_exists

Wrapper around exists check to deal with Win32 Unicode filenames

biber_warn

Wrapper around various warnings bits and pieces.
Add warning to the list of .bbl warnings and the master list of warnings

biber_error

Wrapper around error logging
Forces an exit.

makenamesid

Given a Biber::Names object, return an underscore-normalised concatenation of all of the full name strings.

makenameid

Given a Biber::Name object, return an underscore-normalised concatenation of the full name strings.

latex_recode_output

Tries to convert UTF-8 to TeX macros in passed string

strip_noinit

Removes elements which are not to be considered during initials generation
in names

strip_nosort

Removes elements which are not to be used in sorting a name from a string

strip_nonamestring

Removes elements from a name which are not to be used in certain name-related operations, such as:

 * fullhash generation
 * uniquename generation

normalise_string_label

Remove some things from a string for label generation. Don't strip \p{Dash} as this is needed to process compound names or label generation.

Removes LaTeX macros, and all punctuation, symbols, separators as well as leading and trailing whitespace for sorting strings. Control chars don't need to be stripped as they are completely ignorable in DUCET

normalise_string_bblxml

Some string normalisation for bblxml output

normalise_string

Removes LaTeX macros, and all punctuation, symbols, separators and control characters, as well as leading and trailing whitespace for sorting strings. Only decodes LaTeX character macros into Unicode if output is UTF-8

normalise_string_common

Common bit for normalisation

normalise_string_hash

Normalise strings used for hashes. We collapse LaTeX macros into a vestige
so that hashes are unique between things like:

    Smith
    {\v S}mith

We replace macros like this to preserve their vestiges:

    \v S -> v:
    \" -> 34:
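
The macro-to-vestige replacement can be illustrated with two substitutions. This is a minimal sketch, not the module's code: the sub name macro_vestige is hypothetical, and the exact regular expressions are assumptions chosen to reproduce the two mappings listed above (34 being the code point of the double quote).

```perl
use strict;
use warnings;

# Hypothetical sketch: collapse LaTeX macros into a short "vestige" so
# that accented and unaccented spellings produce different strings.
sub macro_vestige {
    my ($str) = @_;
    # Macros named with letters: "\v " becomes "v:"
    $str =~ s/\\(\p{L}+)\s*/$1:/g;
    # Single non-letter macros: \" becomes "34:" (the code point of ")
    $str =~ s/\\([^\p{L}])\s*/ord($1) . ':'/ge;
    return $str;
}

my $plain    = macro_vestige('Smith');
my $accented = macro_vestige('{\v S}mith');   # differs from $plain
```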

normalise_string_underscore

Like normalise_string, but also substitutes ~ and whitespace with underscore.

escape_label

Escapes a few special characters which might be used in labels

unescape_label

Unescapes a few special characters which might be used in labels but which need
sorting without escapes

reduce_array

reduce_array(\@a, \@b) returns all elements in @a that are not in @b
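
The behaviour described above is a list difference. A minimal sketch, assuming plain string comparison of elements and preservation of the order of the first array; the name reduce_array_sketch is hypothetical and this is not the module's own code:

```perl
use strict;
use warnings;

# Hypothetical sketch: return the elements of the first array (by ref)
# that do not occur in the second array (by ref).
sub reduce_array_sketch {
    my ($source, $exclude) = @_;
    my %excluded = map { $_ => 1 } @$exclude;
    return grep { !$excluded{$_} } @$source;
}

my @left = reduce_array_sketch([qw(a b c d)], [qw(b d)]);
```

Building a lookup hash from the second array keeps the cost linear in the combined size of the two arrays.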

remove_outer

Remove surrounding curly brackets:
    '{string}' -> 'string'
but not
    '{string} {string}' -> 'string} {string'
Return (boolean if stripped, string)
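
The two cases above can be sketched with one guard and one substitution. This is an illustration only: the name remove_outer_sketch is hypothetical, and treating any "} {" sequence as a sign that the outer braces do not form a single pair is an assumption made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical sketch: strip one pair of surrounding braces and report
# whether anything was stripped.
sub remove_outer_sketch {
    my ($str) = @_;
    # A closing brace followed by an opening one means the first and
    # last braces belong to different groups, so leave the string alone.
    return (0, $str) if $str =~ m/\}\s*\{/;
    my $stripped = ($str =~ s/\A\{(.+)\}\z/$1/s) ? 1 : 0;
    return ($stripped, $str);
}

my ($was_stripped, $result) = remove_outer_sketch('{string}');
```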

has_outer

Return a boolean indicating whether the string is surrounded by braces

add_outer

Add surrounding curly brackets:
    'string' -> '{string}'

ucinit

Converts the initial letters in a string to upper case

is_undef

Checks for undefness of arbitrary things, including composite method chain
calls, which don't reliably work with defined() (see perldoc for defined()).
This works because we are only testing the value passed to this sub. So, for
example, this is randomly unreliable even if the resulting value of the
argument to defined() is "undef":

    defined($thing->method($arg)->method)

whereas:

    is_undef($thing->method($arg)->method)

works, since we only test the return value of the whole method chain with
defined().
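
The idea of testing only the value that arrives in the sub can be sketched as below. The sub name is_undef_sketch and the Demo::Thing class are hypothetical and exist only for this example; the sketch shows the pattern and is not the module's code.

```perl
use strict;
use warnings;

# Hypothetical sketch: the method chain is fully evaluated by the caller,
# and only its final return value is tested here.
sub is_undef_sketch {
    my ($val) = @_;
    return defined($val) ? 0 : 1;
}

{
    package Demo::Thing;    # hypothetical class for the example
    sub new   { return bless {}, shift }
    sub inner { return $_[0] }
    sub value { return undef }
}

my $thing   = Demo::Thing->new;
my $missing = is_undef_sketch($thing->inner->value);
```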

is_def

Checks for definedness in the same way as is_undef()

is_undef_or_null

Checks for undef or nullness (see is_undef() above)

is_def_and_notnull

Checks for def and unnullness (see is_undef() above)

is_def_and_null

Checks for def and nullness (see is_undef() above)

is_null

Checks for nullness

is_notnull

Checks for notnullness

is_notnull_scalar

Checks for notnullness of a scalar

is_notnull_array

Checks for notnullness of an array (passed by ref)

is_notnull_hash

Checks for notnullness of a hash (passed by ref)

is_notnull_object

Checks for notnullness of an object (passed by ref)

stringify_hash

Turns a hash into a string of keys and values

normalise_utf8

Normalise any string naming the UTF-8 encoding immediately to exactly what we want.
We want the strict perl utf8 "UTF-8".
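
Perl's Encode module distinguishes the lax "utf8" encoding from the strict "UTF-8" one, which is why the spelling matters. A minimal sketch of such a normalisation, assuming the common spellings are "utf8" and "utf-8" in any letter case; the sub name and the pattern are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sketch: map any spelling of the UTF-8 encoding name to the
# strict form and pass every other encoding name through unchanged.
sub normalise_utf8_name_sketch {
    my ($encoding) = @_;
    return $encoding =~ m/\Autf-?8\z/i ? 'UTF-8' : $encoding;
}

my $name = normalise_utf8_name_sketch('utf8');
```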

inits

We turn the initials into an array so we can be flexible with them later.
The tie here is used only so we know what to split on. We don't want to make
any typesetting decisions in Biber, such as what to use to join initials, so on
output to the .bbl we only use BibLaTeX macros.

join_name

Replace all join typesetting elements in a name part (space, ties) with BibLaTeX macros
so that typesetting decisions are made in BibLaTeX, not hard-coded in Biber

filter_entry_options

Process any per_entry option transformations which are necessary on output

imatch

Do an interpolating (neg)match using a match RE and a string passed in as variables
Using /g on matches so that $1,$2 etc. can be populated from repeated matches of
same capture group as well as different groups

ireplace

Do an interpolating match/replace using a match RE, replacement RE
and string passed in as variables

validate_biber_xml

Validate a biber/biblatex XML metadata file against an RNG XML schema

map_boolean

Convert booleans between strings and numbers. This is needed because the standard
XML "boolean" datatype treats "true" and "1" as equivalent (and likewise "false" and "0").
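
A two-way mapping of this kind can be sketched with a pair of lookup tables. The sub names are hypothetical, and passing unrecognised values through unchanged is an assumption of this sketch rather than documented behaviour.

```perl
use strict;
use warnings;

my %to_number = (true => 1,      false => 0);
my %to_string = (1    => 'true', 0     => 'false');

# Hypothetical sketch: "true"/"false" to 1/0; other values pass through.
sub boolean_to_number {
    my ($value) = @_;
    my $key = lc $value;
    return exists $to_number{$key} ? $to_number{$key} : $value;
}

# Hypothetical sketch: 1/0 to "true"/"false"; other values pass through.
sub boolean_to_string {
    my ($value) = @_;
    return exists $to_string{$value} ? $to_string{$value} : $value;
}

my $as_number = boolean_to_number('true');
my $as_string = boolean_to_string(0);
```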

process_entry_options

Set per-entry options

merge_entry_options

Merge entry options, dealing with conflicts

expand_option_input

Expand options such as meta-options coming from biblatex

parse_date_range

Parse of ISO8601 date range

parse_date_unspecified

Parse of ISO8601-2:2016 4.3 unspecified format into date range
Returns range plus specification of granularity of unspecified

parse_date_start

Convenience wrapper

parse_date_end

Convenience wrapper

parse_date

Parse of ISO8601-2 dates

date_monthday

Force month/day to ISO8601-2:2016 format with leading zero
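
Zero-padding a month or day component is a one-line sprintf. A minimal sketch with a hypothetical sub name, assuming the input is an integer-like string or undef:

```perl
use strict;
use warnings;

# Hypothetical sketch: pad a month or day to two digits, leaving undef
# untouched so that missing components stay missing.
sub pad_monthday_sketch {
    my ($component) = @_;
    return defined($component) ? sprintf('%02d', $component) : undef;
}

my $month = pad_monthday_sketch(3);
```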

biber_decode_utf8

Perform NFD form conversion as well as UTF-8 conversion. Used to normalize
bibtex input as the T::B interface doesn't allow a neat whole file slurping.
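
Decoding followed by NFD normalisation can be sketched with the core Encode and Unicode::Normalize modules. The sub name is hypothetical and the sketch makes no claim about how the module itself combines the two steps.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

# Hypothetical sketch: turn UTF-8 bytes into a Perl character string
# and normalise it to NFD in one step.
sub decode_nfd_sketch {
    my ($bytes) = @_;
    return NFD(decode('UTF-8', $bytes));
}

# The two bytes C3 A9 encode U+00E9, which NFD splits into "e" + U+0301.
my $text = decode_nfd_sketch("caf\xc3\xa9");
```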

out

Output to target. Outputs NFC UTF-8 if output is UTF-8

process_comment

Fix up some problems with comments after being processed by btparse

locale2bcp47

Map babel/polyglossia language options to a sensible CLDR (bcp47) locale default
Return input string if there is no mapping

bcp472locale

Map CLDR (bcp47) locale to a babel/polyglossia locale
Return input string if there is no mapping

rangelen

Calculate the length of a range field.
Range fields are an array ref of two-element array refs [range_start, range_end].
range_end can be empty (for an open-ended range) or undef.
Deals with Unicode and ASCII roman numerals via the magic of Unicode NFKD form.

    m-n -> [m, n]
    m   -> [m, undef]
    m-  -> [m, '']
    -n  -> ['', n]
    -   -> ['', undef]
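
The NFKD step works because Unicode roman numeral characters have compatibility decompositions to ASCII letters, for example U+2163 decomposes to "IV". A minimal sketch of converting either form to an integer; the sub name and the conversion routine are hypothetical and only illustrate the normalisation, not how the range length itself is computed:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKD);

my %roman_value = (I => 1, V => 5, X => 10, L => 50,
                   C => 100, D => 500, M => 1000);

# Hypothetical sketch: convert a roman numeral (ASCII or Unicode numeral
# characters) to an integer, or return undef for anything else.
sub roman_to_int_sketch {
    my ($str) = @_;
    my $ascii = uc(NFKD($str));
    return undef unless $ascii =~ m/\A[IVXLCDM]+\z/;
    my @values = map { $roman_value{$_} } split //, $ascii;
    my $total  = 0;
    for my $i (0 .. $#values) {
        # Subtractive notation: a smaller value before a larger one
        # is subtracted instead of added.
        if ($i < $#values && $values[$i] < $values[$i + 1]) {
            $total -= $values[$i];
        }
        else {
            $total += $values[$i];
        }
    }
    return $total;
}

my $four = roman_to_int_sketch("\x{2163}");
```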

match_indices

Return array ref of array refs of matches and start indices of matches
for provided array of compiled regexps into string

parse_range

Parses a range of values into a two-value array ref.
Ranges with no starting value default to "1".
Ranges can be open-ended and it's up to surrounding code to interpret this.
A range can be a single figure x, which is shorthand for 1-x.
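
The rules above can be sketched with two patterns. This is an illustration under assumptions: the sub name is hypothetical, only unsigned integers are accepted, and an open-ended range is represented here by an undef end value, which may differ from the module's own representation.

```perl
use strict;
use warnings;

# Hypothetical sketch: parse "m-n", "-n", "m-" or "n" into [start, end].
sub parse_range_sketch {
    my ($range) = @_;
    if ($range =~ m/\A\s*(\d+)?\s*-\s*(\d+)?\s*\z/) {
        # A missing start defaults to 1; a missing end stays undef and
        # is left for the caller to interpret as open-ended.
        return [ $1 // 1, $2 ];
    }
    if ($range =~ m/\A\s*(\d+)\s*\z/) {
        # A single figure x is shorthand for 1-x.
        return [ 1, $1 ];
    }
    return;
}

my $full = parse_range_sketch('3-5');
```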

strip_annotation

Removes annotation marker from a field name

parse_range_alt

Parses a range of values into a two-value array ref.
Either start or end can be undef and it's up to surrounding code to interpret this

maploopreplace

Replace loop markers with values.

get_transliterator

Get a ref to a transliterator for the given from/to
We are abstracting this in this way because it is not clear what the future
of the transliteration library is. We want to be able to switch.

call_transliterator

Run a transliterator on passed text. Hides call semantics of transliterator
so we can switch engine in the future.

AUTHOR

Philip Kime "<philip at kime.org.uk>"

BUGS

Please report any bugs or feature requests on our GitHub tracker at https://github.com/plk/biber/issues.

COPYRIGHT & LICENSE

This module is free software. You can redistribute it and/or modify it under the terms of the Artistic License 2.0.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.

2025-12-13

perl v5.42.0