String Functions

base32decode

Decode base32-encoded data.

Syntax

strb = base32decode(strt)

Description

base32decode(strt) decodes the contents of string strt which represents data encoded with base32. Characters which are not 'A'-'Z', '2'-'7', or '=' are ignored. Decoding stops at the end of the string or when '=' is reached.

base32encode

Encode data using base32.

Syntax

strt = base32encode(strb)

Description

base32encode(strb) encodes the contents of string strb which represents binary data. The result contains only characters 'A'-'Z', '2'-'7', and '=' for padding, with a linefeed every 56 characters. It is suitable for transmission or storage on media which accept only uppercase letters and digits; digits '0' and '1', which are easily mistaken for letters, are not used.

Each character of encoded data represents 5 bits of binary data; i.e. one needs eight characters for five bytes. The five bits represent 32 different values, encoded with the characters 'A' to 'Z' and '2' to '7' in this order. When the length of the binary data is not a multiple of 5, the encoded data are padded with 1, 3, 4 or 6 characters '=' so that their length is a multiple of 8.

Base32 encoding is an Internet standard described in RFC 4648.

Example

s = base32encode(char(0:10))
  s =
    AAAQEAYEAUDAOCAJBI======
d = double(base32decode(s))
  d =
    0  1  2  3  4  5  6  7  8  9 10
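
As a check of the bit layout described above, the first eight characters of s can be reproduced with plain arithmetic. This is only a sketch with arbitrary variable names; it treats the first five bytes as a single 40-bit number, which a double represents exactly.

```
b = 0:4;                                    // first five bytes of char(0:10)
n = sum(b .* 256 .^ (4:-1:0));              // the 40-bit group as one number
idx = mod(floor(n ./ 32 .^ (7:-1:0)), 32);  // eight 5-bit values, leftmost first
alphabet = char([65:90, 50:55]);            // 'A'-'Z' followed by '2'-'7'
alphabet(idx + 1)
  AAAQEAYE
```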

base64decode

Decode base64-encoded data.

Syntax

strb = base64decode(strt)

Description

base64decode(strt) decodes the contents of string strt which represents data encoded with base64. Characters which are not 'A'-'Z', 'a'-'z', '0'-'9', '+', '/', or '=' are ignored. Decoding stops at the end of the string or when '=' is reached.

base64encode

Encode data using base64.

Syntax

strt = base64encode(strb)

Description

base64encode(strb) encodes the contents of string strb which represents binary data. The result contains only characters 'A'-'Z', 'a'-'z', '0'-'9', '+', '/', and '=', with a linefeed every 60 characters. It is suitable for transmission or storage on media which accept only text.

Each character of encoded data represents 6 bits of binary data; i.e. one needs four characters for three bytes. The six bits represent 64 different values, encoded with the characters 'A' to 'Z', 'a' to 'z', '0' to '9', '+', and '/' in this order. When the binary data have a length which is not a multiple of 3, encoded data are padded with one or two characters '=' to have a multiple of 4.

Base64 encoding is an Internet standard described in RFC 2045.

Example

s = base64encode(char(0:10))
  s =
    AAECAwQFBgcICQo=
double(base64decode(s))
  0  1  2  3  4  5  6  7  8  9 10
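
The padding rule can be checked by hand on the end of the result above. The 11 input bytes leave a final group of two bytes (9 and 10), i.e. 16 bits; these are extended with two zero bits to give three 6-bit values, and one '=' completes the group of four characters. The sketch below uses arbitrary variable names.

```
b = [9, 10];                                // last two bytes of char(0:10)
n = (b(1) * 256 + b(2)) * 4;                // 16 bits followed by two zero bits
idx = mod(floor(n ./ 64 .^ (2:-1:0)), 64);  // three 6-bit values
alphabet = char([65:90, 97:122, 48:57, 43, 47]);  // 'A'-'Z', 'a'-'z', '0'-'9', '+', '/'
[alphabet(idx + 1), '=']
  CQo=
```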

char

Convert an array to a character array (string).

Syntax

s = char(A)
S = char(s1, s2, ...)

Description

char(A) converts the elements of matrix A to characters, resulting in a string of the same size. Characters are stored in unsigned 16-bit words. The shape of A is preserved. Even if most functions ignore the string shape, you can force a row vector with char(A(:).').

char(s1,s2,...) concatenates vertically the arrays given as arguments to produce a string matrix. If the strings do not have the same number of columns, blanks are added to the right.

Examples

char(65:70)
  ABCDEF
char([65, 66; 67, 68](:).')
  ABCD
char('ab','cde')
  ab
  cde
char('abc',['de';'fg'])
  abc
  de
  fg
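
The padding added by char(s1,s2,...) can be made visible by looking at the size and the character codes of the result. This short sketch assumes only the behavior described above (blanks, i.e. code 32, added to the right).

```
S = char('ab', 'cde');
size(S)
  2 3
double(S(1,:))
  97 98 32
```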

deblank

Remove trailing blank characters from a string.

Syntax

s2 = deblank(s1)

Description

deblank(s1) removes the trailing blank characters from string s1. Blank characters are spaces (code 32), tabulators (code 9), carriage returns (code 13), line feeds (code 10), and null characters (code 0).

Example

double(' \tAB  CD\r\n\0')
  32  9 65 66 32 32 67 68 13 10 0
double(deblank(' \tAB  CD\r\n\0'))
  32  9 65 66 32 32 67 68

hmac

HMAC authentication hash.

Syntax

hash = hmac(hashtype, key, data)
hash = hmac(hashtype, key, data, type=t)

Description

hmac(hashtype,key,data) calculates the authentication hash of data with secret key key and the method specified by hashtype: 'md5', 'sha1', 'sha224', 'sha256', 'sha384', or 'sha512'. Both arguments data and key can be strings (char arrays) which are converted to UTF-8, or int8 or uint8 arrays. The key can be up to 64 bytes; longer keys are truncated. The result is a string of hexadecimal digits whose length depends on the hash method, from 32 for HMAC-MD5 to 128 for HMAC-SHA512.

Named argument type can change the output type. It can be 'uint8' for a uint8 array of 16 or 20 bytes (raw HMAC-MD5 or HMAC-SHA1 hash result), 'hex' for its representation as a string of 32 or 40 hexadecimal digits (default), or 'base64' for its conversion to Base64 in a string of 24 or 28 characters.

HMAC is an Internet standard described in RFC 2104.

Examples

HMAC-MD5 of 'Authenticated message' using secret key 'secret':

hmac('md5', 'secret', 'Authenticated message')
  4f557b1f67bc4790e6e9568e2f458cf0

Same result computed explicitly, with the notations of RFC 2104: B is the block length, L is the hash length (16 for HMAC-MD5 or 20 for HMAC-SHA1), K is the key padded with zeros to have size B, and H is the hash function, defined here to produce a uint8 hash instead of a hexadecimal string like the LME functions md5 or sha1.

B = 64;
L = 16;
H = @(a) uint8(sscanf(md5(a), '%2x')');
key = uint8('secret');
data = uint8('Authenticated message');
K = [key, zeros(1, B - length(key), 'uint8')];
hash = H([bitxor(K, 0x5cuint8), H([bitxor(K, 0x36uint8), data])]);
sprintf('%.2x', hash)
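
The three output types of the named argument type can be compared on the same message. The following sketch uses only the calls described above; the lengths in the comments are those stated in the description, and the final comparison is expected to give true.

```
key = 'secret';
msg = 'Authenticated message';
hx  = hmac('md5', key, msg);                 // default: 32 hexadecimal digits
raw = hmac('md5', key, msg, type='uint8');   // 16 raw bytes
b64 = hmac('md5', key, msg, type='base64');  // 24 Base64 characters
strcmp(hx, sprintf('%.2x', raw))             // hexadecimal form of the raw bytes
```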

Simple implementation of the HOTP and TOTP password algorithms (RFC 4226 and 6238) often used for two-factor authentication, with their default parameter values. The password is assumed to be base32-encoded.

function n = hotp(pass, cnt)
  k = uint8(base32decode(pass));
  c = bwrite(cnt, 'uint64;b');
  // or c = bwrite([floor(cnt/2^32), mod(cnt,2^32)], 'uint32;b');
  hs = hmac('sha1', k, c, type='uint8');
  ob = mod(hs(20), 16);
  dt = mod(sread(hs(ob + (1:4)), [], 'uint32;b'), 2^31);
  n = mod(dt, 1e6);
function n = totp(pass)
  t = floor(posixtime / 30);
  n = hotp(pass, t);
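
Assuming hotp is defined as above, it can be checked against the test values of Appendix D of RFC 4226. The secret used there is the ASCII string '12345678901234567890', whose base32 encoding is given below; the variable name secret is arbitrary.

```
secret = 'GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ';
hotp(secret, 0)   // RFC 4226 gives 755224 for counter 0
hotp(secret, 1)   // RFC 4226 gives 287082 for counter 1
```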

Simple implementation of the PBKDF2 key stretching algorithm (RFC 2898):

function dk = pbkdf2_hmac(hashtype, p, salt, c, dkLen)
  hLen = length(hmac(hashtype, '', '')) / 2;
  dk = uint8([]);
  for i = 1:ceil(dkLen / hLen)
    u = hmac(hashtype, p, [salt, bwrite(i, 'uint32;b')], type='uint8');
    f = u;
    for j = 2:c
      u = hmac(hashtype, p, u, type='uint8');
      f = bitxor(f, u);
    end
    dk = [dk, f];
  end
  dk = dk(1:dkLen);

Test of PBKDF2-HMAC-SHA1 with values provided in RFC 6070 (output format is switched to hexadecimal for easier comparison):

format int x
pbkdf2_hmac('sha1', 'password', 'salt', 4096, 20)
  0x4b  0x0 0x79  0x1 0xb7 0x65 0x48 0x9a 0xbe 0xad
  0x49 0xd9 0x26 0xf7 0x21 0xd0 0x65 0xa4 0x29 0xc1
format
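
RFC 6070 also provides test values with 1 and 2 iterations, which run much faster than the test with 4096 iterations. The sketch below assumes the function pbkdf2_hmac defined above and formats the result as a hexadecimal string with sprintf.

```
sprintf('%.2x', pbkdf2_hmac('sha1', 'password', 'salt', 1, 20))
  // RFC 6070: 0c60c80f961f0e71f3a9b524af6012062fe037a6
sprintf('%.2x', pbkdf2_hmac('sha1', 'password', 'salt', 2, 20))
  // RFC 6070: ea6c014dc72d6f8ccd1ed92ace1d41f0d8de8957
```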

ischar

Test for a string object.

Syntax

b = ischar(obj)

Description

ischar(obj) is true if the object obj is a character string, false otherwise. Strings can have more than one line.

Examples

ischar('abc')
  true
ischar(0)
  false
ischar([])
  false
ischar('')
  true
ischar(['abc';'def'])
  true

isdigit

Test for decimal digit characters.

Syntax

b = isdigit(s)

Description

For each character of string s, isdigit(s) is true if it is a digit ('0' to '9') and false otherwise. The result is a logical array with the same size as the input argument.

Examples

isdigit('a123bAB12* ')
  F T T T F F F T T F F
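
Since the result is a logical array with the same size as the string, it can be used as an index to keep only the digits. This is a small sketch; the variable name s is arbitrary.

```
s = 'a123bAB12* ';
s(isdigit(s))
  12312
```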

isletter

Test for letter characters.

Syntax

b = isletter(s)

Description

For each character of string s, isletter(s) is true if it is an ASCII letter (a-z or A-Z) and false otherwise. The result is a logical array with the same size as the input argument.

isletter gives false for letters outside the 7-bit ASCII range; unicodeclass should be used for Unicode-aware tests.

Examples

isletter('abAB12*')
  T T T T F F F

isspace

Test for space characters.

Syntax

b = isspace(s)

Description

For each character of string s, isspace(s) is true if it is a space, a tabulator, a carriage return or a line feed, and false otherwise. The result is a logical array with the same size as the input argument.

Example

isspace('a\tb c\nd')
  F T F T F T F

latex2mathml

Convert LaTeX equation to MathML.

Syntax

str = latex2mathml(tex)
str = latex2mathml(tex, mml1, mml2, ...)
str = latex2mathml(..., displaymath=b)

Description

latex2mathml(tex) converts LaTeX equation in string tex to MathML. LaTeX equations may be enclosed between dollars or double-dollars, but this is not mandatory. In string literals, backslash and tick characters must be escaped as \\ and \' respectively.

With additional arguments, which must be strings containing MathML, each parameter #i (#1, #2, ...) in argument tex is replaced with argument i+1.

The following LaTeX features are supported:

variables (each letter is a separate variable)
numbers (sequences of digit and dot characters)
superscripts and subscripts, prime (single or multiple)
braces used to group subexpressions or specify arguments with more than one token
operators (+, -, comma, semicolon, etc.)
control sequences for character definitions, with greek characters in lower case (\alpha, ..., \omega, \varepsilon, \vartheta, \varphi) and upper case (\Alpha, ..., \Omega), arrows (\leftarrow or \gets, \rightarrow or \to, \uparrow, \downarrow, \leftrightarrow, \updownarrow, \Leftarrow, \Rightarrow, \Uparrow, \Downarrow, \Leftrightarrow, \Updownarrow, \nwarrow, \nearrow, \searrow, \swarrow, \mapsto, \hookleftarrow, \hookrightarrow, \Longleftrightarrow, \longmapsto), and symbols (\|, \ell, \partial, \infty, \emptyset, \nabla, \perp, \angle, \triangle, \backslash, \forall, \exists, \flat, \natural, \sharp, \pm, \mp, \cdot, \times, \star, \diamond, \cap, \cup, etc.)
\not followed by comparison operator, such as \not< or \not\approx
control sequences for function definitions (\arccos, \arcsin, \arctan, \arg, \cos, \cosh, \cot, \coth, \csc, \deg, \det, \dim, \exp, \gcd, \hom, \inf, \injlim, \ker, \lg, \liminf, \limsup, \ln, \log, \max, \min, \Pr, \projlim, \sec, \sin, \sinh, \sup, \tan, \tanh) and custom functions with \operatorname
accents (\hat, \check, \tilde, \acute, \grave, \dot, \ddot, \dddot, \breve, \bar, \vec, \overline, \widehat, \widetilde, \underline)
\left and \right
fractions with \frac or \over
roots with \sqrt (without optional radix) or \root...\of...
\atop, \overset, and \underset
large operators (\bigcap, \bigcup, \bigodot, \bigoplus, \bigotimes, \bigsqcup, \biguplus, \bigvee, \bigwedge, \coprod, \prod, and \sum with implicit \limits for limits below and above the symbol; and \int, \iint, \iiint, \iiiint, \oint, and \oiint with implicit \nolimits for limits to the right of the symbol)
\limits and \nolimits for functions and large operators
matrices with \matrix, \pmatrix, \bmatrix, \Bmatrix, \vmatrix, \Vmatrix, \begin{array}{...}...\end{array}; values are separated with & and rows with \cr or \\
font selection with \rm for roman, \bf for bold face, and \mit for math italic
color with \color{c} where c is black, red, green, blue, cyan, magenta, yellow, white, orange, violet, purple, brown, darkgray, gray, or lightgray
hidden element with \phantom
text with \hbox{...} (brace contents is taken verbatim)
horizontal spaces with \, \: \; \quad \qquad and \!

LaTeX features not enumerated above, such as definitions and nested text and equations, are not supported.

latex2mathml also has features which are missing in LaTeX. Unicode is used for both LaTeX input and MathML output. Some semantics is recognized to build subexpressions which are revealed in the resulting MathML. For instance, in x+(y+z)w, (y+z) is a subexpression; so is (y+z)w with an implicit multiplication (resulting in the <mo>&it;</mo> MathML operator), used as the second operand of the addition. LaTeX code (like mathematical notation) is sometimes ambiguous and is not always converted to the expected MathML (e.g. a(b+c) is converted to a function call while the same notation could mean the product of a and b+c), but this should not have any visible effect when the MathML is typeset.

Operators can be used as freely as in LaTeX. Missing operands result in <none/>, as if there were an empty pair of braces {}. Consecutive terms are joined with implicit multiplications.

Named argument displaymath specifies whether the vertical space is tight, like in inline equations surrounded by text (false), or unconstrained, as rendered in separate lines (true). It affects the position of some limits. The default is true.

Examples

latex2mathml('xy^2')
  <mrow><mi>x</mi><mo>&it;</mo><msup><mi>y</mi><mn>2</mn></msup></mrow>
mml = latex2mathml('\\frac{x_3+5}{x_1+x_2}');
mml = latex2mathml('$\\root n \\of x$');
mml = latex2mathml('\\pmatrix{x & \\sqrt y \\cr \\sin\\phi & \\hat\\ell}');
mml = latex2mathml('\\dot x = #1', mathml([1,2;3,0], false));
mml = latex2mathml('\\lim_{x \\rightarrow 0} f(x)', displaymath=true)
mml = latex2mathml('\\lim_{x \\rightarrow 0} f(x)', displaymath=false)

lower

Convert all uppercase letters to lowercase.

Syntax

s2 = lower(s1)

Description

lower(s1) converts all the uppercase letters of string s1 to lowercase, according to the Unicode Character Database.

Example

lower('abcABC123')
  abcabc123

mathml

Conversion to MathML.

Syntax

str = mathml(x)
str = mathml(x, false)
str = mathml(..., Format=f, NPrec=n)

Description

mathml(x) converts its argument x to MathML presentation, returned as a string.

By default, the MathML top-level element is <math>. If the result is to be used as a MathML subelement of a larger equation, a second input argument equal to the logical value false can be specified to suppress <math>.

By default, mathml formats numbers like sprintf with format '%g'. Named arguments can override this: Format is a single-letter format recognized by sprintf and NPrec is the precision (number of decimals).

Example

mathml(pi)
  <math>
  <mn>3.1416</mn>
  </math>
mathml(1e-6, Format='e', NPrec=2)
  <math>
  <mrow><mn>1.00</mn><mo>&CenterDot;</mo><msup><mn>10</mn><mn>-6</mn></msup></mrow>
  </math>

mathmlpoly

Conversion of a polynomial to MathML.

Syntax

str = mathmlpoly(pol)
str = mathmlpoly(pol, var)
str = mathmlpoly(..., power)
str = mathmlpoly(..., false)
str = mathmlpoly(..., Format=f, NPrec=n)

Description

mathmlpoly(pol) converts polynomial pol to MathML presentation, returned as a string. The polynomial is given as a vector of coefficients, with the highest power first; e.g., x^2+2x-3 is represented by [1,2,-3].

By default, the name of the variable is x. An optional second argument can specify another name as a string, such as 'y', or a MathML fragment beginning with a less-than character, such as '<mn>3</mn>'.

Powers can be specified explicitly with an additional argument, a vector which must have the same length as the polynomial coefficients. Negative and fractional numbers are allowed; the imaginary part, if any, is ignored.

By default, the MathML top-level element is <math>. If the result is to be used as a MathML subelement of a larger equation, an additional input argument (the last unnamed argument) equal to the logical value false can be specified to suppress <math>.

Named arguments Format and NPrec have the same effect as with mathml.

Examples

Simple third-order polynomial:

mathmlpoly([1,2,5,3])

Polynomial with negative powers of variable q:

c = [1, 2.3, 4.5, -2];
mathmlpoly(c, 'q', -(0:numel(c)-1))

Rational fraction:

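num = [1, 2];       // example numerator x+2 (arbitrary values)
den = [1, 3, 2];    // example denominator x^2+3x+2 (arbitrary values)
str = sprintf('<mfrac>%s%s</mfrac>',
  mathmlpoly(num, false),
  mathmlpoly(den, false));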

md5

Calculate MD5 digest.

Syntax

digest = md5(strb)
digest = md5(fd)
digest = md5(..., type=t)

Description

md5(strb) calculates the MD5 digest of strb which represents binary data. strb can be a string (only the least-significant byte of each character is considered) or an array of bytes of class uint8 or int8. The result is a string of 32 hexadecimal digits. Finding an input which produces a given digest is still believed to be hard, but practical methods are known for creating two inputs with the same MD5 digest, so MD5 should not be relied upon where collisions matter.

md5(fd) calculates the MD5 digest of the bytes read from file descriptor fd until the end of the file. The file is left open.

Named argument type can change the output type. It can be 'uint8' for a uint8 array of 16 bytes (raw MD5 hash result), 'hex' for its representation as a string of 32 hexadecimal digits (default), or 'base64' for its conversion to Base64 in a string of 24 characters.

MD5 digest is an Internet standard described in RFC 1321.

Examples

MD5 of the three characters 'a', 'b', and 'c':

md5('abc')
  900150983cd24fb0d6963f7d28e17f72

This can be compared to the result of the command-line tool md5 found on many Unix systems:

$ echo -n abc | md5
900150983cd24fb0d6963f7d28e17f72

The following statements calculate the digest of the file 'somefile':

fd = fopen('somefile');
digest = md5(fd);
fclose(fd);

regexp regexpi

Regular expression match.

Syntax

(startIx, endIx, grExt) = regexp(str, re)
(startIx, endIx, grExt) = regexpi(str, re)

Description

regexp(str,re) matches regular expression re in string str. A regular expression is a string which contains meta-characters to match classes of characters, repetitions and alternatives, as described below.

Once a match is found, the remaining part of str is parsed from the end of the previous match to find more matches. The result of regexp is an array of start indices in str and an array of corresponding end indices. Empty matches have a length endIx-startIx+1=0.

The third output argument, if present, is set to a list whose items correspond to matches. Items are arrays of size ng-by-2. Each row corresponds to a group, i.e. a subexpression in parentheses in the regular expression; the first column contains the index of the first character in str and the second column contains the index of the last character.

regexpi is similar to regexp, except that letter case is ignored.

The following regular expression elements are recognized:

Any character other than those described below: Literal match.
. (dot): Any character.
\0: Nul (0).
\t: Tab (9).
\n: Newline (10).
\v: Vertical tab (11).
\f: Form feed (12).
\r: Carriage return (13).
\P where P is one of \()[]{}?*+/: P
\xNN: Character whose code is NN in hexadecimal.
\uNNNN: Character whose code is NNNN in hexadecimal.
[...]: Any of the characters in brackets. Characters can be enumerated (e.g. [ax2] to match a, x or 2), provided as ranges with a hyphen (e.g. [a-c] to match a, b or c) or any combination. Caret ^ must not appear first; closing bracket ] must appear first; and hyphen must not be used in a way which could be interpreted as a range.
[^...]: Any character not enumerated in brackets (e.g. [^a-z] for any character except for lowercase letters).
AB: Catenation of A and B.
A|B: One of A or B. | has the lowest priority: ab|c matches ab or c.
A?: A (if possible) or nothing.
A*: As many repetitions of A as possible, including none.
A+: As many repetitions of A as possible, at least one.
A{n}: Exactly n repetitions of A.
A{n,}: At least n repetitions of A (as many as possible).
A{n,m}: Between n and m repetitions of A (as many as possible).
A??: Nothing (if possible) or A.
A*?: As few repetitions of A as possible, including none.
A+?: As few repetitions of A as possible, at least one.
A{n,}?: At least n repetitions of A (as few as possible).
A{n,m}?: Between n and m repetitions of A (as few as possible).
A?+, A*+, A++, A{...}+: Possessive repetitions: as many as possible, but once the maximum number has been found, fewer repetitions are not tried should the remaining part of the regular expression fail to match.
(A): Group; matches subexpression A, which is captured for further reference as \N.
(?:A): Group without capture; just matches subexpression A.
\N where N is a digit from 1 to 9: Character substring which was matched by the N:th group delimited by parentheses.
^: Matches beginning of string.
$: Matches end of string.
\b: Beginning or end of word.
(?=A): Positive lookahead: succeeds if what follows matches A without consuming A.
(?!A): Negative lookahead: succeeds if what follows does not match A without consuming A.
(?# comment): Comment (ignored).
\d: Digit (can be used inside or outside brackets).
\D: Not a digit (can be used inside or outside brackets).
\s: White space (can be used inside or outside brackets).
\S: Not white space (can be used inside or outside brackets).
\w: Alphanumeric (can be used inside or outside brackets).
\W: Not alphanumeric (can be used inside or outside brackets).
[:alnum:]: Same as A-Za-z0-9 (must be used inside brackets, e.g. [[:alnum:]])
[:alpha:]: Same as A-Za-z (must be used inside brackets, e.g. [[:alpha:]])
[:blank:]: Same as \x20\x09, i.e. space or tab (must be used inside brackets, e.g. [[:blank:]])
[:cntrl:]: Same as \0-\x1f (must be used inside brackets, e.g. [[:cntrl:]])
[:digit:]: Same as 0-9 (must be used inside brackets, e.g. [[:digit:]])
[:graph:]: Same as \x21-\x7e, i.e. ASCII characters without space and control characters (must be used inside brackets, e.g. [[:graph:]])
[:lower:]: Same as a-z (must be used inside brackets, e.g. [^[:lower:][:digit:]] which is equivalent to [^a-z0-9])
[:print:]: Same as \x20-\x7e, i.e. ASCII characters without control characters (must be used inside brackets, e.g. [[:print:]])
[:punct:]: Same as !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ (must be used inside brackets, e.g. [[:punct:]])
[:space:]: Same as \x20\x09\x0a\x0c\x0d or \s (must be used inside brackets, e.g. [[:space:]])
[:upper:]: Same as A-Z (must be used inside brackets, e.g. [[:upper:]])
[:word:]: Same as [:alnum:]_ (must be used inside brackets, e.g. [[:word:]])
[:xdigit:]: Same as 0-9A-Fa-f (must be used inside brackets, e.g. [[:xdigit:]])

Quantifiers ?, * and +, and their lazy and possessive versions (suffixed with ? or + respectively) have the highest priority. Priority can be changed with parentheses, e.g. (abc)* or (a|bc)d.

Examples

Simple match without metacharacter:

(startIx, endIx) = regexp('Some random string', 'om')
  startIx =
     2 10
  endIx =
     3 11

Dot to match any character:

regexp('Some random string', 'S..e')
  1

Anchor to end of string:

regexp('Some random string', '..$')
  17

Repetition:

regexp('Some random string', 'r.*m')
  6

By default, repetitions are greedy (as many as possible):

(startIx, endIx) = regexp('Some random string', '.*m')
  startIx =
     1
  endIx =
    11

Lazy repetition (as few as possible):

(startIx, endIx) = regexp('Some random string', '.*?m')
  startIx =
     1  4
  endIx =
     3 11

Possessive repetitions keep the largest number of repetitions which provides a match regardless of subsequent failures:

(startIx, endIx) = regexp('Some random string', '.*m ')
  startIx =
     1
  endIx =
    12
(startIx, endIx) = regexp('Some random string', '.*+m ')
  startIx =
    []
  endIx =
    []

Since backslash is an escape character in LME strings, it must be escaped itself:

(startIx, endIx) = regexp('Some random string', '\\b\\w.+?\\b')
  startIx =
    1  6 13
  endIx =
    4 11 18

Reference to a captured group:

(startIx, endIx) = regexp('xx-ab-ab', '(.+)-\\1')
  startIx =
    4
  endIx =
    8

Positive lookahead to find words followed by a colon without picking the colon itself:

(startIx, endIx) = regexp('mailto:foo@example.com', '\\b\\w+(?=:)')
  startIx =
    1
  endIx =
    6

Group (the extent of the whole match is ignored using placeholder output arguments ~):

(~, ~, grExt) = regexp('Regexp are fun', '\\b(\\w+)\\s+(\\w+)\\s+(\\w+)\\b');
grExt{1}
  1  6
  8 10
 12 14
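
The group extents can be used to extract the corresponding substrings. The following sketch relies on the layout described above (one row per group, with the first and last index in the two columns); the variable names str and g are arbitrary.

```
str = 'Regexp are fun';
(~, ~, grExt) = regexp(str, '\\b(\\w+)\\s+(\\w+)\\s+(\\w+)\\b');
g = grExt{1};                 // extents of the groups of the first match
for i = 1:size(g, 1)
  fprintf('%s\n', str(g(i, 1):g(i, 2)));
end
  Regexp
  are
  fun
```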

Match ignoring case:

regexpi('Some random string', 'some')
  1

Case-explicit character classes are still case-significant, but character enumerations or ranges are not:

regexpi('Some random string', '^[[:lower:]]')
  []
regexpi('Some random string', '^[a-z]')
  1

setstr

Conversion of an array to a string.

Syntax

str = setstr(A)

Description

setstr(A) converts the elements of array A to characters, resulting in a string of the same size. Characters are stored in unsigned 16-bit words.

Example

setstr(65:75)
  ABCDEFGHIJK

sha1 sha2

Calculate SHA-1 or SHA-2 digest.

Syntax

digest = sha1(strb)
digest = sha1(fd)
digest = sha1(..., type=t)
digest = sha2(...)
digest = sha2(..., variant=v)

Description

sha1(strb) calculates the SHA-1 digest of strb which represents binary data. strb can be a string (only the least-significant byte of each character is considered) or an array of bytes of class uint8 or int8. The result is a string of 40 hexadecimal digits. Finding an input which produces a given digest is believed to be hard; creating two inputs with the same SHA-1 digest is costly, but it has been demonstrated in practice.

sha1(fd) calculates the SHA-1 digest of the bytes read from file descriptor fd until the end of the file. The file is left open.

Named argument type can change the output type. It can be 'uint8' for a uint8 array of 20 bytes (raw SHA-1 hash result), 'hex' for its representation as a string of 40 hexadecimal digits (default), or 'base64' for its conversion to Base64 in a string of 28 characters.

SHA-1 digest is an Internet standard described in RFC 3174.

sha2 calculates the SHA-256 digest, a 256-bit variant of the SHA-2 hash algorithm. Its arguments are the same as those of sha1. In addition, named argument variant can specify one of the supported SHA-2 variants: 224, 256 (default), 384, or 512.

Examples

SHA-1 digest of the three characters 'a', 'b', and 'c':

sha1('abc')
  a9993e364706816aba3e25717850c26c9cd0d89d

SHA-224 digest of the empty message '':

sha2('', variant=224)
  d14a028c2a3a2bc9476102bb288234c415a2b01f828ea62ac5b3e42f
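
With the default variant, sha2 can be checked against the well-known SHA-256 digest of 'abc'. The length of the hexadecimal result follows from the variant (one hexadecimal digit for 4 bits), as sketched here for SHA-512.

```
sha2('abc')
  ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
length(sha2('abc', variant=512))
  128
```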

split

Split a string.

Syntax

L = split(string, separator)

Description

split(string,separator) finds the substrings of string separated by separator and returns them as a list. Empty substrings are discarded. separator is a string of one or more characters.

Examples

split('abc;de;f', ';')
  {'abc', 'de', 'f'}
split('++a+++b+++','++')
  {'a', '+b', '+'}
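
The elements of the resulting list can be processed one by one with braces for indexing. This is a minimal sketch; the variable name L and the output format are arbitrary.

```
L = split('abc;de;f', ';');
for i = 1:length(L)
  fprintf('%d: %s\n', i, L{i});
end
  1: abc
  2: de
  3: f
```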

strcmp

String comparison.

Syntax

b = strcmp(s1, s2)
b = strcmp(s1, s2, n)

Description

strcmp(s1, s2) is true if the strings s1 and s2 are equal (i.e. same length and corresponding characters are equal). strcmp(s1, s2, n) compares the strings up to the n:th character. Note that this function does not return the same result as the strcmp function of the standard C library.

Examples

strcmp('abc','abc')
  true
strcmp('abc','def')
  false
strcmp('abc','abd',2)
  true
strcmp('abc','abd',5)
  false

strcmpi

String comparison ignoring letter case.

Syntax

b = strcmpi(s1, s2)
b = strcmpi(s1, s2, n)

Description

strcmpi compares strings for equality, ignoring letter case. In every other respect, it behaves like strcmp.

Examples

strcmpi('abc','aBc')
  true
strcmpi('Abc','abd',2)
  true

strfind

Find a substring in a string.

Syntax

pos = strfind(str, sub)

Description

strfind(str,sub) finds occurrences of string sub in string str and returns a vector of the positions of all occurrences, or the empty vector [] if there is none. Occurrences may overlap.

Examples

strfind('ababcdbaaab','ab')
  1 3 10
strfind('ababcdbaaab','ac')
  []
strfind('aaaaaa','aaa')
  1 2 3 4

strmatch

String match.

Syntax

i = strmatch(str, strMatrix)
i = strmatch(str, strList)
i = strmatch(..., 'exact')

Description

strmatch(str,strMatrix) compares string str with each row of the character matrix strMatrix; it returns the index of the first row whose beginning is equal to str, or 0 if no match is found. Case is significant.

strmatch(str,strList) compares string str with each element of list or cell array strList, which must be strings.

With a third argument, which must be the string 'exact', str must match the complete row or element of the second argument, not only the beginning.

Examples

strmatch('abc',['axyz';'uabc';'abcd';'efgh'])
  3
strmatch('abc',['axyz';'uabc';'abcd';'efgh'],'exact')
  0
strmatch('abc',{'ABC','axyz','abcdefg','ab','abcd'})
  3

strrep

Replace a substring in a string.

Syntax

newstr = strrep(str, sub, repl)

Description

strrep(str,sub,repl) replaces all occurrences of string sub in string str with string repl.

Examples

strrep('ababcdbaaab','ab','X')
  'XXcdbaaX'
strrep('aaaaaaa','aaa','12345')
  '1234512345a'
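
Unlike strfind, which reports overlapping occurrences, strrep replaces occurrences which do not overlap, as the second example above shows. The following sketch compares both functions on the same string of seven characters.

```
length(strfind('aaaaaaa', 'aaa'))
  5
strrep('aaaaaaa', 'aaa', 'X')
  'XXa'
```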

strtok

Token search in string.

Syntax

(token, remainder) = strtok(str)
(token, remainder) = strtok(str, separators)

Description

strtok(str) gives the first token in string str. A token is defined as a substring delimited by separators or by the beginning or end of the string; by default, separators are spaces, tabulators, carriage returns and line feeds. If no token is found (i.e. if str is empty or contains only separator characters), the result is the empty string.

The optional second output is set to what immediately follows the token, including separators. If no token is found, it is the same as str.

An optional second input argument contains the separators in a string.

Examples

Strings are displayed with quotes to show clearly the separators.

strtok(' ab cde ')
  'ab'
(t, r) = strtok(' ab cde ')
  t =
    'ab'
  r =
    ' cde '
(t, r) = strtok('2, 5, 3', ', ')
  t =
    '2'
  r =
    ', 5, 3'
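
All the tokens of a string can be obtained by calling strtok repeatedly on the remainder until the token is empty. The loop below is a sketch which assumes only the behavior described above; tokens are displayed between brackets to show that separators have been removed.

```
r = 'ab cde  f';
while true
  (t, r) = strtok(r);
  if isempty(t)
    break;        // no more tokens
  end
  fprintf('[%s]\n', t);
end
  [ab]
  [cde]
  [f]
```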

strtrim

Remove leading and trailing blank characters from a string.

Syntax

s2 = strtrim(s1)

Description

strtrim(s1) removes the leading and trailing blank characters from string s1. Blank characters are spaces (code 32), tabulators (code 9), carriage returns (code 13), line feeds (code 10), and null characters (code 0).

Example

double(' \tAB  CD\r\n\0')
  32  9 65 66 32 32 67 68 13 10 0
double(strtrim(' \tAB  CD\r\n\0'))
  65 66 32 32 67 68

unicodeclass

Unicode character class.

Syntax

cls = unicodeclass(c)

Description

unicodeclass(c) gives the Unicode character class (General_Category property in the Unicode Character Database) of its argument c, which must be a single-character string. The result is one of the following two-character strings:

Class	Description	Class	Description
`'Lu'`	Letter, Uppercase	`'Pi'`	Punctuation, Initial Quote
`'Ll'`	Letter, Lowercase	`'Pf'`	Punctuation, Final Quote
`'Lt'`	Letter, Titlecase	`'Po'`	Punctuation, Other
`'Lm'`	Letter, Modifier	`'Sm'`	Symbol, Math
`'Lo'`	Letter, Other	`'Sc'`	Symbol, Currency
`'Mn'`	Mark, Non-Spacing	`'Sk'`	Symbol, Modifier
`'Mc'`	Mark, Spacing Combining	`'So'`	Symbol, Other
`'Me'`	Mark, Enclosing	`'Zs'`	Separator, Space
`'Nd'`	Number, Decimal Digit	`'Zl'`	Separator, Line
`'Nl'`	Number, Letter	`'Zp'`	Separator, Paragraph
`'No'`	Number, Other	`'Cc'`	Other, Control
`'Pc'`	Punctuation, Connector	`'Cf'`	Other, Format
`'Pd'`	Punctuation, Dash	`'Cs'`	Other, Surrogate
`'Ps'`	Punctuation, Open	`'Co'`	Other, Private Use
`'Pe'`	Punctuation, Close	`'Cn'`	Other, Not Assigned
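
A few ASCII characters illustrate the classes of the table, followed by a Unicode-aware letter test for a character outside the 7-bit ASCII range, where isletter gives false. This is a sketch: the classes shown are those of the Unicode Character Database, and the test on the first character of the class is only one possible way to recognize letters.

```
unicodeclass('A')
  Lu
unicodeclass('5')
  Nd
unicodeclass('+')
  Sm
unicodeclass(' ')
  Zs
c = char(233);            // e with acute accent
cls = unicodeclass(c);    // 'Ll'
cls(1) == 'L'
  true
```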

upper

Convert all lowercase letters to uppercase.

Syntax

s2 = upper(s1)

Description

upper(s1) converts all the lowercase letters of string s1 to uppercase, according to the Unicode Character Database.

Example

upper('abcABC123')
  ABCABC123

utf32decode

Decode Unicode characters encoded with UTF-32.

Syntax

str = utf32decode(b)

Description

utf32decode(b) decodes the contents of uint32 or int32 array b which represents Unicode characters encoded with UTF-32 (basically, Unicode code point). The result is a standard character array with a single row, usually encoded with UTF-16. Invalid codes are ignored.

If all the codes in b correspond to the Basic Multilingual Plane (16-bits, and not surrogate 0xd800-0xdfff), the result is equivalent to char(b).

utf32encode

Encode a string of Unicode characters using UTF-32.

Syntax

b = utf32encode(str)

Description

utf32encode(str) encodes the contents of character array str using UTF-32. Each Unicode character in str, made of 1 or 2 UTF-16 words, corresponds to one UTF-32 code. The result is an array of unsigned 32-bit integers.

If all the characters in str correspond to the Basic Multilingual Plane (16-bits, and no surrogate pairs), the result is equivalent to uint32(str).

Examples

utf32encode('abc')
  1x3 uint32 array
    97  98  99
str = utf32decode(65872uint32);
double(str)
  55296 56656
utf32encode(str)
  65872uint32

utf8decode

Decode Unicode characters encoded with UTF-8.

Syntax

str = utf8decode(b)

Description

utf8decode(b) decodes the contents of uint8 or int8 array b which represents Unicode characters encoded with UTF-8. Each Unicode character corresponds to up to 4 bytes of UTF-8 code. The result is a standard character array with a single row; characters are usually encoded as UTF-16, with 1 or 2 words per character. Invalid codes (for example when the beginning of the decoded data does not correspond to a character boundary) are ignored.

utf8encode

Encode a string of Unicode characters using UTF-8.

Syntax

b = utf8encode(str)

Description

utf8encode(str) encodes the contents of character array str using UTF-8. Each Unicode character in str corresponds to up to 4 bytes of UTF-8 code. The result is an array of unsigned 8-bit integers.

If the input string does not contain valid Unicode characters, the output is invalid.

Example

b = utf8encode(['abc', 200, 2000, 20000])
  b =
    1x10 uint8 array
      97  98  99 195 136 223 144 228 184 160
str = utf8decode(b);
double(str)
  97    98    99   200  2000 20000
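
A character outside the Basic Multilingual Plane takes two UTF-16 words in a string and four bytes in UTF-8. The sketch below reuses code point 65872 (0x10150) from the utf32encode example; the byte values are those given by the UTF-8 encoding rules.

```
str = utf32decode(65872uint32);
length(str)
  2
utf8encode(str)
  1x4 uint8 array
    240 144 133 144
```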
