
Arrays

There are four kinds of arrays:
	int* p;		Simple pointers to data

	int[3] s;	Static arrays

	int[] a;	Dynamic arrays

	int[char[]] x;	Associative arrays (discussed later)
    

Pointers

	int* p;
	
These are simple pointers to data, analogous to C pointers. Pointers are provided for interfacing with C and for specialized systems work. A pointer has no length associated with it, so there is no way for the compiler or runtime to do bounds checking on it. Most conventional uses for pointers can be replaced with dynamic arrays, out and inout parameters, and handles (references).

Static Arrays

	int[3] s;
	
These are analogous to C arrays. Static arrays are distinguished by having a length fixed at compile time.

Dynamic Arrays

	int[] a;
	
Dynamic arrays contain a length and a garbage collected pointer to the array data.

Complex Array Declarations

Declarations for arrays read right to left, so:
	int *[]*[3] p;
	
declares p as an array of 3 pointers to dynamic arrays of pointers to ints.
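
The right-to-left reading can be checked mechanically. The following is a minimal sketch, assuming a present-day D compiler; the alias names are hypothetical and exist only to show how the declaration decomposes one step at a time.

```d
// Build the type of `int*[]*[3] p` step by step, reading right to left.
alias PtrToInt  = int*;          // pointer to int
alias DynOfPtrs = PtrToInt[];    // dynamic array of pointers to int
alias PtrToDyn  = DynOfPtrs*;    // pointer to such a dynamic array
alias ArrayOf3  = PtrToDyn[3];   // static array of 3 such pointers

int*[]*[3] p;

// All checks happen at compile time.
static assert(is(typeof(p) == ArrayOf3));
static assert(p.length == 3);
static assert(is(typeof(*p[0]) == int*[]));
```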

Usage

There are two broad kinds of operations to do on an array - affecting the handle to the array, and affecting the contents of the array. C only has operators to affect the handle. In D, both are accessible.

The handle to an array is specified by naming the array, as in p, s or a:

	int* p;
	int[3] s;
	int[] a;

	int* q;
	int[3] t;
	int[] b;

	p = q;		p points to the same thing q does.
	p = s;		p points to the first element of the array s.
	p = a;		p points to the first element of the array a.

	s = ...;	error, since s is a compiled in static
			reference to an array.

	a = p;		error, since the length of the array pointed
			to by p is unknown
	a = s;		a is initialized to point to the s array
	a = b;		a points to the same array as b does
    

Slicing

Slicing an array means to specify a subarray of it. For example:
	int a[10];	declare array of 10 ints
	int b[];

	b = a[1..3];	a[1..3] is a 2 element array consisting of
			a[1] and a[2]
    
The [] is shorthand for a slice of the entire array. For example, the assignments to b:
	int a[10];
	int b[];

	b = a;
	b = a[];
	b = a[0 .. a.length];
    
are all semantically equivalent.
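
A slice refers to the original elements rather than copying them. The sketch below illustrates this under the assumption of a present-day D compiler, which spells the declarations `int[10] a` and `int[] b`; the function name `middle` is hypothetical.

```d
// Returns the two-element slice a[1 .. 3]; it shares storage with `a`.
// The caller must pass an array with at least 3 elements.
int[] middle(int[] a)
{
    return a[1 .. 3];
}

unittest
{
    int[10] a;                  // static array of 10 ints, all 0
    int[] b = middle(a[]);      // refers to a[1] and a[2]
    assert(b.length == 2);

    b[0] = 42;                  // writes through to a[1]
    assert(a[1] == 42);

    // The whole-array forms describe the same elements.
    int[] whole = a[];
    assert(whole.ptr == a.ptr && whole.length == a.length);
    assert(a[0 .. a.length] is whole);
}
```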

Slicing is not only handy for referring to parts of other arrays, but for converting pointers into bounds-checked arrays:

	int *p;
	int b[] = p[0..8];
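
As a sketch of the same idea in present-day D syntax (an assumption, as is the helper name `asSlice`), a raw pointer can be given a length by slicing it. The compiler cannot verify the length, so the caller is responsible for choosing one that matches the memory actually available.

```d
// Wrap the first `n` elements behind `p` in a slice. The caller must
// guarantee that `p` really points at `n` or more ints.
int[] asSlice(int* p, size_t n)
{
    return p[0 .. n];
}

unittest
{
    int[8] storage = [0, 1, 2, 3, 4, 5, 6, 7];
    int* p = storage.ptr;

    int[] b = asSlice(p, 8);
    assert(b.length == 8);      // the slice now carries a length
    assert(b[7] == 7);
}
```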
    

Array Copying

When the slice operator appears as the lvalue of an assignment expression, it means that the contents of the array are the target of the assignment rather than a reference to the array. Array copying happens when the lvalue is a slice, and the rvalue is an array of or pointer to the same type.
	int[3] s;
	int[3] t;

	s[] = t;		the 3 elements of t[3] are copied into s[3]
	s[] = t[];		the 3 elements of t[3] are copied into s[3]
	s[1..2] = t[0..1];	same as s[1] = t[0]
	s[0..2] = t[1..3];	same as s[0] = t[1], s[1] = t[2]
	s[0..4] = t[0..4];	error, only 3 elements in s
	s[0..2] = t;		error, different lengths for lvalue and rvalue
    
Overlapping copies are an error:
	s[0..2] = s[1..3];	error, overlapping copy
	s[1..3] = s[0..2];	error, overlapping copy
Disallowing overlapping makes it possible for more aggressive parallel code optimizations than possible with the serial semantics of C.
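
A short sketch of slice copying between two distinct arrays, assuming a present-day D compiler; `copyTail` is a hypothetical helper name.

```d
// Copy the first two elements of `src` over the last two of `dst`.
// The two arrays are distinct, so the copy cannot overlap.
void copyTail(ref int[3] dst, ref const(int[3]) src)
{
    dst[1 .. 3] = src[0 .. 2];
}

unittest
{
    int[3] s = [10, 20, 30];
    int[3] t = [1, 2, 3];

    copyTail(s, t);
    assert(s == [10, 1, 2]);

    s[] = t;                    // copy all 3 elements of t into s
    assert(s == t);
}
```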

Array Setting

If a slice operator appears as the lvalue of an assignment expression, and the type of the rvalue is the same as the element type of the lvalue, then the lvalue's array contents are set to the rvalue.
	int[3] s;
	int *p;

	s[] = 3;		same as s[0] = 3, s[1] = 3, s[2] = 3
	p[0..2] = 3;		same as p[0] = 3, p[1] = 3
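
The two assignments above can be tried out as follows. This is a sketch in present-day D syntax; `fill` is a hypothetical name.

```d
// Set every element of `a` to `value`.
void fill(int[] a, int value)
{
    a[] = value;
}

unittest
{
    int[3] s;
    fill(s[], 3);
    assert(s == [3, 3, 3]);

    int[4] buf;                 // all 0
    int* p = buf.ptr;
    p[0 .. 2] = 3;              // same as p[0] = 3, p[1] = 3
    assert(buf == [3, 3, 0, 0]);
}
```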
    

Array Concatenation

The binary operator ~ is the cat operator. It is used to concatenate arrays:

	int a[];
	int b[];
	int c[];

	a = b ~ c;	Create an array from the concatenation of the
			b and c arrays
    
Many languages overload the + operator to mean concatenation. This leads to confusion: does
	"10" + 3
    
produce the number 13 or the string "103" as the result? It isn't obvious, and the language designers wind up carefully writing rules to disambiguate it - rules that get incorrectly implemented, overlooked, forgotten, and ignored. It's much better to have + mean addition, and a separate operator to be array concatenation.

Similarly, the ~= operator means append, as in:

	a ~= b;		a becomes the concatenation of a and b
    
Concatenation always creates a copy of its operands, even if one of the operands is a 0 length array, so:
	a = b			a refers to b
	a = b ~ c[0..0]		a refers to a copy of b
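
The copy semantics can be observed directly. The following is a minimal sketch, assuming a present-day D compiler; `joined` is a hypothetical helper name.

```d
// Concatenate two arrays into a newly allocated one.
int[] joined(int[] b, int[] c)
{
    return b ~ c;
}

unittest
{
    int[] b = [1, 2];
    int[] c = [3];

    int[] a = joined(b, c);
    assert(a == [1, 2, 3]);

    a ~= b;                     // append b to a
    assert(a == [1, 2, 3, 1, 2]);

    // Concatenating with a 0 length slice still yields a copy of b.
    int[] copy = b ~ c[0 .. 0];
    assert(copy == b);
    copy[0] = 99;
    assert(b[0] == 1);          // b is unaffected
}
```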
    

Array Operations

In general, (a[n..m] op e) is defined as:
	for (i = n; i < m; i++)
	    a[i] op e;
    
So, for the expression:
	a[] = b[] + 3;
    
the result is equivalent to:
	for (i = 0; i < a.length; i++)
	    a[i] = b[i] + 3; 
    
When more than one [] operator appears in an expression, the range represented by all must match.
	a[1..3] = b[] + 3;	error, 2 elements not same as 3 elements
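
A sketch of an element-wise operation on arrays of matching length, assuming a present-day D compiler; `addToEach` is a hypothetical name.

```d
// dst[i] = src[i] + e for every i; both arrays must have the same length.
void addToEach(int[] dst, const(int)[] src, int e)
{
    dst[] = src[] + e;
}

unittest
{
    int[3] b = [1, 2, 3];
    int[3] a;

    addToEach(a[], b[], 3);
    assert(a == [4, 5, 6]);
    assert(b == [1, 2, 3]);     // the source is left unchanged
}
```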
    

Examples:

	int[3] abc;			// static array of 3 ints
	int[] def = [ 1, 2, 3 ];		// dynamic array of 3 ints

	void dibb(int *array)
	{
		array[2];		// means same thing as *(array + 2)
		*(array + 2);		// get 3rd element
	}

	void diss(int[] array)
	{
		array[2];		// ok
		*(array + 2);		// error, array is not a pointer
	}

	void ditt(int[3] array)
	{
		array[2];		// ok
		*(array + 2);		// error, array is not a pointer
	}
	

Rectangular Arrays

Experienced FORTRAN numerics programmers know that multidimensional "rectangular" arrays for things like matrix operations are much faster than trying to access them via pointers to pointers resulting from "array of pointers to array" semantics. For example, the D syntax:
	double matrix[][];
	
declares matrix as an array of pointers to arrays. (Dynamic arrays are implemented as pointers to the array data.) Since the rows can have varying sizes (being dynamically sized), such arrays are sometimes called "jagged" arrays. Even worse for optimizing the code, the array rows can sometimes point to each other! Fortunately, D static arrays, while using the same syntax, are implemented as a fixed rectangular layout:
	double matrix[3][3];
	
declares a rectangular matrix with 3 rows and 3 columns, all contiguously in memory. In other languages, this would be called a multidimensional array and be declared as:
	double matrix[3,3];
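
The contiguous layout can be confirmed with a little pointer arithmetic. This is a sketch that assumes a present-day D compiler, where the declaration is spelled `double[3][3] matrix`; `rowsAreContiguous` is a hypothetical helper.

```d
// True when the rows of a 3x3 static matrix sit back to back in memory.
bool rowsAreContiguous(ref double[3][3] m)
{
    return &m[1][0] == &m[0][0] + 3
        && &m[2][0] == &m[0][0] + 6;
}

unittest
{
    double[3][3] matrix;
    static assert(matrix.sizeof == 9 * double.sizeof);  // one solid block
    assert(rowsAreContiguous(matrix));

    // A jagged array: each row is its own dynamic array and may
    // have a different length.
    double[][] jagged = [[1.0], [1.0, 2.0, 3.0]];
    assert(jagged[0].length != jagged[1].length);
}
```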
	

Array Properties

Arrays have numerous interesting properties:
    length		number of elements in the array

	p.length	error, length not known for pointer
	s.length	compile time constant 3
	a.length	runtime value

    dup			create a dynamic array of the same size
			and copy the contents of the array into it

	p.dup		error, length not known
	s.dup		creates an array of 3 elements, copies
			elements of s into it
	a.dup		creates an array of a.length elements, copies
			elements of a into it
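
The following sketch exercises both properties, assuming a present-day D compiler; `independentCopy` is a hypothetical name.

```d
// Return a freshly allocated copy of `a`.
int[] independentCopy(int[] a)
{
    return a.dup;
}

unittest
{
    int[3] s = [1, 2, 3];
    static assert(s.length == 3);   // compile time constant

    int[] a = s[];                  // a.length is a runtime value
    int[] d = independentCopy(a);
    assert(d.length == a.length);
    assert(d.ptr !is a.ptr);        // separate storage

    d[0] = 100;
    assert(s[0] == 1);              // the original is untouched
}
```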
    

Array Bounds Checking

It is an error to index an array with an index that is less than 0 or greater than or equal to the array length. If an index is out of bounds, an ArrayBoundsError exception is raised if detected at runtime, and an error if detected at compile time. A program may not rely on array bounds checking happening. For example, the following program is incorrect:
	try
	{
	    for (i = 0; ; i++)
	    {
		array[i] = 5;
	    }
	}
	catch (ArrayBoundsError)
	{
	    // terminate loop
	}
	
The loop is correctly written:
	for (i = 0; i < array.length; i++)
	{
	    array[i] = 5;
	}
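
Wrapped in a function, the correctly written loop looks like this. It is a sketch in present-day D syntax, and `setAll` is a hypothetical name; the loop condition, not the bounds check, is what ends the iteration.

```d
// Set every element, using the array's own length as the loop bound.
void setAll(int[] array, int value)
{
    for (size_t i = 0; i < array.length; i++)
    {
        array[i] = value;
    }
}

unittest
{
    int[4] data;
    setAll(data[], 5);
    assert(data == [5, 5, 5, 5]);

    int[] empty;
    setAll(empty, 5);           // zero iterations, nothing is indexed
    assert(empty.length == 0);
}
```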
	
Implementation Note: Compilers should attempt to detect array bounds errors at compile time, for example:
	int[3] foo;
	int x = foo[3];		// error, out of bounds
	
Insertion of array bounds checking code at runtime should be turned on and off with a compile time switch.

Array Initialization

Static Initialization of Static Arrays

	int[3] a = [ 1:2, 3 ];		// a[0] = 0, a[1] = 2, a[2] = 3
	
This is most handy when the array indices are given by enums:
	enum Color { red, blue, green };

	int value[Color.max + 1] = [ blue:6, green:2, red:5 ];
	
If any members of an array are initialized, they all must be. This is to catch common errors where another element is added to an enum, but one of the static instances of arrays of that enum was overlooked in updating the initializer list.
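
A sketch of both initializers as module-level declarations, assuming a present-day D compiler, which requires the enum members to be qualified with `Color.` and spells the declaration with the dimension before the name. The array needs `Color.max + 1` elements because `Color.max` is the largest member value, not the member count.

```d
enum Color { red, blue, green }

// Indexed initializer: a[0] = 0, a[1] = 2, a[2] = 3
int[3] a = [1: 2, 3];

// One slot per enum member, in any order.
int[Color.max + 1] value = [Color.blue: 6, Color.green: 2, Color.red: 5];

unittest
{
    assert(a == [0, 2, 3]);
    assert(value.length == 3);
    assert(value[Color.red] == 5);
    assert(value[Color.blue] == 6);
    assert(value[Color.green] == 2);
}
```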

Special Array Types

Arrays of Bits

Bit vectors can be constructed:
	bit[10] x;		// array of 10 bits
	
The amount of storage used up is implementation dependent. Implementation Note: on Intel CPUs it would be rounded up to the next 32 bit size.
	x.length		// 10, number of bits
	x.size			// 4,  bytes of storage
	
So, the size per element is not (x.size / x.length).

Strings

Languages should be good at handling strings. C and C++ are not good at it. The primary difficulties are memory management, handling of temporaries, constantly rescanning the string looking for the terminating 0, and the fixed arrays.

Dynamic arrays in D suggest the obvious solution - a string is just a dynamic array of characters. String literals become just an easy way to write character arrays.

	char[] str;
	char[] str1 = "abc";
	
Strings can be copied, compared, concatenated, and appended:
	str1 = str2;
	if (str1 < str3) ...
	func(str3 ~ str4);
	str4 ~= str1;
	
with the obvious semantics. Any generated temporaries get cleaned up by the garbage collector (or by using alloca()). Not only that, this works with any array, not just a special String array.
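
The operations can be sketched as follows. This assumes a present-day D compiler, in which string literals have type `string` (an array of immutable characters) rather than `char[]`; `greet` is a hypothetical name.

```d
// Build a new string by concatenation.
string greet(string name)
{
    return "hello, " ~ name;
}

unittest
{
    string s = greet("world");
    assert(s == "hello, world");

    string lo = "abc";
    string hi = "abd";
    assert(lo < hi);            // element-by-element comparison

    s ~= "!";                   // append
    assert(s == "hello, world!");
}
```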

A pointer to a char can be generated:

	char *p = &str[3];	// pointer to 4th element
	char *q = str;		// pointer to 1st element
	
Since strings in D are not 0 terminated, add a terminating 0 when transferring a pointer to a string to C:
	str.append(0);
	
The type of a string is determined by the semantic phase of compilation. The type is one of: ascii, wchar, ascii[], wchar[], and is determined by implicit conversion rules. If there are two equally applicable implicit conversions, the result is an error. To disambiguate these cases, a cast is appropriate:
	(wchar [])"abc"	// this is an array of wchar characters
	
It is an error to implicitly convert a string containing non-ascii characters to an ascii string or an ascii constant.
	(ascii)"\u1234"		// error
	
Strings a single character in length can also be exactly converted to a char or wchar constant:
	char c;
	wchar u;

	c = "b";		// c is assigned the character 'b'
	u = 'b';		// u is assigned the wchar character 'b'
	u = 'bc';		// error - only one wchar character at a time
	u = "b"[0];		// u is assigned the wchar character 'b'
	u = \r;			// u is assigned the carriage return wchar character
	

printf() and Strings

printf() is a C function and is not part of D. printf() will print C strings, which are 0 terminated. There are two ways to use printf() with D strings. The first is to add a terminating 0, and cast the result to a char*:
	str.append(0);
	printf("the string is '%s'\n", (char *)str);
	
The second way is to use the precision specifier. The way D arrays are laid out, the length comes first, so the following works:
	printf("the string is '%.*s'\n", str);
	
In the future, it may be necessary to just add a new format specifier to printf() instead of relying on an implementation dependent detail.
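
Both approaches can be sketched without depending on how arrays are laid out, by passing the length and the pointer as separate arguments. This assumes a present-day D compiler with its standard library (`core.stdc.stdio`, `core.stdc.string`, `std.string.toStringz`); `formatQuoted` is a hypothetical helper, and `snprintf` stands in for `printf` so that the result can be inspected.

```d
import core.stdc.stdio : snprintf;
import core.stdc.string : strlen;
import std.string : toStringz;

// Second approach: the precision specifier, with the length and the
// pointer passed explicitly. Returns the number of characters produced.
int formatQuoted(char[] buf, const(char)[] str)
{
    return snprintf(buf.ptr, buf.length, "'%.*s'",
                    cast(int) str.length, str.ptr);
}

unittest
{
    char[32] buf;
    int n = formatQuoted(buf[], "abc");
    assert(n == 5);
    assert(buf[0 .. 5] == "'abc'");

    // First approach: a 0 terminated copy that C code can scan.
    const(char)* z = toStringz("abc");
    assert(strlen(z) == 3);
}
```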

Associative Arrays

D goes one step further with arrays - adding associative arrays. Associative arrays have an index that is not necessarily an integer, and can be sparsely populated. The index for an associative array is commonly called the key.
	int[char[]] b;		// associative array indexed by character string
	b.length;		// number of elements in the array
	b["hello"] = 3;		// set value associated with "hello" to 3
	func(b["hello"]);	// pass 3 as parameter to func()
	
Particular entries in an associative array can be removed with the delete operator:
	delete b["hello"];
	
The in-expression yields a boolean result indicating if a key is in an associative array or not:
	if ("hello" in b)
		...
	
Associative arrays are supported for all following types.

Properties:

	.length		number of items in the array
	.keys		return array of the keys
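
A compact sketch of these operations, assuming a present-day D compiler. Two details differ from the text above and are assumptions about the current language: keys are written as `string`, and an entry is removed with `.remove(key)`. The name `countWords` is hypothetical.

```d
// Count how often each word occurs.
int[string] countWords(string[] words)
{
    int[string] counts;
    foreach (w; words)
        counts[w]++;            // a missing key starts from 0
    return counts;
}

unittest
{
    auto b = countWords(["hello", "world", "hello"]);
    assert(b.length == 2);      // number of distinct keys
    assert(b["hello"] == 2);
    assert("world" in b);

    b.remove("hello");
    assert("hello" !in b);
    assert(b.keys == ["world"]);
}
```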
	

Associative Array Example: word count

    import stdio;		// C printf()
    import file;		// D file I/O

    int main (char[][] args)
    {
	int word_total;
	int line_total;
	int char_total;
	int[char[]] dictionary;

	printf("   lines   words   bytes file\n");
	for (int i = 1; i < args.length; ++i)	// program arguments
	{
	    char[] input;		// input buffer
	    int w_cnt, l_cnt, c_cnt;	// word, line, char counts
	    int inword;
	    int wstart;

	    input = File.read(args[i]);		// read file into input[]

	    for (int j = 0; j < input.length; j++)
	    {   char c;

		c = input[j];
		if (c == "\n")
		    ++l_cnt;
		if (c >= "0" && c <= "9")
		{
		}
		else if (c >= "a" && c <= "z" ||
		    c >= "A" && c <= "Z")
		{
		    if (!inword)
		    {
			wstart = j;
			inword = 1;
			++w_cnt;
		    }
		}
		else if (inword)
		{   char[] word = input[wstart .. j];

		    dictionary[word]++;		// increment count for word
		    inword = 0;
		}
		++c_cnt;
	    }
	    if (inword)
	    {   char[] word = input[wstart .. input.length];
		dictionary[word]++;
	    }
	    printf("%8ld%8ld%8ld %.*s\n", l_cnt, w_cnt, c_cnt, args[i]);
	    line_total += l_cnt;
	    word_total += w_cnt;
	    char_total += c_cnt;
	}

	if (args.length > 2)
	{
	    printf("-------------------------------------\n%8ld%8ld%8ld total\n",
		line_total, word_total, char_total);
	}

	printf("-------------------------------------\n");
	char[][] keys = dictionary.keys;	// find all words in dictionary[]
	for (int i = 0; i < keys.length; i++)
	{   char[] word;

	    word = keys[i];
	    printf("%3d %.*s\n", dictionary[word], word);
	}
	return 0;
    }
    

Copyright (c) 1999-2001 by Digital Mars, All Rights Reserved
Last updated: Nov 8, 2001