Syntax for Indexing: Why [] is an Antiquated Standard

Mark Lewis
6 min readSep 9, 2020

--

I was just having a conversation with a student who was converting some code from C++ to Scala. She mentioned that she got through a lot of the conversion before remembering that Scala uses parentheses for indexing instead of brackets. She was complaining about the fact that this is different from all the other languages she is used to. I have to admit that when I was first learning Scala I had the same reaction. I was very confused at what seemed like an arbitrary change in syntax from a perfectly valid standard that was in use across all the other languages I knew. Over time I came to see the reason for this change and in case anyone else is feeling like my student, I figure I should write up my logic to share that reasoning. This is also a call for future languages to consider adopting a different view of what indexing really is.

To start with, let's go back a bit in history and talk about the use of brackets for indexing. I’m going to go back to C, though the arguments generally apply to languages like Pascal and Algol as well. The most important part of this argument is that in those languages, the only standard collection was the array. There is some discussion on the web that the origin of this comes from the desire to model math notation in Algol, the math notation they were modeling was range designations. In mathematics, [5,20] represents the range of values from 5 to 20, inclusive. If parentheses were used instead of brackets, the range would be exclusive. The syntax INTEGER ARRAY MYARRAY[5:20] in Algol for creating an array was intended to mirror the mathematical notation. The fact that it was also used for indexing wasn’t really following with math, it was just being consistent with the declaration. While C didn’t allow programmers to specify both a beginning and ending index, hence breaking the similarity to the range notation in mathematics, they kept the same notation for their array declarations and indexing.

The thing which I believe is most significant to keep in mind here is that the fixed-length array was the only “data structure” available as a standard feature of these languages. Indeed, in C, an array is little more than a pointer to a contiguous block of memory of an appropriate size. The syntax a[i] is equivalent to *(a+i) in C. The standard library didn’t provide variable-length sequences, maps, or sets. If you wanted that type of thing, you got to write it yourself or use a 3rd party library. Because of the lack of generics in C, support for generic collections in libraries is rather limited or ad hoc. Given this limited view of what an array is, it makes perfect sense that they use a special syntax different from other features of the language.

That’s not the world most of us program in today though. Today we expect most of our languages to have a standard library that includes a variety of data structures. You expect to be able to use a hashtable-based Map or a tree-based Set right out of the box. And the meaning of “indexing” into such a collection isn’t just a pointer offset into a chunk of memory. It is much more of an arbitrary calculation that will often be more expensive than the O(1) calculation of a pointer offset with a dereference. Though hopefully, it isn’t worse than the O(n) indexing of a linked list.

How this is handled in different languages today varies. In C++, it is possible to overload the behavior of operator[] so they kept the [] notation for all collections. In Java, you have the odd situation where the array type is special and uses [], but every other collection has to use a method, typically called get. In Python, the list and dictionary types use [], but for a set they use in. I won’t even address JavaScript because it doesn’t really have arrays and everything related to collections is odd there. Kotlin allows a bracket notation that compiles to calls to a get method.

This inconsistency across languages, and often inside of languages, seems to me to be related to the fact that languages are hanging onto the historical roots of C, which were adopted from Algol, and programmers’ expectations of using brackets. But it isn’t clear to me that this syntax is really logically motivated given the diversity of collection types that are used in modern programming.

The bracket notation perhaps made sense when it was just arrays and the brackets really did something special that applied just to that type. It seems to me though that in the modern world of programming we need to step back and rethink what it really means to index a collection. When you have a collection c and you index into it to get a value, you are really passing it an argument i and expecting it to give you back a value assuming i is a valid index. From this perspective, collections are just partial functions. Indeed, it makes sense in modern languages that support functional programming for the collection types to be functions. If collections are functions though, why should their syntax be any different from other functions? You call a normal function with the syntax f(x) and if you see the collection c as a function, then the proper syntax for indexing it is c(i).

This isn’t just an argument about syntax though. Viewing collections as functions opens up a lot of possibilities that you don’t get if you don’t use a syntax that is consistent with the function call syntax. To see this, let’s consider two simple examples that use the most common higher-order methods found on collections across most languages, map and filter. I’m going to use the Scala syntax for these examples because, in Scala, the collections are functions.

For map, I’ll use the example of slicing data. This is a feature that has been added to a number of different languages. Matlab is one where I have used it most, but Python supports it as well. If you treat collections as functions, this functionality comes along automatically without adding anything to the language.

val data = ... // Some large sequence of values
val indices = 1 to 100 by 3 // the indices you want in your slice
val slice = indices.map(data)

This code will put the values from data that had been in indexes 1, 4, 7, … into the slice sequence. This works because data has the type Int => A where A is the type of data in the sequence. So for each index, i, in indices we “call” data(i) and build a new sequence of the results. This code only works because the sequence type is a function. In Scala, this is true for all the sequence types, whether they are arrays, linked lists, or anything else. They are all consistent.

In the case of filter I’ll use the Map and Set types to motivate the example. Assume we have a Map[String, StateData] that stores information about states keyed by their names. We also have a Set[String] that has the names of some states we are interested in.

val stateData: Map[String, StateData] = ...
val interestingStates: Set[String] = Set("Texas", "Ohio", ...)
val interestingData = stateData.filterKeys(interestingStates)

I am using filterKeys instead of the plain filter because I want to filter the Map just based on the keys. Here again, the key point is that filter and filterKeys take functions as arguments. They need their arguments to behave syntactically as functions, not just semantically. You don’t want to make special cases that work with [] for looking up values. It is the syntactic regularity introduced by realizing the collections are just functions and should be treated as such that provides this power.

If you happen to have other examples of where treating collections as functions can be useful, please share them in the comments.

So in conclusion, collections are functions and should be treated as such to make languages consistent. That means they should be “indexed” with parentheses just as all functions are called and the antiquated bracket syntax that really only makes sense in a situation where arrays are the only collections should it be dropped. If you happen to be considering making a new language, I implore you to consider this. For everyone else who just uses languages, I hope that you look at the [] in your language a bit differently and consider trying out Scala or other functional languages that treat collections in this more consistent manner. When you do, I hope that you see the consistency and power it provides and not just how it is different from what you have done in the past.

--

--

Mark Lewis

Computer Science Professor, Planetary Rings Simulator, Scala Zealot