The Struggle of Dynamically Typed Languages
My department is currently involved in a curriculum revision. We are looking at pretty much everything as part of this evaluation. One of the items is the programming languages we use for various courses, including our introductory courses. I am a strong proponent of statically typed languages for these courses. I have colleagues who are pushing for dynamically typed languages. Many people seem to believe that dynamically typed languages are easier for students to learn, but I haven’t seen any compelling evidence for this. Indeed, I think they have certain issues that make learning to program harder. Those issues, however, are more implicit than explicit, much like the type systems themselves.
Types matter in every language. Most people agree that sqrt("hi") doesn’t make much sense. What varies is when and how different languages tell you about these mistakes. (I have to note that sqrt("hi") isn’t actually an error in all languages. In Perl, it runs perfectly happily and produces a value of 0. This is a direct consequence of Perl’s rules for converting strings to numbers and isn’t surprising if you know the rules of Perl. In JavaScript this produces NaN and I don’t have a logical explanation for that behavior, though I guess it is consistent with the fact that 5-"hi" is also NaN in JavaScript.)
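For comparison, here is a minimal sketch of how Python, the language used in the examples below, handles the same call. It assumes only the standard `math` module; `try_sqrt` is a hypothetical helper name introduced for illustration.

```python
import math

def try_sqrt(value):
    """Describe what math.sqrt does with the given value (hypothetical helper)."""
    try:
        return f"result: {math.sqrt(value)}"
    except TypeError as err:
        # Python does not implicitly convert a str to a number here.
        return f"TypeError: {err}"

print(try_sqrt(9))     # a number works as expected
print(try_sqrt("hi"))  # a string is rejected, but only when the call runs
```

Python does report the mistake, but only at the moment the call is actually executed.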
Just because a language is easy for an experienced developer to learn, that doesn’t mean it makes things easier for a novice. Perhaps the best research on this topic was a paper with the title “Python Versus C++: An Analysis of Student Struggle on Small Coding Exercises in Introductory Programming Courses” (https://dl.acm.org/doi/10.1145/3159450.3160586). They used the remarkably large dataset from Zybooks to look at student performance solving problems across multiple institutions and courses. The size of their dataset removed a lot of the selection effects that normally happen when people try to compare student learning across languages. The remarkable result of this research was that students “struggled” significantly more (2x) solving the problems in Python than in C++. Their metric for “struggle” was taking a long time or making a large number of submissions.
When this paper was presented there was a lot of discussion about why this might be. My personal favorite explanation is that dynamically-typed scripting languages, like Python, tend to encourage students to just try things without spending time thinking about them. So when a student gets something wrong, they are more inclined to make a little change and try again than they are to pause and think about the problem and how to fix it. I would argue that this is at least somewhat due to the dynamic aspects of the language. The language a student learns to program with inevitably impacts the mental scaffolding that they develop to understand things. My guess is that being forced to think about types upfront causes students to be more likely to pause and think through certain errors. It might also be true that the messages students get for type errors in statically typed languages are more likely to point directly to the real cause of the error and hence students can fix a lot of errors faster in statically typed languages than in dynamically typed languages. More on that below.
Unfortunately, I am not aware of any further work that explores alternate explanations or does comparisons across other languages. I wish there were. Publishers like Zybooks have the ability to look at things in ways that pretty much no one else can because of the size of their datasets.
Confusion on Types
So let’s consider a little bit of code that a student learning to program might write. I’m going to use Python here, but one can easily construct similar examples in any dynamically typed language. If anything, Python does better than most other dynamically typed languages at these things.
We begin with a function defining multiplication. It looks pretty much exactly as you would expect.
>>> def mult(x, y):
...     return x*y
Now let’s call it with some variables that I’ve defined.
>>> mult(x1, y1)
50
>>> mult(x2, y2)
'1010101010'
Confused? If you know Python well, probably not. However, if you are a novice student, then this is a real head-scratcher. To make it clear, here are the assignments used for those calls.
>>> x1 = 5
>>> y1 = 10
>>> x2 = 5
>>> y2 = '10'
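The second result comes from the fact that, in Python, `*` between an int and a str means "repeat the string". A small sketch using the same values makes the two types visible; the `int(y2)` conversion at the end is one possible fix, not something from the original session.

```python
x2 = 5
y2 = '10'

# The two variables have different types even though they look alike.
print(type(x2).__name__, type(y2).__name__)  # int str

# int * str repeats the string five times.
print(repr(x2 * y2))                         # '1010101010'

# Converting the string first gives ordinary multiplication.
print(x2 * int(y2))                          # 50
```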
If you think this is far-fetched, consider that the values are read from a file and that the read is encapsulated in a function. Here’s a little example. It is easy to imagine these being just a few functions in a program with several others for an assignment in a course.
def readData(filename):
    f = open(filename, 'r')
    v1 = int(f.readline())
    v2 = f.readline()
    v3 = f.readline()
    f.close()
    return (v1, v2, v3)

def mult(x, y):
    return x * y

def useData(x, y, desc):
    # Imagine this has more to it
    print(str(mult(x, y)) + ' ' + desc)

(x, y, desc) = readData('data.txt')
useData(x, y, desc)
I ran this program with a little sample file and got the following:
10
10
10
10
10
Some data.
There are a number of challenges here for the novice programmer. What process does the student follow to figure out what went wrong? Where do they even start looking? These are two separate issues. The honest truth is that a novice programmer probably just starts changing things to see if it fixes the problem. (At least that is my hypothesis based on the “struggle” research.) However, if I’m their instructor, I’m going to harp on them to put in print statements to help debug. You can’t fix a problem until you know what it is. The student needs more information. So let’s say they think the problem is in mult so they add print(x, y) to that function. For the data I used, it prints 5 10. That’s exactly what they were expecting to see. Because the printed value for '5' and 5 is the same, the normal debug print statements are much less helpful than one might hope.
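One way to make debug output more informative, offered here as a sketch and not as something the original program did, is to print the `repr` and type of each value. The helper name `debug_show` is hypothetical, and the value of `y` assumes the line in the file was `10` followed by a newline.

```python
def debug_show(**values):
    """Format each name with its repr and type name (hypothetical helper)."""
    return ', '.join(
        f"{name}={value!r} ({type(value).__name__})"
        for name, value in values.items()
    )

x = 5
y = '10\n'  # what f.readline() returns for a line containing 10

print(x, y)                  # looks like two numbers
print(debug_show(x=x, y=y))  # x=5 (int), y='10\n' (str)
```

The second print shows both the quotes and the trailing newline, which the plain print hides. A novice still has to know to ask for this, which is part of the problem.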
But even putting in print statements assumes that they have some idea of where to look and have figured out where they want to put the print statements. To my mind, the real problem here is that this is a logic error, and where it manifests is far away from the actual part of the code that has the problem. The problem is that the assignment to v2 is lacking a conversion to int, but that error doesn’t manifest until they do the print in a completely different function.
Remember that there are basically three types of errors we run into when coding: syntax errors, runtime errors, and logic errors. They form a hierarchy. Syntax errors happen early and come with a message that points close to where the error is. We can get these in Python by messing up parentheses, colons, or some indentation. Runtime errors are crashes. They generally provide information, but they aren’t always close to where the real error is in the code. Instead, they point to where the error manifests and actually causes something to happen that creates the error. Logic errors are the worst because when you have a logic error the code runs and you get no information other than seeing that something went wrong. That something going wrong is very often far away from the real source of the problem, as we see here. As a general rule, dynamically typed scripting languages have few syntax errors because there isn’t a compiler checking for much. If the structure is correct enough to parse, it will run. For this reason, most of their errors are runtime or logic errors.
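For reference, here is a sketch of the reading function with the missing conversion added at the point where the value is read. The name `read_data`, the use of `with`, the `strip()` call, and the contents of the sample file are assumptions made so the example is self-contained; the original post does not show its data file.

```python
import os
import tempfile

def read_data(filename):
    """Read two integers and a description, converting as soon as they are read."""
    with open(filename, 'r') as f:
        v1 = int(f.readline())
        v2 = int(f.readline())     # the conversion that was missing
        v3 = f.readline().strip()  # drop the trailing newline
    return (v1, v2, v3)

# Write an assumed sample file so the sketch can run on its own.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('5\n10\nSome data.\n')

x, y, desc = read_data(tmp.name)
os.remove(tmp.name)

print(str(x * y) + ' ' + desc)  # 50 Some data.
```

Nothing in the language prompts the student to make this change; they have to trace the bad output back to the read themselves.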
The Importance of Early Feedback
One of the general rules of teaching is that early feedback is better than late feedback. Students should know what they did wrong as quickly as possible so that they can learn from it. I would argue this applies to learning programming as well and that there is a benefit to using a language that provides early feedback on mistakes. I just pointed out how the different error types occur at different times. Let’s look at another example, this time with a runtime error, and see how that matters. This segment of code might appear in a text-adventure type of program that has elements similar to D&D.
import random

def attack():
    d1 = random.randint(1, 6)
    d2 = random.randint(1, 6)
    n = d1 + d2
    if n == 2:
        n = 'critical miss!'
    elif n < 8:
        m = 'weak hit.'
    elif n < 12:
        m = 'good hit.'
    else:
        m = 'critical hit!'
    return (n, m)

(damage, message) = attack()
print('You got a ' + message)
print('You did ' + str(damage) + ' damage.')
I ran this 42 times immediately after writing it before it crashed. When it crashed, it told me the error was on the line with the return. That’s not exactly where the typo is, but at least it is in the right function.
The point here is that runtime errors (and logic errors) provide late feedback. They only tell you there is a problem when it actually happens. If a line of code isn’t reached commonly, errors in that line will go completely unnoticed. Most students wouldn’t have run this code 42 times to even find out there was an error. They would have happily submitted it thinking it was fine only to get a poor grade when it crashed in my testing. In a statically typed language, these errors would be syntax errors. There would be immediate feedback and little confusion.
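To see how long such a bug can hide, the following sketch calls the same `attack` function repeatedly and counts the calls until the crash. The helper `calls_until_crash`, the seed, and the call limit are assumptions for illustration; the count depends on the random sequence, so no particular number is claimed.

```python
import random

def attack():
    d1 = random.randint(1, 6)
    d2 = random.randint(1, 6)
    n = d1 + d2
    if n == 2:
        n = 'critical miss!'  # the typo from the example: n instead of m
    elif n < 8:
        m = 'weak hit.'
    elif n < 12:
        m = 'good hit.'
    else:
        m = 'critical hit!'
    return (n, m)

def calls_until_crash(seed, limit=100_000):
    """Count calls to attack() until it fails, or return None if it never does."""
    random.seed(seed)
    for count in range(1, limit + 1):
        try:
            attack()
        except UnboundLocalError:
            # m was never assigned on the double-ones branch
            return count
    return None

print(calls_until_crash(seed=1))
```

A double one comes up on average once in 36 calls, so a student who runs the program a handful of times may well never see the failure.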
This particular error relates not just to dynamic typing, but also to scope. This is a rough area for Python. Most languages use block scope. Python doesn’t. JavaScript didn’t, but the people making the ECMAScript standards realized that was a problem and added let and const declarations that have block scope. Unfortunately, you can still use var in JavaScript or skip declarations altogether, so students could still be tripped up by this example in that language.
Before moving on, I want to note that it isn’t hard to extend this example in such a way that the runtime error doesn’t occur in the same function as the mistake. If I included m='' early in the function I wouldn’t get the error saying m isn’t declared. Instead, n would escape with a string and later on, if I tried to do math with that value someplace else, the code would crash. I mention this to highlight the fact that although runtime errors provide information about where the crash occurs, it is very easy for that to be far away from the actual mistake in the code. This lack of locality leads to confusion and wasted time. One might even say that it makes the student “struggle”.
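A sketch of that extended scenario follows. To make it repeatable, this variant of `attack` takes the two dice as parameters, and `apply_damage` is a hypothetical caller standing in for "someplace else"; neither appears in the original program.

```python
def attack(d1, d2):
    """Variant of attack() that takes the dice as parameters (an assumption)."""
    m = ''                    # the early initialization described above
    n = d1 + d2
    if n == 2:
        n = 'critical miss!'  # same typo: n instead of m
    elif n < 8:
        m = 'weak hit.'
    elif n < 12:
        m = 'good hit.'
    else:
        m = 'critical hit!'
    return (n, m)

def apply_damage(hp, damage):
    """Hypothetical caller, far from attack(), that does math with the result."""
    return hp - damage

damage, message = attack(1, 1)  # no error is raised here
try:
    apply_damage(100, damage)
except TypeError as err:
    print('Crash in apply_damage:', err)
```

The traceback for this failure would point into `apply_damage`, even though the only faulty line is inside `attack`.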
Typos Everywhere
We all make typos when we code. How much time we waste on those typos depends on various factors like when we find out about them and how much information we get about them. Technically, my example above was a typo where I swapped n for m. Indeed, I find that the most common typos I make are in the names of things. My fingers might leave out a letter or put an extra one in. Or maybe I forget the exact name of something. Name typos where I add an “s” or leave one off converting from singular to plural are also common for me.
Almost every typo of that nature in a dynamically typed language becomes a runtime error. Just to test this I added the line print(ln(n)) in the example above right after I assigned the string value to n in the case of rolling a 2. As you can see, I left out the “e” in len. As before, the code ran just fine several times before that case came up and I got the following error message.
Traceback (most recent call last):
  File "pdice.py", line 18, in <module>
    (damage, message) = attack()
  File "pdice.py", line 9, in attack
    print(ln(n))
NameError: name 'ln' is not defined
Let’s look again at a slightly larger example. In this case I’m going to make a class, but any code that uses a class can run into the same error.
class Person:
    age = 5
    name = 'Mark'

    def birthday(self):
        self.age += 1
        return self.age

p = Person()
print(p.age)
print(p.birthday())

# change of name
p.names = 'Lisa'
print(p.name)
See the bug? I made one of those singular-plural typos that I find I do occasionally and put names instead of name near the bottom. This code runs just fine. No problems at all, other than the logic error where I made a new member instead of changing an existing one.
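Python does offer an opt-in way to turn this particular typo into a runtime error: declaring `__slots__` on the class. The sketch below is a variation on the `Person` class, not the version from the example; it moves the defaults into `__init__` because slot names cannot also be class variables.

```python
class Person:
    # Only these attribute names may be assigned on instances.
    __slots__ = ('age', 'name')

    def __init__(self):
        self.age = 5
        self.name = 'Mark'

    def birthday(self):
        self.age += 1
        return self.age

p = Person()
try:
    p.names = 'Lisa'  # the singular-plural typo
except AttributeError as err:
    print('Caught:', err)

print(p.name)         # Mark
```

This is still late feedback, since the error appears only when the line runs, and a novice is unlikely to know the feature exists.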
For real fun though, note that you can accidentally alter fields in standard libraries too.
import math

print(math.sqrt(9))
math.sqrt = lambda n: 2*n
print(math.sqrt(9))
The worst student experience I’ve seen with this wasn’t in Python, it was in JavaScript. That student wrote context.fillstyle = 'red' in their code. They then spent hours trying to understand why nothing was being drawn. Print statements and the debugger indicated that everything was working properly. Eventually, they came to me to ask for help and I noticed that line. The student didn’t capitalize the “s”. It should be context.fillStyle. This student wasn’t a novice. He was a Junior who had gone through many previous programming classes. It was frustrating for him. How much more frustrating would it be for a true beginner?
The key thing to note here is that every single one of these is a syntax error in statically typed languages. If you are using an IDE that highlights errors, every single one would get a red indicator in addition to a description and probably take the student a few seconds to fix. Even if the message weren’t that informative, it would at least be in the right place and after they had seen it once or twice they would know how to fix it. Typos that cause logic errors or runtime errors away from the actual line where the error occurs always lead to significant struggle. The student can’t just learn to understand what a particular error message means.
Scoping Issues
My last example is going to be a little bit more complex, but the root cause is something that I’ve actually done in a program in JavaScript. Imagine that some student gets really interested in the game from above and wants to implement more complex rules. They found something about lambdas online, or maybe their instructor introduced the concept, so they wrote the following function.
import random

def attack(numTimes, dieSides):
    """
    Return a list of functions for attacks. Max roll takes 10% of
    health. Other values do that much damage.
    """
    damageFuncs = []
    for i in range(1, numTimes):
        d = random.randint(1, dieSides)
        if d == dieSides:
            damageFuncs.append(lambda hits: max(d, hits / 10))
        else:
            damageFuncs.append(lambda hits: d)
    return damageFuncs

damage = attack(5, 6)
hp = 200
for d in damage:
    hp -= d(hp)
    print(hp)
If you run this, you will see that every function does the same amount of damage, except that some of them occasionally do 10% damage instead. What’s going on here? It turns out that there is only one d and all the lambdas close over it. So whatever the value of d is on the last iteration is the one that is used all the time. It is able to occasionally do a 10% hit though because that is a different lambda.
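The shared variable is easier to see in a stripped-down sketch with no randomness. The function name `make_funcs` is hypothetical.

```python
def make_funcs():
    funcs = []
    for d in range(3):
        # Each lambda refers to the variable d itself, not to its current value.
        funcs.append(lambda: d)
    return funcs

# By the time any lambda is called, the loop has finished and d is 2.
print([f() for f in make_funcs()])  # [2, 2, 2]
```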
This might seem far-fetched, but it is exactly the type of thing you will write for event handling code. I discovered this type of error in JavaScript making multiple event handlers in a loop. It was something like the buttons on a calculator. You can generally get around this in JavaScript these days with appropriate use of let or const. In Python, it is a much bigger challenge.
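One commonly used Python workaround, not mentioned in the text above, is to bind the current value as a default argument of the lambda. The sketch below assumes a variant of the function, here called `attack_from_rolls`, that takes the rolls as a list so the result is repeatable.

```python
def attack_from_rolls(rolls, die_sides):
    """Build damage functions from given rolls (hypothetical variant of attack)."""
    damage_funcs = []
    for d in rolls:
        if d == die_sides:
            # d=d copies the current value into the lambda's own parameter.
            damage_funcs.append(lambda hits, d=d: max(d, hits / 10))
        else:
            damage_funcs.append(lambda hits, d=d: d)
    return damage_funcs

funcs = attack_from_rolls([3, 6, 1], 6)
print([f(200) for f in funcs])  # [3, 20.0, 1]
```

The idiom works, but it relies on knowing that default values are evaluated when the lambda is created, which is a lot to ask of a beginner.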
Cognitive Load (later edit)
A former student of mine read this post and related a story that I think warrants another section because it makes perfect sense. This former student was working with a friend who wanted to learn to program. The friend was interested in web development so the obvious starting language was JavaScript. What my former student observed was that his friend struggled with keeping all the type information in his head. It is challenging enough for a novice to keep syntax in their head, but needing to remember the types of everything is a significant additional cognitive load. JavaScript is particularly bad about this once the learner hits the point of using objects, because you have to remember the names of all the members that you want to use and that information isn’t readily available.
As it happens, after watching his friend struggle so much, my former student suggested that the friend switch over to C#. The two languages have many syntactic similarities. That, combined with static typing and the better IDE help that comes with it, led to the friend moving much faster and struggling much less in C# than he had in JavaScript.
I had never really considered the cognitive load aspects of dynamically typed languages for novices, but it makes sense. In some ways, the error that I described above where my student put fillstyle instead of fillStyle could be viewed as a cognitive load issue and not just a typo. The student didn’t remember that the “s” needed to be capitalized and because of the dynamic nature of the language, the tools provided no help with this. Even I had to go look it up when I was looking at his code to double-check.
Conclusions
No language is perfect. Every language has its flaws. Just because a language is easy for seasoned developers to pick up, that doesn’t mean that it benefits the novice. It simply means that the language has a logical flow that the experienced developer understands. Similarly, just because a language challenges a lot of experienced developers doesn’t mean that it is problematic for the novice. For example, experienced developers generally find switching paradigms to be very challenging because it differs from what they know. The novice knows little to nothing and hence doesn’t have that baggage.
I think that people commonly make the mistake of believing that dynamically typed scripting languages are easy to pick up. My purpose in writing this is to show some of the ways in which those languages actually can lead to more struggle for the novice. Yes, it is true that syntax errors can often have messages that aren’t all that meaningful to the novice. However, they almost always point to the right place, and once their meaning has been explained by a teacher, the novice can deal with them quickly in the future. The errors I describe here don’t have that property. Every time they occur, they can become a hunt to figure out where the actual mistake was made. This time spent hunting down errors can be frustrating and inevitably takes their time away from the concepts that they are supposed to be learning.
For anyone who might not be clear about the error messages that statically-typed languages would produce in these examples, I wrote a second post (https://drmarkclewis.medium.com/error-comparison-python-3-vs-scala-3-dcc99aee1f3c) comparing these Python programs to equivalent ones written in Scala.