The Beginner's Guide to Type Coercion: Data Types

If you've been programming for any significant amount of time, you've more than likely used a number of different programming languages. Given the software landscape today, it's also likely that you've worked with both strongly typed languages and weakly typed languages.

That is, you've worked in programming languages that require you to specify the data type of your variables, what your functions will return, and so on, and you've worked in languages that don't require you to set that information explicitly.

If you know exactly what I'm talking about, then this series of articles probably isn't for you. However, if you're just getting into programming or you're just starting to use a new language that is dynamically typed (and, often, weakly typed as well), then there are a number of things worth noting about how it handles data types.

In this series, we're going to take a beginner's look at dynamic languages: how variables are defined, how their data types are determined, how they differ from their statically typed counterparts, and how to avoid some of the major pitfalls that come with working with these languages.

Understanding Data Types

Before we look at the pitfalls of type coercion and where you're most likely to run into them, it's important to understand what data types are and how strongly typed and dynamically typed languages handle them differently.

Strongly Typed Languages

Generally speaking, you're most likely to find strongly typed languages in the family of programming languages that are compiled. That includes languages like C and C++.

However, there are exceptions.

Some languages are compiled to bytecode or another intermediate language, which is then executed by a virtual machine rather than directly by the processor. Java is one such language: its compiler produces bytecode that the Java Virtual Machine (JVM) runs. These languages are strongly typed, and they are compiled, just not into native binary executables.

I know it sounds a little confusing, so perhaps some code will help clarify this. In strongly typed languages, you generally declare the type of data a variable is going to represent.

For example:
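Here is a minimal sketch, assuming TypeScript as the example language (the article doesn't commit to one). TypeScript isn't compiled to a native binary like C, but its compiler checks every declared type before the code runs, which is the behavior this section describes. The names other than `example` are made up for illustration. Note that, unlike C or Java, TypeScript has a single `number` type for both integers and floating-point values.

```typescript
// Each variable's type is written after its name, and the compiler checks
// every later assignment against that declared type.
var example: string = "The quick brown fox jumps over the lazy dog.";
var answer: number = 42;      // an integer value...
var piApprox: number = 3.14;  // ...and a floating-point value share the number type
var isReady: boolean = true;

// Uncommenting the next line causes a compile-time error, because 42 is not a string:
// example = 42;

console.log(example, answer, piApprox, isReady);
```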

Though the code should be simple enough to be self-explanatory, notice that it shows variables that hold strings, numerical types, and boolean values.

In strongly typed languages, you must also denote the type of information a function will return. Take the following examples:
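A minimal TypeScript sketch (TypeScript is an assumption here, and the function names are hypothetical): each function's return type is written after its parameter list, and the final function is declared `void` because it returns nothing.

```typescript
// Each function states the type of value it returns after the colon.
function getGreeting(): string {
  return "The quick brown fox jumps over the lazy dog.";
}

function getItemCount(): number {
  return 42; // returning "42" here would be a compile-time error: a string is not a number
}

function isPublished(): boolean {
  return true;
}

// `void` means the function does not return a value.
function logMessage(message: string): void {
  console.log(message);
}

logMessage(getGreeting());
```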

Note that in the example above, the final function has a return type of void. This means that the function doesn't return anything. When we begin looking at dynamic languages, we'll see how this differs.

Obviously, these examples are extremely simple, but that's okay, as they're meant to make a point: strongly typed languages have variables and functions whose data types are explicitly set.

Dynamically Typed Languages

When it comes to dynamically typed languages, you get a number of luxuries when defining variables and creating functions.

In the previous examples, the variable example can only hold a string. It cannot hold a floating-point number or a boolean value; it must hold a string. In dynamically typed languages, that isn't the case.

Instead, a variable may refer to a string at one point during the program's lifetime, an integer at another point, and a boolean value at yet another. Of course, this can get confusing if some kind of coding standard isn't adopted, but that's beyond the scope of this article.

The point is, variables defined in dynamically typed languages can refer to different types of data through a program's execution.

For example:
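A minimal TypeScript sketch of this behavior. TypeScript is itself statically typed, so the variable is annotated with `any`, which switches off type checking for it and lets it behave like a JavaScript variable; delete the `: any` and what remains is plain JavaScript. At runtime, `typeof` reports the type of the value the variable currently refers to.

```typescript
// `any` turns off static type checking for this one variable, so it behaves
// like a variable in a dynamically typed language such as JavaScript.
var example: any = "The quick brown fox jumps over the lazy dog.";
const typesSeen = [typeof example]; // records "string"

example = 42;                       // now the same variable refers to a number
typesSeen.push(typeof example);

example = true;                     // and now to a boolean
typesSeen.push(typeof example);

console.log(typesSeen.join(", ")); // string, number, boolean
```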

Note that these variables don't have a declared data type; they're simply declared with the var keyword and then assigned values as needed. Some languages define variables differently than what you see above, but the point is not to show how one language does it compared with another. It's to show that variables don't refer to a specific type of data.

Functions work in a similar fashion. That is, rather than defining the return type of the data, you simply define the function and have it return a value.
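A TypeScript sketch (function names hypothetical) in which none of the functions declares a return type. Plain JavaScript would never check the returned values; TypeScript still infers a return type from each function body, so this only approximates a fully dynamic language. The last function has no return statement, yet calling it still produces a value, `undefined`, which is the closest equivalent to the `void` function in the strongly typed examples.

```typescript
// No return types are written on any of these functions.
function getGreeting() {
  return "Hello, world!";
}

function getItemCount() {
  return 42;
}

function isPublished() {
  return true;
}

// No return statement at all: calling it still yields a value, undefined.
function logMessage(message: string) {
  console.log(message);
}

const returnedTypes = [
  typeof getGreeting(),
  typeof getItemCount(),
  typeof isPublished(),
  typeof logMessage("Logging a message"),
];

console.log(returnedTypes.join(", ")); // string, number, boolean, undefined
```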

Again, different languages will require functions to be defined in different ways (for example, some languages don't use the function keyword but instead use the def keyword), but the gist is that you don't have to explicitly state the type of data that will be returned from the function.

This can be a really powerful tool; however, it can also make the code harder to read, make it harder to tell what a function will return, and make it harder to know how to set up the code that calls the function (for example, in comparisons and conditionals).
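To make the comparison-and-conditional problem concrete, here is a sketch built around a hypothetical `findPosition` helper. It is written with TypeScript's `any` type so that, as in a dynamically typed language, nothing tells the caller what type comes back. The helper returns a number when the item is found and `false` when it isn't, and a caller that tests the result with a plain conditional gets the wrong answer for a match at index 0.

```typescript
// Hypothetical helper written in the loose style of a dynamically typed
// language: a number when the item is found, false otherwise.
// The `any` annotations switch off TypeScript's checks to mimic that style.
function findPosition(items: any[], target: any): any {
  for (let i = 0; i < items.length; i++) {
    if (items[i] === target) {
      return i; // found: a number
    }
  }
  return false; // not found: a boolean
}

const fruits = ["apple", "banana", "cherry"];
const position = findPosition(fruits, "apple"); // 0, the first index

// A plain truthiness test misreads the valid position 0 as "not found",
// because 0 counts as false in a conditional.
const naiveFound = position ? true : false; // false

// Checking the type of the returned value gives the intended answer.
const carefulFound = typeof position === "number"; // true

console.log(naiveFound, carefulFound); // false true
```

JavaScript's built-in `Array.prototype.indexOf` sidesteps this particular trap by always returning a number (-1 when nothing is found), so callers can compare against -1 instead of relying on truthiness.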

This is why coding standards and clear naming conventions are important. Again, though, that's a topic that's outside the scope of this series.

Coming Up Next...

Now that we've taken a cursory look at how strongly typed languages and dynamically typed languages manage variables, functions, and data types in general, we can turn our attention to how type coercion works within the larger context of applications that are written in dynamically typed languages.

Specifically, we'll look at how we can use type coercion to our advantage and how we may end up introducing bugs by not being explicit in our code. Starting in the next article, we're going to do exactly that.

In the meantime, please add all comments, questions, and general feedback to the feed below!
