PHP’s Safe-ish Types and the Return of Initialization

astorm


In languages like C and C++, most teaching approaches will have you think about creating variables in two distinct phases. First, you’ll declare the variable.

int c;

and then you’ll initialize the variable with its default value

c = 42;

When you declare the variable, you’re telling the compiler what sort of value the variable is going to store. When you initialize the variable, you’re telling the compiler what the initial/default value of that variable should be.

This is important in languages like C and C++ because these languages are biased towards thinking about variables as the contents of memory at a particular address (vs. thinking about variables as the value they hold). If you’ve declared a variable without initializing it, reading it gives you whatever value happens to be at that memory address. That memory could hold anything, and its contents are not deterministic between program runs. You’ll often hear these referred to as “garbage values” because this usually isn’t the behavior you want.

When next generation languages came along (Java, C#, Python, PHP, etc.), one of the problems they wanted to solve was this gap between declaration and initialization. Most languages of this era implement some form of automatic default value initialization. If an end user programmer doesn’t include an initial value for a variable, the language will use a value like null, None, undefined, etc.

While this behavior comes with its own set of problems (Call to a member function on a non-object, NullPointerException, etc.), the problems are at least consistent. Worrying about garbage memory isn’t something you’ll do often if you’re working in these languages.
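As a small sketch of that automatic default in PHP itself: an untyped property with no declared default quietly starts life as null. The Counter class and its $total property are hypothetical names used only for this illustration.

```php
<?php
// Hypothetical class, used only to illustrate implicit null defaults.
class Counter {
    public $total; // untyped, no default: PHP implicitly initializes it to null
}

$counter = new Counter;
var_dump($counter->total); // NULL
```

There is no garbage value and no error here, just a null that may or may not be what the rest of the program expects.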

PHP Typed Properties

The core PHP language continues to add features from more strongly typed languages. A recent feature, introduced in PHP 7.4, is typed class properties. This allows you to add types to your class properties like this

<?php
declare(strict_types = 1);
class Baz {}
class Foo {
    public string $bar;
    public ?string $baz;
}
$object = new Foo;

// these two lines are OK
$object->bar = "Hello World";
$object->baz = "Hello World";

// these two lines are not OK, because
// an int or an object is not a string
$object->bar = 123;
$object->baz = new Baz;

// this is allowed, because the `?` in `?string`
// means "allow null values"
$object->baz = null;

// this is not allowed, because without the `?` a value of
// null is NOT allowed
$object->bar = null;

In the above program, the Foo object stored in $object has a property $bar and a property $baz. Because those properties were created with the string type

public string $bar;
public ?string $baz;

end user programmers will not be able to assign values to them that are not strings. The one exception to this is the string type that includes a ? at the beginning. This is a “nullable” type, and means we’re allowed to assign a value of null to the property.
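A rejected assignment surfaces as a TypeError, which can be caught like any other throwable. The following is a minimal sketch of that, assuming PHP 7.4 or later. The Message class and the $caught flag are hypothetical, and the exact wording of the error message differs between PHP versions, so it is printed rather than assumed.

```php
<?php
declare(strict_types = 1);

// Hypothetical class mirroring the Foo example above.
class Message {
    public string $subject;
    public ?string $note;
}

$message = new Message;
$message->note = null; // fine: ?string accepts null

$caught = false;
try {
    $message->subject = null; // rejected: string does not accept null
} catch (TypeError $e) {
    $caught = true;
    echo get_class($e), ': ', $e->getMessage(), PHP_EOL;
}
```

The failed assignment leaves $subject exactly as it was before the attempt.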

The ability to indicate whether null is allowed or not for a property is a feature that’s not available, by default, in many traditional object oriented languages. The advantage of a non-nullable type is you avoid a whole class of errors (Call to a member function on a non-object, NullPointerException, etc.).

However, like many new features, we’ve traded one problem for another.

Default Values for non-nullable Types

Pop quiz. If you have code that looks like this, what value should be dumped out?

<?php
class Zap {
}

class Zip {
    public Zap $zap;
}
$object = new Zip;
var_dump(
    $object->zap
);

Here we see the gap between declaration and initialization has returned. We’ve declared the $zap property, and told PHP that $zap must contain an object whose type is Zap. If this were a traditional untyped PHP property, PHP would automatically assign a value of null to $zap. However, the type system strictly forbids this.

In theory PHP core team members could have had the language attempt to automatically create an “empty” Zap object, but what if our Zap class looked like this?

class Zap {
    public function __construct(A $a, B $b, C $c, /* ... */) {
    }
}

Where should PHP get the values required by Zap‘s constructor? Expecting PHP to know how to instantiate an object of a particular type is asking a lot.

So how did PHP solve this? If you try running the program above, you’ll get an error that looks like this

PHP Fatal error:  Uncaught Error: Typed property Zip::$zap must not be accessed before
                  initialization in /private/tmp/test.php:10
Stack trace:
#0 {main}
  thrown in /private/tmp/test.php on line 10

That is, PHP throws an Error (fatal if nothing catches it) if you attempt to access the $zap property before it’s assigned (i.e. initialized with) a value.
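If you need to ask whether a typed property has been initialized without triggering that error, PHP offers a couple of options. The sketch below assumes PHP 7.4 or later and reuses the Zap and Zip names from the example above. The extra $maybeZap property and the local variable names are hypothetical additions for illustration.

```php
<?php
class Zap {}

class Zip {
    public Zap $zap;       // no default: starts out uninitialized
    public ?Zap $maybeZap; // nullable, but still uninitialized (not null)
}

$object = new Zip;

// isset() returns false for an uninitialized typed property and does not throw.
$before = isset($object->zap);

// ReflectionProperty::isInitialized() reports the state directly.
$property = new ReflectionProperty(Zip::class, 'maybeZap');
$nullableInitialized = $property->isInitialized($object);

// Reading the property is what throws, and the Error can be caught.
$readFailed = false;
try {
    $value = $object->zap;
} catch (Error $e) {
    $readFailed = true;
    echo $e->getMessage(), PHP_EOL;
}

// After assignment the property behaves like any other.
$object->zap = new Zap;
$after = isset($object->zap);

var_dump($before, $nullableInitialized, $readFailed, $after);
```

Note that the nullable property is also uninitialized here: without an explicit default, ?Zap does not mean “starts as null”. Assigning defaults or initializing everything in the constructor avoids the problem entirely.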

PHP vs. Manually Compiled Languages

I first saw this behavior (a language refusing to allow access to an uninitialized value) in Rust. Rust has an additional advantage over PHP here, and that’s a manual compilation step. Rust will refuse to compile a program if you attempt to access an uninitialized variable. This means it’s impossible, at least in safe Rust, to deploy a Rust program with code that accesses an uninitialized variable.

PHP, on the other hand, doesn’t have a separate compilation step. This means that these checks happen at runtime, which means these errors won’t surface until the offending code actually runs. It’s true that in the simple examples we’ve talked about these problems would surface during development. However, in a non-trivial program it’s easy to introduce a bug that is data dependent and won’t surface until it’s been deployed.
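Here is a sketch of what a data-dependent version of this bug might look like. The Order class, its $couponCode property, and the input arrays are all hypothetical. The point is only that the same code succeeds or fails depending on the data it receives.

```php
<?php
declare(strict_types = 1);

// Hypothetical class, for illustration only.
class Order {
    public string $couponCode;

    public function __construct(array $data) {
        // $couponCode is only assigned when the input includes a coupon.
        if (isset($data['coupon'])) {
            $this->couponCode = $data['coupon'];
        }
    }

    public function describe(): string {
        return 'Coupon: ' . $this->couponCode;
    }
}

$withCoupon = (new Order(['coupon' => 'SAVE10']))->describe();
echo $withCoupon, PHP_EOL; // Coupon: SAVE10

$failed = false;
try {
    // Same code path, different data: the Error only appears at runtime.
    echo (new Order([]))->describe(), PHP_EOL;
} catch (Error $e) {
    $failed = true;
    echo get_class($e), ': ', $e->getMessage(), PHP_EOL;
}
```

If every order seen during development happens to carry a coupon, nothing fails until a coupon-less order arrives in production.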

In manually compiled languages like Rust, the type system can prevent this class of bug before the program ever runs. In PHP, the type system can only report this class of bug at runtime.

But wait! What about static analysis tools like PHPStan or Psalm? It’s true that static analysis can help, but then you’re reliant on a third party tool working, being configured correctly, and aligning with the syntax you and your team think is right.

Despite PHP 5.0 first adding type hints to the language 17 years ago, PHP’s type safety is still better described as safe-ish. The end result is a language where PHP professionals are better equipped to deliver stable code, but where the hobbyist, casual user, or polyglot professional faces an increasingly steep learning curve and programs that are harder to reason about.

Copyright © Alan Storm 1975 – 2021 All Rights Reserved

Originally Posted: 21st June 2021