Advice To New PHP Developers From a Slowly Recovering Horrible Programmer

7/8/18 - Clifford Vickrey

Jacob Kaplan-Moss, the creator of the Django Python framework, gave a 2015 talk on a growing crisis in the software development industry. There aren't enough developers, and there's a widespread cultural belief that you have to be a "real" programmer in order to become one. Since being a "real" programmer means knowing how to write compilers, self-driving cars, sentient chatbots, and rocket controllers, it is naturally an expert skill beyond the grasp of mere mortals. And not only does the skill demand such a hypertrophy of left-brained calculus muscle, but it's also A-R-T in the same category as painting, sculpture, and poetry. Don't even bother trying, in other words.

Instead, he correctly argues, programming is like any skill: most of its applications are mundane (like saving form data to a database), and proficiency in it is normally, not bimodally, distributed. If the industry wants to safeguard its long-term health, it had better be more welcoming to beginners and dabblers from other fields, and rid itself of the stereotype of the toxic "rock star" tech genius to whom society must genuflect in silent admiration.

I was in the beginner boat he described. When in graduate school, I quickly felt an idealistic twenty-something's clichéd disillusionment with academia (I was frustrated that I wasn't on the cusp of changing the world with the power of ideas, man), and was looking to try something new, even as a hobby. Web development interested me. At the same time, through a bizarre coincidence, I was tasked to build an interactive social scientific web application, an endeavor for which I had a dearth of qualifications but a surplus of ambition.

I wasn't sure where to start. I knew HTML, a little JavaScript/jQuery, and some SQL. I had never really written server-side code. Likely because of its very short time to "Hello, world!" factor, and because I recognized some function names from the smatterings of C I learned in high school, I chose to learn PHP.

But how? As recently as eight years ago, the language had neither a formal specification nor even accepted best practices. The closest thing PHP had to a dependency management system for bringing in third-party code was (shudders) PEAR. There was certainly no "Zen of PHP." Worse still, the tutorials available online were objectively awful, as well as absurdly inviting of security flaws. Need to query a database with user input? Just strip it of quotes, tack it onto a MySQL SELECT statement, and call it a day. Want a cool way to work with structured user data? Just unserialize GET parameters into objects. (Remote code execution? What's that?) Want to let a user upload a file? Who cares what the extension or MIME type is, just save it into /var/www/website/public/files. The guides on W3Schools were so bad that the World Wide Web Consortium politely asked the website to publicly disavow any relationship with the W3C. (It didn't).
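For contrast with the "strip the quotes and concatenate" advice, here is a minimal sketch of the approach that is now standard: a prepared statement with a bound parameter. It assumes PDO with the SQLite driver is available; the users table, its contents, and the findUsersByName() helper are hypothetical and exist only for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory database, used only so the sketch is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO users (name) VALUES ('alice')");
$pdo->exec("INSERT INTO users (name) VALUES ('bob')");

/**
 * Look up users by name. The user-supplied value is bound as a parameter,
 * so it is never spliced into the SQL text itself.
 */
function findUsersByName(PDO $pdo, string $name): array
{
    $stmt = $pdo->prepare('SELECT id, name FROM users WHERE name = :name');
    $stmt->execute([':name' => $name]);

    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// A classic injection attempt is treated as a literal (and unmatched) name.
$injected = findUsersByName($pdo, "' OR '1'='1");

// An ordinary lookup still works as expected.
$legit = findUsersByName($pdo, 'alice');
```

Had the same input been concatenated into the query string, the WHERE clause would have matched every row. With a bound parameter, the driver treats the whole input as a single value.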

With a head full of ill-conceived ideas, pop cultural references, and little else, I set to work building the app in Windows Notepad. It worked. It got great reviews. The New York Times linked to it. It even won an award from my academic discipline's premier professional association. It was so good, in fact, that it may have invited the attention of a third party that ultimately led to the site's amicably negotiated demise, which I am legally forbidden to discuss.

The app, under the hood, was also … something. Load the source code in a modern IDE and prepare yourself for warnings with more red and yellow than McDonald's global branding effort.

Alan Alda Mode, engaged
I think it's a Futurama reference. Maybe if it's set to true, the app is running in production, because production is "serious," as when M*A*S*H preached about the horrors of war?
The Constant Gardener constant
It took me 5 minutes of staring into the fridge before I got the pun, arguably the worst in history, that explains why there's a constant named GARDENER. I think I was on a John le Carré kick at the time.
The beloved arrject
The beloved Arrject.
Globally awesome
That's one way to manage application state, I suppose.
Dynamic variables
There are 50 variables in this scope that are dynamically set by extract() and $$variable_variables. That's good, right?
Hope you like ternary operators
Hope you like ternary operators, because there are ternary operators within ternary if statements.

What this goes to show is that even terrible code can prove enormously useful and valuable (and in this case, someone independently valued software containing ALAN_ALDA_MODE constants to be worth half a million dollars).

Nonetheless, I was frustrated with and tired of PHP after this effort, and during my time in grad school focused my programming efforts on R and Python, convinced of the orthodoxy that these were "real" languages, and stuck to simple scripts, convinced I wasn't a "real" programmer.

As fate would have it, in the past few years I have found myself happily working professionally as a software developer. I learned that PHP 7 is now quite different from PHP as I remembered it, and is not synonymous with the horrible hacks I glued together until everything "worked." It has undergone something of a renaissance to clean up its act, codify best practices, and offer new ways of writing reusable, interoperable code. It is now, unquestionably, a viable choice for the rapid prototyping and deployment of greenfield web applications, and not just something people are forced to use as legacy code maintainers. If I were starting out again, I think I'd have had a far easier time. I'd have at least wrapped the Constant GARDENER in a class.

In the spirit of constructive reminiscence, I posed myself the question: what advice would I give to my younger self, were he starting out? After handing him Gray's Sports Almanac, I would say the following:

  • If you want to learn PHP on the web, learn it from reputable sources. PHP: The Right Way is a good primer on best practices, but requires a bit of prior knowledge to understand. (You have to know how to program before you know what a "programming paradigm" is). Laracasts has a good primer to help someone go from "sort of knows what HTML is" to writing a model-view-controller application from start to finish.
  • Learning a programming language's toolchain and community is harder than learning the language itself, and should be an immediate priority. To give an example: JavaScript is a simple enough language. Actually using it entails wading into a nightmare of package managers, wrappers for package managers, syntactic supersets like TypeScript, transpilers, task runners, hundreds of plugins for task runners, linters, and a choice of dozens of front-end and back-end frameworks with teenagers on high sugar intakes inventing more every week. Luckily, PHP's tooling is much simpler on account of fewer options: almost everyone in the community has moved to PhpStorm as an IDE; Composer is now the dependency manager that everyone uses; there are about three popular debuggers, with Xdebug (a C extension) being the most popular; the most popular test frameworks are PHPUnit (for unit testing) and Codeception (for automated full-stack acceptance testing); there are a couple of static code analysis tools like Phan, if you're into that sort of thing; and the two popular application frameworks, Symfony and Laravel, have their own command-line tooling. PHP has the benefit of many free, open-source libraries on Packagist, which helpfully provides metrics on their popularity/mindshare to help separate the wheat from the chaff. Coming from CRAN (R) and PyPI (Python), I find Composer/Packagist to be a revelation.
  • Ignore the advice contained in articles like this and this, at least if you're just starting out. Optimization at the micro level, at best, furnishes performance improvements on the order of a few milliseconds every 10,000 requests. And even if you achieve this with arduous levels of effort, later implementations of the language might negate any benefit of micro-optimization, such that your "improvements" in the future actually slow things down. (More to the point: if associative array parsing is the performance bottleneck of your application, you're doing something wrong in the first place). There is also the fact that such optimizations can work at cross-purposes with sane, readable code. If you rewrite every class method to be static because the interpreter resolves their calls slightly faster, and refactor every single array in your code to be objects just to slightly optimize hash table reuse at the C level, you're putting yourself in a world of hurt. If you do have performance issues, there are free and commercial profiler tools available to help you target specific performance problems rather than guessing.
  • Learn object-oriented patterns, particularly by reading the code of reputable dependencies in your project. For an overview of design patterns in PHP and examples of their use, check out the DesignPatternsPHP repo. Don't overdo it, though. Rethink things slightly if your codebase starts looking like this (my favorite project on GitHub, by the way).
  • Writing tests gives you the invaluable luxury of introducing changes without panicking about breaking things. So do it! The main benefit of object-oriented patterns is that they engender code that's easier to test, one class or module at a time. If you're not writing tests, there's almost no point in using patterns like dependency injection (which lets you mock up a class's composition for testing) and strategies (which let you test algorithms independently of their invocations inside other classes).
  • Adhere to the PSR-2 coding standard. If you're lazy like I am, write code in whatever format you like, then press Ctrl + Alt + L in PhpStorm and you're done!
  • Don't participate in the framework wars on the Internet. The question of which framework is "best" has an outsized role in the PHP world. True: all but the simplest web applications need a "framework." There are two things to qualify this statement, though. One is that the definition of "framework" is changing from "monolithic library that essentially replaces the language for you" (a necessary evil in the benighted days of CodeIgniter, when the language was in a comparatively primitive state) to "a bunch of components you glue together with a router and dependency injection container." The second is that, with the community's newfound emphasis on loosely-coupled packages as well as the advent of PHP Standards Recommendations (PSR), which furnish interfaces for interoperable code, the choice of framework is arguably less momentous than it once was. With good design, an application can swap in and swap out libraries from multiple frameworks as required. The upshot: there is no need to become technically or emotionally wedded to a single framework. The endless blog posts and forum threads casting Laravel as either the savior or bane of PHP development are senseless, since A) there are going to be use cases where it shines (medium-complexity apps rapidly developed by agencies) and others where it may not (high-complexity web applications with tons of domain-specific requirements), and B) framework choice is not likely to be the decisive factor in a project's success. Instead, familiarize yourself with the language, and then make a choice based on personal preference and project needs.

    Want to write nice apps with as little third-party overhead as possible? Look into Slim. (If Laravel is Django/Rails, Slim is Flask/Sinatra for Python and Ruby, respectively).

    (As an aside, it'd be a shame if PHP went the way of Ruby: an excellent language that became synonymous with one particular framework that, for all its virtues, is falling out of fashion and threatening to drag the language down into oblivion with it).
  • Especially don't participate in the language wars on the Internet, on behalf of PHP or any other language. As alluded to earlier, the Internet frequently casts PHP and JavaScript developers, because of the ubiquity of their platforms, as "not real programmers" poised to ruin programming with their mere unpedigreed presences. Individual programmers are usually a kind, honest lot. They do not often fit the stereotype of the enfant terrible (that's French for "awful baby") who mainly uses his expertise to lambaste others. The same cannot unfortunately be said of some of the programming community on some corners of Reddit, Slashdot, and Y Combinator, which often combine comical levels of scorn for web development letzten Menschen with farcical fanboyism towards their preferred technologies and panjandrums. When I see stuff like this, I can only smile and imagine a worldwide community of people devoted to hating Black & Decker toasters and their purchasers. Leaving aside the pointlessness of self-identifying with tech brands, as well as of judging people's worth by their preferred tools, many (but not all) design criticisms of PHP are either A) valid only vis-à-vis ancient versions of the language that were actually substandard (<= 5.2), or B) really critiques of the bad procedural, ball-o'-mud code you see in aging codebases like WordPress and Moodle, which no strongly-typed, compiled language would've stopped developers from writing. It's all not terribly different from people who've been saying "Java is SLOW" for the last twenty-five years.

    The moral: tech bigotry is a waste of time at best. It's destructive at worst, as when it encourages the ill-advised adoption of shinier, envied platforms with disastrous consequences. From my experience, there isn't that much to gain from migrating an application from PHP to Python, but the managers of one notable startup decided that Python 3 was "more powerful" than PHP on the apparent basis of old blog posts, left a Python debugging tool running on their production server, and exposed gigabytes of source code and user data.

    What of the valid criticisms? For the most part, the annoyances associated with the language are present, but not major sources of problems. To give one example: "weak typing" is alleged to be a reason the language is both insecure and unusable. While it's true that statically typed languages perform better because of their greater correspondence to machine code, A) weak comparisons are optional; B) if you want to use weak comparisons when they come in handy (e.g. to test if a variable's "falsey," it's easier to type empty($x) instead of !isset($x) || $x === 0 || $x === 0.0 || $x === '' || $x === '0' || $x === false || $x === []), the comparison table isn't all that hard to memorize, and C) typing bugs almost never appear in competent PHP code. This is especially the case since PHP 7.x's type hints and strict_types execution directive go a long way toward eliminating unintended type coercion in function arguments and return values.

    To give another example: the standard library is alleged to be worthless. Most functions have snake-cased names (array_key_exists) and are easy enough to read. Others, especially the string functions copied from C's standard library, are abominations: strncasecmp gives you "binary-safe case insensitive string comparisons up to N characters." Function signature consistency is another issue; the most commonly cited example is a side-by-side comparison of strpos($haystack, $needle) and array_search($needle, $haystack). (As an aside: the reason for some of this weirdness is technical and historical). Even worse from a purity standpoint is that (unlike in Python) all functions and SPL classes/interfaces are in the global namespace and implicitly available in every script.

    From the perspective of practice as opposed to artistry, however, the standard library is pretty complete: it gives you all you need for basic, repetitive tasks, and has neat stuff that's not built into any other language (for instance, up-to-date password hashing). Working with JavaScript, which arguably lacks a standard library at all, gives you an appreciation for all PHP does: installing even the most boilerplate JS packages involves downloading dozens of dependencies that achieve such 21st century marvels of functional programming engineering as … left-padding a string. The ugliness of PHP function names is only a minor nuisance in comparison. Modern IDEs tell you exactly what every function and function argument in the language specification does. (When in doubt, just press Ctrl + P in PhpStorm). At the end of the day, I don't find myself using Google any more or less than when Python or R were my languages of choice and formed my core competency. If preserving slightly aesthetically deficient code is the price that the PHP community has to pay to avoid the drawn-out language schisms that good redesigns like Python 3 and Perl 6 begot, it is a bargain indeed.
  • At the same time, get a sense of where critics of PHP are coming from so that you can get an idea of what PHP's ideal use cases are and are not, and also gain perspectives that will help you develop in any language. Where I find PHP to be lacking is that it lacks certain object-oriented bells and whistles found in other languages (enums ["factors" in R] are one, the absence of which makes data validation a pain sometimes; generics are another, as without them you'll find yourself constantly polluting your code with /** @var SomeObject[] $arrayOfSomeObjects **/ docblock tags to stave off IDE warnings). Another weakness of PHP is one of its strengths: the one-process-per-request model, which means that PHP A) does a cold start every time someone hits your website; B) doesn't share any memory with other PHP processes, so you have to persist data in files and databases before the end of the request; and C) shuts down when the request lifecycle ends, so that all the variables you declared and objects you spun up go right into the trash. On the one hand, this model is time-tested, makes it easy for developers to reason about the state of their code, and prevents applications from grinding to a halt if a single process (say) throws an unhandled exception. PHP 7.x running on a warm cache is also super-optimized at a very low level, such that most applications perform just fine, with bottlenecks occurring in the data access layer. On the other hand, it does become a problem for heavily trafficked, file I/O intensive, web-servicey endpoints with high concurrency requirements. PHP would thus be a terrible tool with which to build (say) a stateless microservice in Netflix's stack. Async implementations of PHP are emerging to fulfill this need, but it's too early to say whether they'll catch on, or overcome the fact that user libraries usually don't take memory management into account.
  • Have fun, and don't compare yourself as a developer to others. Instead, compare your skills to where they were six months ago. If you're willing to learn a little at a time, you'll find that comparison extremely heartening. When you stretch the comparison to six years, though, don't be surprised to find that your vintage code smells like Hawkeye Pierce's bathtub gin.
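To make the testing advice above a little more concrete, here is a minimal sketch of dependency injection paying off in a test. Every name in it (Mailer, SignupService, RecordingMailer) is hypothetical. A real project would more likely use PHPUnit and its mocking tools than a hand-rolled test double, but the idea is the same.

```php
<?php
declare(strict_types=1);

// Hypothetical interface: the only thing SignupService knows about mail.
interface Mailer
{
    public function send(string $to, string $subject): void;
}

final class SignupService
{
    private $mailer;

    // The dependency is passed in rather than constructed internally,
    // so a test can hand over a stand-in implementation.
    public function __construct(Mailer $mailer)
    {
        $this->mailer = $mailer;
    }

    public function register(string $email): bool
    {
        if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
            return false;
        }
        $this->mailer->send($email, 'Welcome!');

        return true;
    }
}

// Test double: records calls instead of sending anything.
final class RecordingMailer implements Mailer
{
    public $sent = [];

    public function send(string $to, string $subject): void
    {
        $this->sent[] = [$to, $subject];
    }
}

$mailer  = new RecordingMailer();
$service = new SignupService($mailer);

$accepted = $service->register('reader@example.com');
$refused  = $service->register('not-an-email');
```

Because SignupService never instantiates its own mailer, the test can check exactly what would have been sent without touching a mail server.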
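On the typing point raised earlier, here is a rough sketch of what scalar type declarations plus strict_types buy you, along with a check of the values empty() treats as "falsey." The addTax() helper is hypothetical. Note that the directive only governs calls made from the file that declares it.

```php
<?php
declare(strict_types=1); // must be the first statement in the file

// Hypothetical helper, for illustration only.
function addTax(int $cents, float $rate): int
{
    return (int) round($cents * (1 + $rate));
}

$total = addTax(1000, 0.08);

// With strict_types enabled, a numeric string is rejected, not coerced.
$rejected = false;
try {
    addTax('1000', 0.08);
} catch (TypeError $e) {
    $rejected = true;
}

// Values that empty() treats as empty include all of these.
$falsey   = [null, false, 0, 0.0, '', '0', []];
$allEmpty = true;
foreach ($falsey as $value) {
    if (!empty($value)) {
        $allEmpty = false;
    }
}

// Near misses that are NOT empty: worth knowing before relying on empty().
$nearMissesKept = !empty('0.0') && !empty(' ') && !empty([0]);
```

Without the declare line, the second addTax() call would quietly convert the string to an integer, which is exactly the kind of coercion that critics of the language have in mind.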