Static Analysis for PHP

Posted by on August 10, 2012

At Etsy we have three tiers of static analysis on our PHP code that run on every commit or runs periodically every hour. They form an important part of our continuous deployment pipeline along with one-button deploys, fast unit and functional tests, copious amounts of graphing, and a fantastic development environment to make sure code flows safely and securely to production.

Sanity and Style Checking

These checks eliminate basic problems and careless errors.

A obvious rule is that syntax errors never make it make into the source repository, let alone production. So “php -l” is run as pre-commit check on each changed file.

We don’t use PHP’s native templating where code and HTML is mixed in one file. Our PHP files are purely code, and output is rendered via another templating system. To make sure this works correctly, we check that there is no text before the initial <?php tag. Otherwise that text is sent to the client and prevents us from setting HTTP headers. Likewise we make sure that PHP tags are balanced and there is no trailing text as well.

There are also a number of basic coding style checks. Nothing particularly exotic but they help make the code readable and consistent across an ever growing engineering department. Most of these are implemented using CodeSniffer.

Formal Checks using HpHp

Facebook’s HipHop for PHP is a full reimplementation of the PHP/Apache stack including a compiler, a new PHP runtime, and a web server. To make your application run under it will require some serious surgery since it’s missing many modules you might depend on. However, it has a fantastic static analyzer which can be used independently. It does global analysis of your entire code base for:

and a few others. It’s doesn’t always track exactly the latest PHP versions, so you’ll have to whitelist some of the errors, but overall it’s been wildly successful with almost no false positives.

Why not do all this in using PHP’s built-in tokenizer token_get_all or CodeSniffer? HpHp can analyze 10,000 files in a few seconds, while the built-in function is a few orders of magnitude slower. Since it’s so fast, we can put the static analysis as a pre-commit hook that prevents bugs from even being checked in. Which it does almost every day.

Security Checks

We run ClamAV and antivirus check our source repository. Has it found anything yet? No (phew). But at 1GB/sec scan rate and the low low cost of free, there is no reason not do it. We aren’t worried so much about PHP code, but occasionally Word and PDF documents are put into the source repo. ClamAV also scans for URLs and matches them against the Google Safe Browsing database for malware and phish sites.

The other security checks are alerts and triggers for code reviews. Files are scanned for commonly misused or abused functions involving cryptography, random numbers, and process management. If new instances are found, a alert for a code-review is done. Many times the code is just fine , but sometimes adjustments are needed or the goal can be achieved without using crytopgrahy. Likewise we alert on functions that “take a password” such as ftp_login. We want to avoid passwords being checked into source control. Some files are sensitive enough that any change triggers an alert for full review.

A lot more detail can be found in the presentation Static Analysis for PHP first given at PHPDay, Verona, Italy on May 19, 2012. They put on a great conference and highly recommend it for next year.

Posted by on August 10, 2012
Category: engineering, infrastructure, security

8 Comments

Note also that `token_get_all` is just an interface to PHP’s tokenizer, not a parser. Facebook’s parser is *much* more intelligent than simple lexer.

Related (but not the same) tools from FB: https://github.com/facebook/pfff/wiki/Sgrep and https://github.com/facebook/pfff/wiki/Spatch

What do you do with HipHop, in your pre-commit hook, exactly? Just a compile and parse the warnings? Something more intelligent?

S

Quick tip : to compensate for HipHop lagging in PHP version, maybe you can run Codesniffer 1 ruleset which checks for compatibility with 5.3 and 5.4 : https://github.com/wimg/PHPCompatibility

Apologies that this comment is not directly relevant to the post (I didn’t see any other obvious place to leave a general comment)

I just wanted to really thank you guys for putting this blog out.

I think it’s extremely generous of you to take the time to put this information out there and the topics you cover are hugely relevant (In fact, I was just wishing I had some better techniques to evaluate my own PHP code)

I am also hugely impressed at the quality of the prose, something that is sorely lacking in many blogs/web forums these days.

I am right now devouring the archives in the hope that I will have to be learning some of the same lessons of scale for my own projects on day.

Your blog has also motivated me to start my own small blog detailing some of the challenges, follies and (hopefully) successes as I go through my own product rollout.

Thanks!
Nick

Srsly?

“A obvious rule is that syntax errors never make it make into the source repository,”

A leetle joke perhaps? A test? =)

System programmers have “an obvious rule” that if you see spelling or syntax errors even in the comments in code, even in the documentation, you begin regarding that entire system with suspicion, and double down on your inspection of everything in the code.

Even simple non-executable errors imply no one with authority and concern had looked at that section. Also implied: review was sloppy or non-existent.

But then I suppose web programming is much more loosey-goosey in its approach and deployment.

Just tweaking you =) I’m a big fan and advocate of static code analysis =) Leave the typo in there, it makes it all so much more amusing =)

Feel free to delete this comment whether or not you fix the typo =)

[…] Static Analysis for PHP (Code as Craft) […]

[…] a recent post on the company’s blog, Etsy’s Nick Galbreath explained how his engineering team uses three tiers of static analysis when writing its PHP code – a process that can be applied to any development project. Coders use simple, ongoing static […]

Still search for a static analysis tool that will use php-doc tags for type-checking.

I did find this: https://github.com/scrutinizer-ci/php-analyzer/tree/legacy

But it doesn’t appear to be up to date, with most recent stubs for 5.4, and probably (?) no support for new 5.5-5.6 or 7.0 features. Not sure if there is a more recent version running on the actual Scrutinizer hosted service, or if they made it closed-source…

Php Storm does a pretty good job, but is still missing obvious checks like property-type checks. I’m getting tired of waiting and (scary thought) contemplating building it myself…