The lead developer at Mosaic, Brighton with a passion for web application development and motorcycles.
External Link: PHP extension writing: PHP Extensions Made Eldrich
Since writing my 15 Excellent Resources for PHP Extension Development post in September last year, Kristina Chodorow of 10gen (MongoDB) has written an excellent four-part article on writing PHP extensions on her blog, Snail in a Turtleneck.
Recently (well, in a loose sense anyway) I needed to build a document bank in PHP for a client at Mosaic. It was a fairly involved application with various public and private APIs for integration into the client's network of websites.
The core PHP code was written on top of the Agavi framework and various PHP libraries for extracting text and metadata from documents. One of the major features the client required was for the system to detect similar files, to prevent unintentional duplicates making it into the document bank.
The idea was that this document bank would be the one central resource for all of the documents written and managed by the organisation. Duplicates or near duplicates would of course make this a pointless exercise. So I turned to Stack Overflow for some pointers, but came up empty.
After some research and much searching of the web I came across an open source package called ssdeep, written by Jesse Kornblum. I found it through reading his research paper, Identifying almost identical files using context triggered piecewise hashing.
Installing via the pecl command can be a pain on Red Hat. First of all you will need to install the php-devel package:
yum install php-devel
Then you will need to ensure that the PEAR/PECL installer is at the latest version, so as root run:
pear update-channel pear.php.net
pear upgrade pear
You may need to force pear to upgrade itself by using:
pear upgrade --force pear
I had to use the --force option because my version of PEAR was so old that the installer thought my version of Archive_Tar might not be up to scratch. It was, however.
With all this in place you are ready to attempt to install your chosen extension:
pecl install ssdeep
If you get something like the following back:
/usr/bin/phpize: /tmp/ssdeep/build/shtool: /bin/sh: bad interpreter: Permission
Then it is likely that your temporary directory is mounted in a safer noexec state, which means that you cannot execute scripts within the /tmp directory. To test this you can put a simple bash script into your /tmp directory and chmod it with +x. I used the following bash script:
#!/bin/bash
echo "SIMON"
If you do not get SIMON back when you execute the file, but an error like “/bin/sh: bad interpreter: Permission” then the directory is set to noexec.
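The manual check described above can be scripted. The following is a minimal sketch, assuming a POSIX shell and the mktemp utility; the function name can_exec_in is made up for illustration and is not part of PEAR or PECL.

```shell
#!/bin/sh
# can_exec_in is a hypothetical helper, not a PEAR/PECL command.
# Returns 0 if a script can be executed from the given directory,
# 1 if execution is blocked (for example by a noexec mount),
# 2 if a probe script could not be created there.
can_exec_in() {
    dir=${1:-/tmp}
    probe=$(mktemp "$dir/exec-probe.XXXXXX") || return 2
    # Write a trivial script, make it executable, then try to run it.
    printf '#!/bin/sh\nexit 0\n' > "$probe"
    chmod +x "$probe"
    if "$probe" 2>/dev/null; then
        status=0
    else
        status=1
    fi
    rm -f "$probe"
    return "$status"
}

rc=0
can_exec_in /tmp || rc=$?
case $rc in
    0) echo "/tmp allows execution" ;;
    1) echo "/tmp blocks execution (possibly mounted noexec)" ;;
    *) echo "could not create a probe file in /tmp" ;;
esac
```

The probe file is removed whether or not it could be run, so the check leaves nothing behind in the directory.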
There are a few ways to overcome this, the easiest being to remount /tmp with exec for the duration of the install and then switch it back:
mount -o remount,exec /tmp
pecl install ssdeep
mount -o remount,noexec /tmp
There are a variety of other ways to get this working documented in the Media Temple wiki pages if the above technique does not work for you.
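One weakness of running the three remount commands by hand is that a failed install can leave /tmp mounted exec. As a rough sketch only, the steps could be wrapped so that noexec is restored either way. The helper names run and install_with_exec_tmp and the DRY_RUN variable are my own inventions for illustration; the real commands need root, so the example call below only prints what it would do.

```shell
#!/bin/sh
# run and install_with_exec_tmp are hypothetical helpers for illustration.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_with_exec_tmp() {
    ext=$1
    # Give up early if /tmp cannot be remounted with exec.
    run mount -o remount,exec /tmp || return 1
    status=0
    run pecl install "$ext" || status=$?
    # Always put /tmp back to noexec, even if the install failed.
    run mount -o remount,noexec /tmp || status=$?
    return "$status"
}

# Print the commands rather than running them.
DRY_RUN=1
install_with_exec_tmp ssdeep
```

Unsetting DRY_RUN and running the script as root would execute the same three commands for real.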
Whilst developing a PHP extension recently I spent quite a bit of time researching exactly how to create an extension, the best practices and the DocBook format of the PHP manual for documenting the extension.
By the time I finished writing the extension I had found some very good resources both on the web and in print.
Online Articles or Websites:
Printed Books:
Information on Documenting PHP and PECL Extensions:
Also worth a mention is the ability of the new PEAR2 Pyrus installer to generate a skeleton for you to build your PECL extension upon - see the generate-ext section of the manual.
Pyrus can also be used to package your extension up for release using the pickle command.
For more help there is a PECL mailing list and IRC channel (#php.pecl on efnet), both of which are in the support section of the PECL website.
Whilst not strictly for PHP extensions, the book Autotools: a practitioner’s guide to Autoconf, Automake and Libtool by John Calcote, available on Free Software Magazine, is very useful reading for an understanding of the build process involved in extension writing.
There is also the GNU Manual for Autoconf and GNU Autoconf, Automake, and Libtool by Gary V. Vaughan et al.
External Link: The PHP ssdeep Extension is Now in PECL
This means you can now install it easily by simply running:
sudo pecl install ssdeep
There is also proper documentation in the PHP manual which can be found at php.net/ssdeep.
For more information on the extension either see the PECL project page or the ssdeep PHP/PECL extension’s homepage.
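After installing, it is worth confirming that PHP can actually see the module. Here is a small sketch, assuming the php command line binary is on your PATH; has_module is a hypothetical helper name. Depending on your setup you may also need to add an extension=ssdeep.so line to php.ini before the module shows up, so check what the installer prints.

```shell
#!/bin/sh
# has_module is a hypothetical helper for illustration.
# It reads a module list on stdin (one name per line, as printed by
# `php -m`) and succeeds if the named module appears as a whole line.
has_module() {
    grep -qixF -- "$1"
}

# Only attempt the check when a php binary is available.
if command -v php >/dev/null 2>&1; then
    if php -m | has_module ssdeep; then
        echo "ssdeep extension is loaded"
    else
        echo "ssdeep extension is not loaded"
    fi
fi
```

The match is case-insensitive and on the whole line, so a module with a longer but similar name would not be reported as ssdeep.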
External Link: php_ssdeep Fuzzy Hashing PHP Extension
Updated 16/9: php_ssdeep is now in PECL so I have updated this post to reflect that.
On a recent project I needed a fast way to compare documents for likeness and return a percentage match. After much research and one unanswered Stack Overflow post, I came across Jesse Kornblum’s ssdeep utility, which is intended for computer forensics work such as looking for signatures in files when hunting rootkits. All the technical details of fuzzy hashing are described in his 2006 journal article Identifying almost identical files using context triggered piecewise hashing.
I was using a wrapper around the ssdeep binary written in pure PHP, but I recently decided to write my first PHP extension by tying into the fuzzy hashing API ssdeep provides. The result is now a PECL extension or module and the BSD licensed source code is hosted on PHP.net’s SVN.
Full instructions for installing the extension and using its functions are provided in the PHP manual and on the php_ssdeep site. If you end up using this extension or its code I would be very interested to find out more about your project.