CodeGen_MySQL_UDF is a tool that can automatically create the basic framework for MySQL User Defined Functions (UDF) from a rather simple XML specification file. The actual functionality is provided by the script udf-gen that is installed by the CodeGen_MySQL_UDF package.
The code generated by udf-gen is designed to work with MySQL versions 3.23, 4.0, 4.1 and 5.0 without modifications (although it is usually necessary to compile it for a specific server version).
udf-gen tries to support as many UDF writing aspects as possible. This currently includes code generation for simple and aggregate functions and preparation of configure and Makefile related build files.
CodeGen_MySQL_UDF is available in PEAR, the PHP Extension and Application Repository, at http://pear.php.net/.
Online installation using the PEAR installer is the easiest way to install CodeGen_MySQL_UDF. Just issue the following command:
pear install -o CodeGen_MySQL_UDF
The PEAR installer will download and install the package itself and all packages that it depends on.
When installing from package files downloaded from pear.php.net you have to resolve dependencies yourself. Currently CodeGen_MySQL_UDF depends on three other PEAR packages: Console_Getopt, which is part of the PEAR base installation; CodeGen, the code generator base package; and CodeGen_MySQL, the base package for MySQL-specific code generators. Because Console_Getopt is already installed, you need to download three package files: CodeGen, CodeGen_MySQL and CodeGen_MySQL_UDF itself. The actual installation is once again performed by the PEAR installer:
pear install CodeGen-0.9.0.tgz
pear install CodeGen_MySQL-0.9.0.tgz
pear install CodeGen_MySQL_UDF-0.9.0.tgz
You can also install CodeGen_MySQL_UDF snapshots from PEAR CVS. CVS snapshots may include features not yet available in any release package, but the code in CVS may not be as well tested as the release packages, and may even be broken at times. Be warned, your mileage may vary. Use the following sequence of commands in your PEAR CVS checkout to install the latest CodeGen_MySQL_UDF snapshot:
cd pear
cd CodeGen
cvs update
pear install -f package.xml
cd ..
cd CodeGen_MySQL
cvs update
pear install -f package.xml
cd ..
cd CodeGen_MySQL_UDF
cvs update
pear install -f package.xml
cd ..
Once you have written your XML specs file, invoking udf-gen is as simple as:
udf-gen specfile.xml
udf-gen will parse the specs file, create a new subdirectory and put all generated files in there. The generated code is ready to be compiled using the usual configure; make sequence.
Below is a copy of the udf-gen --help output:
udf-gen 0.9.1dev,Copyright (c) 2003-2005 Hartmut Holzgraefe
Usage: /usr/local/bin/udf-gen [-hxf] [-d dir] [--version] specfile.xml
  -h|--help          this message
  -x|--experimental  enable experimental features
  -d|--dir           output directory
  -f|--force         overwrite existing files/directories
  -l|--lint          check syntax only, don't create output
  --version          show version info
The top level container tag describing an extension is the <extension> tag. The name of the extension is given in the name=... attribute. The extension name has to be a valid file name as it is used as the extension's directory name.
You can specify which CodeGen_MySQL_UDF version your specification file was built for using the version=... attribute. The udf-gen command will not accept specifications written for a newer version of CodeGen_MySQL_UDF than the one installed. If the requested version is older than the current one, udf-gen will try to fall back to the older version's behavior for features that have changed in incompatible ways.
Note: So far no such changes have happened.
The tags <summary> and <description> should be added at the very top of your extension. The summary should be a short one-line description of the extension while the actual description can be as detailed as you like.
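A minimal sketch of the top of a specification file, using only the tags and attributes described above. The extension name sample_udf, the version value and the texts are made up for illustration, and the ... stands for the rest of the specification.

```xml
<?xml version="1.0"?>
<!-- hypothetical example: name, version and texts are placeholders -->
<extension name="sample_udf" version="0.9.0">
  <summary>Sample collection of numeric helper UDFs</summary>
  <description>
    A longer free-form text describing what the functions in this
    extension do and how they are meant to be used.
  </description>
  ...
</extension>
```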
The release information for your UDF extension should include the extension authors and maintainers, the version number, state and release date, the chosen license and maybe a change log describing previous releases.
The specifications of the <maintainers>, <release> and <changelog> tags are identical to those in the PEAR package.xml specification, so please refer to the PEAR documentation for details.
Example 2-2. Release information
...
<maintainers>
  <maintainer>
    <user>hholzgra</user>
    <name>Hartmut Holzgraefe</name>
    <email>hartmut@php.net</email>
    <role>lead</role>
  </maintainer>
</maintainers>
<release>
  <version>1.0</version>
  <date>2002-07-09</date>
  <state>stable</state>
  <notes>
    The sample extension is now stable
  </notes>
</release>
<changelog>
  <release>
    <version>0.5</version>
    <date>2002-07-05</date>
    <state>beta</state>
    <notes>First beta version</notes>
  </release>
  <release>
    <version>0.1</version>
    <date>2002-07-01</date>
    <state>alpha</state>
    <notes>First alpha version</notes>
  </release>
</changelog>
...
The <license> tag is a little more restrictive than its package.xml counterpart as it is used to decide which license text should actually be written to the LICENSE file. For now you have to specify either GPL, LGPL or BSD; any other value is taken as 'unknown'.
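As a short illustration, a license declaration might look like the following. This assumes that the license name is given as the tag's content, as in package.xml; the exact form is not spelled out above.

```xml
<!-- assumption: the value is the element content, as in package.xml -->
<license>BSD</license>
```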
Two different kinds of functions may be defined using the <function> tag: regular and aggregate functions. The function type is determined by the type=... attribute and defaults to regular.
The function name is defined using the name=... attribute and has to be a valid C function name.
The return type of a function is defined using returns=..., possible values are string (default), int, and real. For a function that may return NULL values the null='yes' attribute has to be set.
The length=... attribute can be used to define the maximum length for a string result or the number of significant digits for int and real results. For real results the number of significant decimals can be defined using the decimal=... attribute.
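As an illustration, the attributes described so far might be combined as follows. The function names and attribute values are hypothetical, the literal value "aggregate" for the type attribute is assumed from the description above, and the ... stands for the parameters and code sections.

```xml
<!-- regular function (the default type) returning a string of at most 64 characters -->
<function name="short_label" returns="string" length="64">
  ...
</function>

<!-- real result with 10 significant digits and 2 decimals that may be NULL -->
<function name="price_ratio" returns="real" length="10" decimal="2" null="yes">
  ...
</function>

<!-- aggregate function returning an int -->
<function name="row_total" type="aggregate" returns="int">
  ...
</function>
```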
Function parameters are defined using the <param> tag. The parameter name is defined by the name=... attribute and has to be a valid C variable name. The parameter type can be one of string, int or real and is defined using the type=... attribute.
Optional parameters may be specified using the optional='yes' attribute. Once a parameter has been declared optional all following parameters have to be optional, too. Default values for optional parameters can be given using the default=... attribute. If no default value is given an unset optional parameter defaults to NULL.
For each parameter a set of C variables starting with the same name is generated that can be used within the function's code snippets. The actual variable names and types depend on the parameter type.
For each parameter a variable by the name of the parameter is created. The variable is of type char * for string parameters, of type long for int parameters and of type double for real parameters.
For int and real parameters an additional variable name_is_null is created and set to 0 or 1 accordingly. For string parameters no such variable is needed, here the parameter variable is set to NULL instead.
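The following sketch shows how the generated variables might be used. The function scaled_div and its parameters are hypothetical; RETURN_NULL() and RETURN_REAL() are the result macros provided by the generated code, and it is assumed here that an optional parameter with a default value is not flagged as NULL when it is omitted.

```xml
<!-- hypothetical example function -->
<function name="scaled_div" returns="real" null="yes">
  <param name="a" type="real"/>
  <param name="b" type="real"/>
  <!-- optional parameters have to come last -->
  <param name="scale" type="real" optional="yes" default="1.0"/>
  <code>
    /* a, b and scale are doubles; the *_is_null variables are 0 or 1 */
    if (a_is_null || b_is_null || scale_is_null || b == 0.0) {
      RETURN_NULL();
    }
    RETURN_REAL(scale * a / b);
  </code>
</function>
```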
It is possible to define an associated data structure for a function that can store data to be shared across all calls to this function during the execution of a statement. This data structure can be used to manage allocated buffers across all calls or to store the intermediate data while processing an aggregate group.
The elements of such a data structure are defined using <element> tags within a <data> section. Each element needs to be given a valid C name, type and default value using the name=..., type=... and default=... attributes.
Within the function's code snippets the private data can be accessed using the data pointer that is created and initialized in the generated wrapper code.
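A minimal sketch of a private data structure, assuming that the type attribute takes a C type name. The function call_counter is hypothetical; it counts how often it has been called within one statement. The C code is wrapped in a CDATA section because it contains a > character.

```xml
<!-- hypothetical example function -->
<function name="call_counter" returns="int">
  <data>
    <!-- assumption: type is a plain C type -->
    <element name="calls" type="long" default="0"/>
  </data>
  <code>
<![CDATA[
    /* 'data' is the pointer set up by the generated wrapper code */
    data->calls++;
    RETURN_INT(data->calls);
]]>
  </code>
</function>
```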
A regular function is initialized once for each SQL query it is used in. Before the actual execution starts the function is initialized by calling the function's init handler, and after execution the function's deinit handler is called to clean up. During the execution phase the actual function handler is called for every result row.
Code for the init, deinit and execution phase can be added to a function using the <init>, <deinit> and <code> tags.
Code in <init> is usually used to allocate and initialize private data. Parameter count and type checking and allocation of the private data structure are already handled by the generated code, so the <init> code doesn't have to take care of this.
The <deinit> code section usually only has to take care of freeing any resources held by elements of the private data structure.
The <code> section implements the actual functionality of the function by processing its parameters and returning a result value.
The following macros may be used to return results from the <code> section:
RETURN_STRING(str) returns a string, the string length is calculated using strlen()
RETURN_STRINGL(str, len) returns a string of a given length; this saves a call to strlen() if the string length is already known and makes it possible to return binary strings that contain the \0 character
RETURN_INT(val) returns an int value
RETURN_REAL(val) returns a real value
RETURN_NULL() returns a NULL result
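The three handlers and the result macros might be combined as in the following sketch. The function stars is hypothetical; it returns a string of n asterisks from a buffer that is allocated once per statement. It is assumed that malloc(), free() and memset() are declared in the generated source file (otherwise the needed #include lines would have to be added), and that the element type may be any C type.

```xml
<!-- hypothetical example function -->
<function name="stars" returns="string" null="yes" length="255">
  <param name="n" type="int"/>
  <data>
    <element name="buf" type="char *" default="NULL"/>
  </data>
  <init>
<![CDATA[
    /* called once per statement: allocate the result buffer */
    data->buf = malloc(255);
]]>
  </init>
  <deinit>
<![CDATA[
    /* called once after execution: release the buffer again */
    free(data->buf);
]]>
  </deinit>
  <code>
<![CDATA[
    /* called for every row */
    if (n_is_null || n < 0 || n > 255 || data->buf == NULL) {
      RETURN_NULL();
    }
    memset(data->buf, '*', n);
    /* the buffer is not \0 terminated, so pass the length explicitly */
    RETURN_STRINGL(data->buf, n);
]]>
  </code>
</function>
```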
The calling sequence of the different handlers of an aggregate function is a little more complicated than for a regular function. Both share the <init> and <deinit> handlers that are called before and after executing the actual SQL statement. Two additional handlers <start> and <add> are called at the beginning of each new group in the result and for each row in that group. The <result> handler is called after the last row of each group has been processed and is supposed to return the aggregated result for the group.
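A sketch of an aggregate function that sums up all non-NULL values of a group. The function name int_total is hypothetical, and it is assumed that the RETURN_* macros listed for the <code> section can be used in <result> in the same way.

```xml
<!-- hypothetical example function -->
<function name="int_total" type="aggregate" returns="int">
  <param name="value" type="int"/>
  <data>
    <element name="total" type="long" default="0"/>
  </data>
  <start>
<![CDATA[
    /* a new group begins: reset the running total */
    data->total = 0;
]]>
  </start>
  <add>
<![CDATA[
    /* called for each row of the group: skip NULL values */
    if (!value_is_null) {
      data->total += value;
    }
]]>
  </add>
  <result>
<![CDATA[
    /* called after the last row of the group */
    RETURN_INT(data->total);
]]>
  </result>
</function>
```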
Custom code may be added to your extension source files using the <code> tag. The role=... and position=... attributes specify the actual place in the generated source files where your code should be inserted.
Possible roles are 'code' (default) for the generated C or C++ code file and 'header' for the generated header file. Possible positions are 'top' and 'bottom' (default) for insertion near the beginning or end of the generated file.
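For illustration, the following sketch adds an #include line near the top of the generated header file and a helper function near the end of the generated code file. The helper clamp_percent is hypothetical.

```xml
<code role="header" position="top">
<![CDATA[
#include <math.h>
]]>
</code>

<!-- role="code" and position="bottom" are the defaults -->
<code>
<![CDATA[
/* hypothetical helper that functions of this extension could call */
static long clamp_percent(long value)
{
  if (value < 0) return 0;
  if (value > 100) return 100;
  return value;
}
]]>
</code>
```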
Libraries, header files and additional files that a UDF relies on may be defined in the <deps> section. So far only the addition of files using the <file> tag works; depending on the file type a file is either compiled into the UDF or just copied to the UDF directory. Currently C (.c), C++ (.c++, .cpp, .cxx), lex/flex (.l, .lex, .flex) and yacc/bison (.y, .yacc, .bison) source files can be added to be compiled in, files with other extensions are just copied.
Additional configure checks can be added to the generated config.m4 file used by Unix/Cygwin builds using the <configm4> tag. Using the 'position' attribute it is possible to specify whether the additional code is to be added at the top or bottom of the config.m4 file.
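A sketch of an additional configure check, assuming that the tag content is copied into config.m4 as is. AC_CHECK_HEADERS is a standard autoconf macro; checking for zlib.h is just an arbitrary example.

```xml
<configm4 position="top">
  AC_CHECK_HEADERS([zlib.h])
</configm4>
```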
Makefile rules may be added using the <makefile> tag for Unix/Cygwin builds. Using this it is possible to add dependencies or build rules in addition to the default and auto generated rules.
The XML parser used by CodeGen_MySQL_UDF supports inclusion of additional source files in three different ways:
external entities
a subset of XInclude
the source attribute of <code> tags
In SGML and early XML, system entities were the only include mechanism available. System entities have to be defined in the document's DOCTYPE header; later on in the document the entity can be used to include the specified file:
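A minimal sketch using standard XML external entity syntax. The entity name functions and the file name functions.xml are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE extension [
  <!-- declare an external (system) entity pointing to another file -->
  <!ENTITY functions SYSTEM "functions.xml">
]>
<extension name="foobar">
  ...
  <!-- the content of functions.xml is included here -->
  &functions;
  ...
</extension>
```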
The CodeGen XML parser supports a simple subset of XInclude. It is possible to include additional specification files using the href=... attribute of the <include> tag:
Example 3-2. XInclude
<extension name="foobar" xmlns:xi="http://www.w3.org/2001/XInclude">
  ...
  <xi:include href="foobar_2.xml"/>
  ...
</extension>
The parse=... attribute is also supported; using <include parse='text' href='...'/> it is possible to include arbitrary data without parsing it as XML.
Example 3-3. Verbatim XInclude
<extension name="foobar" xmlns:xi="http://www.w3.org/2001/XInclude">
  ...
  <description><xi:include href="README" parse="text"/></description>
  ...
</extension>
Other <include> features and the <fallback> tag are not supported yet, and most of them won't make sense in this context anyway.
In most places the <code> tag supports loading of its content using its src=... attribute:
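A sketch of what this might look like, using the src attribute as named above. The function name and the file name row_weight_body.c are hypothetical, and how relative paths are resolved is not specified here.

```xml
<!-- hypothetical example function -->
<function name="row_weight" returns="int">
  <param name="value" type="int"/>
  <!-- the C code for the execution phase is read from a separate file -->
  <code src="row_weight_body.c"/>
</function>
```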
C code usually contains quite a few >, < and & characters all of which need to be escaped in XML. This can be done by either converting them into entities all over the place or by embedding the code into CDATA sections:
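Both variants side by side, as a sketch. The C snippet and its variables a and b are arbitrary; the entity and CDATA syntax is standard XML.

```xml
<!-- variant 1: special characters converted into entities -->
<code>
    if (a &lt; b &amp;&amp; b &gt; 0) {
      RETURN_INT(a &amp; 0xff);
    }
    RETURN_INT(0);
</code>

<!-- variant 2: the same code embedded in a CDATA section -->
<code>
<![CDATA[
    if (a < b && b > 0) {
      RETURN_INT(a & 0xff);
    }
    RETURN_INT(0);
]]>
</code>
```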
Typing <![CDATA[ can become rather annoying over time (especially on a German keyboard), so I introduced the <?data processing instruction into the CodeGen XML parser as an alternative to CDATA:
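A sketch of how this might look, assuming that the instruction is closed with ?> like any other XML processing instruction. The C snippet is the same arbitrary example that could otherwise be wrapped in CDATA.

```xml
<code>
<?data
    if (a < b && b > 0) {
      RETURN_INT(a & 0xff);
    }
    RETURN_INT(0);
?>
</code>
```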
The transformation of an XML specification file into a UDF directory is done by simply calling the udf-gen command with the XML filename as argument:
udf-gen specfile.xml
udf-gen will refuse to overwrite an existing UDF directory (as changes made in there may be lost) unless you call it with the -f or --force option:
udf-gen -f specfile.xml
You need to configure your UDF for your actual build system before compiling it. udf-gen has already created the necessary autotools input files and run autoconf and friends on them so that a configure file is already available.
You need to run configure to configure your UDF source for your system installation. Most of the time just running configure will be sufficient as appropriate defaults should be picked by the script.
If the mysql_config binary is not in your $PATH, or if you want to build the UDF for a MySQL installation other than the default one, you have to specify the location of mysql_config when calling configure:
configure --with-mysql=/path/to/bin/mysql_config
If your extension relies on external libraries installed in non-standard places you may want to run configure with the appropriate --with-... options.
You can find out about the --with-... (and other) options provided by a configure script by running:
configure --help
After configuring your UDF the actual compilation is done by the make command. No further parameters are needed at this point:
make
Currently no testing infrastructure is generated, though it may be provided by a future version. I started to work on a mysql_udf PECL extension that allows PHP to load UDF libraries and to call the functions provided by them. This might be used together with the PHP test infrastructure to test UDFs, or maybe test cases for the MySQL test infrastructure could be generated instead. All this requires further investigation and work, though.
There is no make install target yet as it is hard to automatically find the right place to put the generated UDF .so libraries. Putting them into /usr/lib is a safe bet but usually you don't want to have them there ...