15 SWIG and Ocaml
This chapter describes SWIG's support of Ocaml. Ocaml
is a relatively recent addition to the ML family, and is a recent addition
to SWIG. It's the second compiled, typed language to be added. Ocaml has
widely acknowledged benefits for engineers, mostly derived from a sophisticated
type system, compile-time checking which eliminates several classes of common
programming errors, and good native performance. While all of this is wonderful,
there are well-written C and C++ libraries that Ocaml users will want to
take advantage of as part of their arsenal (such as SSL and gdbm), as well
as their own mature C and C++ code. SWIG allows this code to be used in
a natural, type-safe way with Ocaml, by providing the necessary, but repetitive
glue code which creates and uses Ocaml values to communicate with C and C++
code. In addition, SWIG also produces the needed Ocaml source that binds
types, variants, functions, classes, etc.
15.1 Preliminaries
SWIG 1.3 works with Ocaml 3.04 and above. Given the choice, you should
use the latest stable release. The SWIG Ocaml module has been tested on
Linux (PPC,MIPS,Intel,Sparc) and Cygwin on Windows. The best way to determine
whether your system will work is to compile the examples and test-suite which
come with SWIG. You can do this by running make check from the
SWIG root directory after installing SWIG. The Ocaml module has been tested
using the system's dynamic linking (the usual -lxxx against libxxx.so), but
not using the explicit dynamic linking provided by the Dl package
(http://www.ocaml-programming.de/packages/documentation/dl/),
although I suspect that it will work without a problem.
15.1.1 Running SWIG
The basics of getting a SWIG Ocaml module up and running can be seen
from one of SWIG's example Makefiles, but are also described here. To build
an Ocaml module, run SWIG using the -ocaml option. The -objects option,
which enables proxy classes, is usually given as well. To disable the
non-object interface, so that methods only show up in the .mli as class
methods, specify -onlyobjects.
% swig -ocaml -objects example.i
This will produce 3 files. The file example_wrap.c contains
all of the C code needed to build an Ocaml module. To build the module,
you will compile the file example_wrap.c with ocamlc or
ocamlopt to create the needed .o file. You will need to compile
the resulting .ml and .mli files as well, and do the final link with -custom
(not needed for native link).
15.1.2 Additional Command Line Options
The following table lists the additional command line options available
for the Ocaml module. They can also be seen by using:
swig -ocaml -help
Ocaml specific options
-mlout <ocaml-file.ml> | Sets the name of the ocaml interface files to be generated. The .ml extension must be present.
-objects | When on, produce ocaml class definitions for C/C++ classes, structs, and unions.
-onlyobjects | When on, produce only class methods for functions that appear as class methods or member accessors.
-classmod | Wrap classes in Ocaml modules. This is a way to disambiguate scoped enums, classes, etc. by planting them inside modules. It also produces code that closely follows the layout of most released Ocaml libraries.
-uncurried | Wrap functions uncurried (with tuples). This was the way the module was originally written, but it's not as efficient in most cases. A case where it might be more efficient is a list of tuples that match the call signature of the target function, when this list is used with that function in List.map or List.iter.
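The difference between the two calling styles can be sketched in plain Ocaml. The functions below are hand-written, hypothetical stand-ins for a wrapped C function int add(int, int), not SWIG output. They only illustrate why a tuple-taking wrapper fits directly into List.map over a list of argument tuples.

```ocaml
(* Hypothetical stand-ins for a wrapped C function: int add(int, int). *)

(* Curried style (the default): arguments are applied one at a time. *)
let add_curried (x : int) (y : int) : int = x + y

(* Uncurried style (as with -uncurried): a single tuple argument. *)
let add_uncurried ((x, y) : int * int) : int = x + y

(* A list of tuples that matches the call signature of the function. *)
let pairs = [ (1, 2); (3, 4); (5, 6) ]

(* The uncurried function can be handed to List.map directly ... *)
let sums_uncurried = List.map add_uncurried pairs

(* ... while the curried one needs an adapter that unpacks each tuple. *)
let sums_curried = List.map (fun (x, y) -> add_curried x y) pairs
```

In ordinary calls with separate arguments the curried form needs no tuple to be built, which is presumably why the text above calls it the more efficient choice in most cases.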
15.1.3 Getting the right header files
You may need to include the libswigocaml.h file that comes with the
distribution. It provides several useful functions that almost all programs
that use SWIG will need. It is located in $(prefix)/include/libswigocaml.h
where $(prefix) is usually /usr/local, but could be /usr. This is set at
configure time.
15.1.4 Compiling the code
Use ocamlc or ocamlopt to compile your SWIG interface
like:
% ocamlc -c -ccopt "-I/usr/include/foo -I/usr/local/include" example_wrap.c
% ocamlc -c example.mli
% ocamlc -c example.ml
ocamlc is aware of .c files and knows how to handle them. Unfortunately,
it does not know about .cxx, .cc, or .cpp files, so when SWIG is invoked
in C++ mode, you must:
% cp example_wrap.cxx example_wrap.cxx.c
% ocamlc -c ... -ccopt -xc++ example_wrap.cxx.c
% ...
15.1.5 Current thoughts on best practice for Ocaml
Because the VC compiler (cl) needs link options specified after all compiler
options, and ocamlc doesn't really understand that, I think that the following
is the best way to link ocaml code with C++ code. I formulated this method to
make it easy for co-workers who rely on MSDev to create GUIs, etc. to live in
harmony with the ocaml parts of the application.
Let's say you have ocaml sources foo.ml and bar.ml and interface frob.i:
swig -ocaml -c++ -objects frob.i
ocamlc -custom -c frob.mli
ocamlc -custom -c frob.ml
cp frob_wrap.cxx frob_wrap.c
ocamlc -custom -c -I$(FROBLIB)/include frob_wrap.c
ocamlc -custom -c foo.ml
ocamlc -custom -c bar.ml
ocamlc -pack -o foobar.cmo foo.cmo bar.cmo frob.cmo
ocamlc -custom -output-obj -o foobar.obj foobar.cmo
At this point, foobar.obj can be included in your MSVC project and linked
against other code. This is how you link it:
link /OUT:big_program.exe \
other1.obj other2.obj foobar.obj frob_wrap.obj \
$(OCAMLLIB)/ocamlrun.lib $(FROBLIB)/lib/frob.lib
15.1.6 Using your module
You can test-drive your module by building a toplevel ocaml interpreter.
Consult the ocaml manual for details.
When linking any ocaml bytecode with your module, use the -custom option
to build your functions into the primitive list.
15.1.7 Compilation problems and compiling with C++
As mentioned above, .cxx files need special handling to be compiled
with ocamlc. Other than that, C code that uses class
as a non-keyword, and C code that is too liberal with pointer types may not
compile under the C++ compiler. Most code meant to be compiled as C++ will
not have problems.
15.2 The low-level Ocaml/C interface
The SWIG Ocaml module is based upon the page in the Ocaml manual titled
"Interfacing C with Objective Caml". You should familiarize yourself with
this information if you need to write any special typemaps.
15.2.1 The generated module
The SWIG %module directive specifies the name of the Ocaml
module to be generated. If you specified `%module example', then
your Ocaml code will be accessible in the module Example. The module name
is always capitalized as is the ocaml convention. Note that you must not
use any Ocaml keyword to name your module. Remember that the keywords are
not the same as the C++ ones.
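As a rough sketch of the naming rule, the capitalization can be reproduced with String.capitalize_ascii. The helper names and the keyword list below are illustrative assumptions: the list is deliberately partial and is not taken from SWIG's sources.

```ocaml
(* Sketch: derive the Ocaml module name from the name given to %module. *)
let ocaml_module_name (swig_module : string) : string =
  String.capitalize_ascii swig_module

(* A partial, illustrative list of Ocaml keywords that are perfectly
   legal identifiers in C and C++. *)
let some_ocaml_keywords =
  [ "begin"; "done"; "end"; "fun"; "function"; "in"; "let"; "match";
    "method"; "module"; "object"; "of"; "open"; "rec"; "type"; "val";
    "with" ]

(* True when the name does not collide with one of the keywords above. *)
let is_safe_module_name (swig_module : string) : bool =
  not (List.mem swig_module some_ocaml_keywords)
```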
15.2.2 Deleters
Ocaml is a garbage collected language. You can choose
to ignore this and manage the C++ heap yourself, or you can have Ocaml
manage certain objects. Since C++ code often requires objects to be
owned by different parties at different times, the SWIG Ocaml module gives
the programmer a choice at all times. Each module is built with a function
set_delete_fn : 'a -> string -> unit
which will use caml_named_value (values
registered with Callback.register) to get a function to use to delete the
contents of the cell. By default, all destructors are registered by
their wrapper name, so delete_foo becomes "_wrap_delete_foo". This
is typical usage:
let x = new_foo ()
let _ = set_delete_fn x "_wrap_delete_foo" (* Foo is garbage collected *)
let y = new_foo ()
let z = new_managing_container ()
let _ = Managing_container.add y
(* y is not garbage collected because it wasn't set to be deleted *)
let _ = set_delete_fn z "_wrap_delete_managing_container"
(* But z is garbage collected. It will delete y. *)
15.2.3 Types
The default typemaps are good generally, but have their weaknesses
as all C type conversions must. In general, it isn't possible to predict
the use that a C variable will be put to; since it's all just bytes in memory,
any C variable can be used to hold any C value at least as small, and sometimes,
even this is fudged. Also, pointer to object may mean pointer to array,
or pointer to a single thing of that type. Some degenerate libraries even
intermix enum and int freely, using enums as int constants, bit flags, or
other int values. In addition, char * sometimes means opaque buffer and
sometimes string. Given all of these factors, the following default type
handling was chosen given the author's experience with C++. YMMV.
C type | Default Ocaml Type
bool | bool
void | unit
int | int
short | int
long | int64
unsigned long | int64
char | char
char * | string
float | float
double | float
oc bool | bool (* Can be used as a convenience in C code, typedef'd to int *)
unsigned int | int32
unsigned short | int
unsigned char | char
long long | int64
unsigned long long | int64
When struct, class or union objects or references are used in function
calls, or as results, Ocaml code pretends that they are used as pointers.
This makes ocaml code easier to deal with both in terms of garbage collection,
and in terms of uniformity. Because of this, user code never needs to enreference
or dereference elements, although user code may need to cast pointer types,
or on occasion, allocate a pointer variable which C/C++ code can store a
value in. Functions are provided for this in libswigocaml, the
SWIG Ocaml support library. As for casts, the user will either provide
an inline function that performs the cast, use the "%identity" primitive,
or use the Obj.magic function in the ocaml library. Note that Obj.magic
does no work except to pretend that the type of the argument is the same
as the type needed for the expression; it is therefore possible to crash
the program this way (just as with a C cast).
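The two Ocaml-side casts can be sketched as follows. The type and function names are hypothetical, and the pointer is modelled as a tagged address rather than a real SWIG value, so this only shows the shape of an "%identity" or Obj.magic cast.

```ocaml
(* Phantom tags standing in for two C++ classes (hypothetical names). *)
type base_tag
type derived_tag

(* Stand-in for an opaque C pointer: an address tagged with its C++ type. *)
type 'a ptr = Ptr of nativeint

(* Cast with the "%identity" primitive: no code runs, only the type changes. *)
external derived_to_base : derived_tag ptr -> base_tag ptr = "%identity"

(* The same cast with Obj.magic; the compiler cannot check it at all. *)
let derived_to_base_magic (p : derived_tag ptr) : base_tag ptr = Obj.magic p

(* Read the address back, whatever the tag is. *)
let address (Ptr a : _ ptr) : nativeint = a

let d : derived_tag ptr = Ptr 0x1000n
let b1 = derived_to_base d
let b2 = derived_to_base_magic d
```

Declaring the cast once as an external with a precise signature keeps the unsafety in one place, whereas Obj.magic will unify with any type the surrounding expression asks for.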
In general, any C/C++ pointer type is represented with _p prepended, all
type names have _ prepended, and some more exotic types are encoded with
different pseudo-symbols. You should check the .mli output to find the
types assigned to various functions.
15.2.4 Functions
C/C++ functions are mapped directly into Ocaml functions. Parameters
are passed tupled (enclosed in parentheses, and separated by commas).
Names are sometimes changed in order to make them into correct ocaml names.
This usually involves adding an underscore in front of the name, but can
mean adding a number to the end to break a conflict. You should read the
.mli output before writing code based on SWIG output. Every possible effort
is made to handle namespace and class names in an intelligent way that preserves
the original name within the constraints of the ocaml system (ocaml functions
can't be overloaded in the C++ sense, and can't start with an upper case
letter, as well as needing to avoid the use of ocaml keywords).
15.2.5 Variable Linking
SWIG provides access to C/C++ global and member variables both as Ocaml
functions, and as methods where applicable. In general, a mutable (modifiable)
variable will have _get and _set methods like:
(* int foo; *)
val foo_get : unit -> int
val foo_set : int -> unit
and constants will have only a value binding, like:
(* const char *bar = "Yadda"; *)
val bar : string
since such a "variable" can't change and will never be set to anything
else.
Member variables are accessed in the obvious way through methods of their
containing classes.
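The accessor pattern can be imitated in plain Ocaml to show how calling code looks. In generated code the storage lives on the C side; the ref cell below is only an assumed stand-in for it.

```ocaml
(* Stand-in for the C global "int foo;". The real storage would be in C. *)
let foo_storage = ref 0

(* Same signatures as the generated accessors shown above. *)
let foo_get () : int = !foo_storage
let foo_set (v : int) : unit = foo_storage := v

(* Stand-in for "const char *bar": a plain value with no accessors. *)
let bar : string = "Yadda"

let () = foo_set 42
let current = foo_get ()
```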
15.2.6 Callbacks
The ocaml SWIG language module allows you to write callbacks that will
be called from your C code. Currently, this feature is experimental. Consider
the following code:
%module error
%{
void call_err( void (*errfunc)() ) {
errfunc();
}
%}
%feature("camlcb") caml_error {caml_error}
extern void caml_error();
void call_err( void (*errfunc)() );
This code will create a callback function called caml_error, and create
a function pointer constant that enables you to provide an ocaml function
to be called. If no ocaml function is provided using the given name, then
an exception will be thrown.
If this code were built into a toplevel, you could write:
Objective Caml version 3.04
# open Error ;;
# Callback.register "{caml_error}" (fun unit -> print_endline "hi"; flush stdout) ;;
- : unit = ()
# call_err caml_error ;;
hi
As you can see, this enables the C code to call your Ocaml code quite
transparently.
15.2.7 A word about message loops
If you use native threads and message loops that can call into ocaml, ocaml
code must originate any thread that can make a call back into the interpreter.
I'm not sure if there's a way to register a non-ocaml thread with the
interpreter as there is in the JNI. It can, however, be mitigated, by
queueing or signalling notifications that a call made from an ocaml thread
will retrieve.
15.2.8 Enums
SWIG will wrap enumerations as polymorphic variants in the output Ocaml
code. Each enum type has an `int variant which is a catchall allowing the
degenerate C++ libraries mentioned above to work. Some functions which deal
with enums as bit sets are available for each enum type. For an enum type foo,
these are foo_to_int, int_to_foo, foo_bits,
check_foo_bit and bits_foo. Each of these performs some
task transforming enum type values to integers, enum lists (representing
bit sets), and ints or bit sets to enums. check_foo_bit allows
the user to quickly check whether an enum value contains a superset of the
bits in some indicated enum value.
As far as naming goes, polymorphic variant labels are an exception because
they don't require any additional rules from C++, so they are simply prepended
with '`' in the ocaml style.
Example:
%module enum_test
enum c_enum_type { a = 1, b, c = 4, d = 8 };
enum_test.mli:
type _c_enum_type =
[ `int of int
| `a
| `b
| `c
| `d
]
(* The enum declaration itself. Every enum is a polymorphic variant in
order to make life simple. This allows every enum to share the `int label,
which allows that enum to carry an arbitrary int value. *)
external a_get : unit -> int = "_wrap_a_get"
...
(* This is a function which retrieves the actual value of an enum label. *)
val a : _int ...
(* This is a convenience which holds the value of a_get () since it never
changes. *)
val c_enum_type_to_int : _c_enum_type -> int
(* Given any _c_enum_type object, return the corresponding int. This is
useful when you want to encode an enum value as int and when you must pass
the enum value as an int parameter. *)
val int_to_c_enum_type : int -> _c_enum_type
(* Given any int, return a corresponding _c_enum_type element. If the int
does not match any single enum label from the target enum type, then `int
is returned containing the original value. *)
val c_enum_type_bits : _c_enum_type list -> _c_enum_type
(* Given a list of _c_enum_type elements, construct a _c_enum_type object
with the logical or of them stored in it. This is useful for cases where
enum labels are used to denote different bits. *)
val check_c_enum_type_bit : _c_enum_type -> _c_enum_type -> bool
(* Given two enum elements, v and match, return true if every 1 bit in
match is set in v. Use this to conveniently check single bits, or bit
expressions. *)
val bits_c_enum_type : _c_enum_type -> _c_enum_type list -> _c_enum_type list
(* Given a value of type _c_enum_type, and a list of _c_enum_type values,
return a list containing every element in the input list for which
check_c_enum_type_bit is true. Use this to decompose a bitfield enum for
use with a caml match .. with expression. *)
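To make the behavior of these helpers concrete, here is a pure Ocaml sketch that follows the descriptions above for the example enum (a = 1, b = 2, c = 4, d = 8). It is an assumption about what the helpers compute, written by hand; the generated code calls into C instead, and the type is named c_enum_type here without the leading underscore.

```ocaml
(* Hand-written model of the helpers described above; not SWIG output. *)
type c_enum_type = [ `int of int | `a | `b | `c | `d ]

(* enum { a = 1, b, c = 4, d = 8 }, so b takes the value 2. *)
let c_enum_type_to_int : c_enum_type -> int = function
  | `a -> 1
  | `b -> 2
  | `c -> 4
  | `d -> 8
  | `int n -> n

(* Integers that match no single label fall back to the `int catchall. *)
let int_to_c_enum_type : int -> c_enum_type = function
  | 1 -> `a
  | 2 -> `b
  | 4 -> `c
  | 8 -> `d
  | n -> `int n

(* Logical or of all labels in the list. *)
let c_enum_type_bits (labels : c_enum_type list) : c_enum_type =
  int_to_c_enum_type
    (List.fold_left (fun acc l -> acc lor c_enum_type_to_int l) 0 labels)

(* True if every 1 bit of m is also set in v. *)
let check_c_enum_type_bit (v : c_enum_type) (m : c_enum_type) : bool =
  let bits = c_enum_type_to_int m in
  c_enum_type_to_int v land bits = bits

(* Keep the candidates whose bits are all present in v. *)
let bits_c_enum_type (v : c_enum_type) (candidates : c_enum_type list)
    : c_enum_type list =
  List.filter (check_c_enum_type_bit v) candidates

(* Decompose a combined value for use in a match expression. *)
let flags = c_enum_type_bits [ `a; `c ]
let present = bits_c_enum_type flags [ `a; `b; `c; `d ]
```

Here flags carries the value 5, which matches no single label and so is held in the `int catchall, and present lists the labels whose bits are set in it.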
15.2.9 C++ Classes
C++ classes can currently be wrapped in three styles, selectable with
the -objects and -onlyobjects options. Objects are a fairly recent addition
to the ML language family, as modules and functors were typically used for
the same purposes as objects in the past. Since C++ is object oriented,
it is often convenient to pretend that C++ class pointers are real Ocaml
objects, and call their methods, etc., as though they were. Objects
in Ocaml have drawbacks, however. First, they are not compatible with
code that compiles under caml light. Second, they interact uniquely
with the Ocaml type system in a way which does not please everyone. Because
of this, one may access objects in three ways.
Consider this example class:
class cpp_base {
public:
int x;
int f( float y );
};
class cpp_class_type : public cpp_base {
public:
int g( float y );
};
15.2.9.1 No flags, function wrapping
type _p_cpp_base
external x_set : _p_cpp_base -> _int -> _void = "_wrap_x_set"
external x_get : _p_cpp_base -> _int = "_wrap_x_get"
external f : _p_cpp_base -> _float -> _int = "_wrap_f"
external new_cpp_base : unit -> _p_cpp_base = "_wrap_new_cpp_base"
external delete_cpp_base : _p_cpp_base -> _void = "_wrap_delete_cpp_base"
type _p_cpp_class_type
external g : _p_cpp_class_type -> _float -> _int = "_wrap_g"
external new_cpp_class_type : unit -> _p_cpp_class_type = "_wrap_new_cpp_class_type"
external delete_cpp_class_type : _p_cpp_class_type -> _void =
"_wrap_delete_cpp_class_type"
This is the default code produced by SWIG for the above module. It is the
lightest weight in terms of runtime and memory, as well as being uncomplicated
by any type inference problems. Use this wherever it is convenient, as part
of a functor, or where extra performance will be needed.
15.2.9.2 -objects
class cpp_base : _p_cpp_base -> object
(* Start superclasses *)
(* End superclasses *)
method x_set : (_int) -> _void
method x_get : _int
method f : (_float) -> _int
method cpp_base : _void
method _self_cpp_base : _p_cpp_base
end
class cpp_class_type : _p_cpp_class_type -> object
(* Start superclasses *)
(* cpp_base is a superclass *)
inherit cpp_base
(* End superclasses *)
method g : (_float) -> _int
method cpp_class_type : _void
method _self_cpp_class_type : _p_cpp_class_type
end
In addition to the code above, the -objects flag asks SWIG to generate
objects as well as functions to interface the C++ code. While not
perfect, this provides a good light-weight interface to a C++ object without
hiding too much that you might need.
Note that a _p_cpp_base (pointer to a cpp_base object) and a cpp_base class
are different. This is so that Ocaml code needn't
construct an object if the user is only handling a pointer.
In order to extract the pointer from an object, use the _self_... method
corresponding to the pointer type you want. Note that only classes
visible to the SWIG interface file are defined, and that every defined class
in a hierarchy will be correctly inherited in Ocaml. This makes it
easy to use a deep C++ inheritance tree without complicated effort, and also
allows any subtype to fill the role of its parent in an Ocaml expression
involving objects.
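The subtyping point can be tried out with a hand-written model of the two classes. This is only a sketch: the method bodies are invented so that the example runs, and the constructors take an initial x rather than a pointer as the generated classes do.

```ocaml
(* Hand-written model of the example classes; the bodies are invented. *)
class cpp_base (init_x : int) =
  object
    val mutable x = init_x
    method x_get : int = x
    method x_set (v : int) : unit = x <- v
    method f (y : float) : int = x + int_of_float y
  end

class cpp_class_type (init_x : int) =
  object
    (* cpp_base is a superclass *)
    inherit cpp_base init_x
    method g (y : float) : int = 2 * int_of_float y
  end

(* A function written against the parent class only. *)
let call_f (o : cpp_base) : int = o#f 1.5

(* A derived object fills the parent's role after an explicit coercion. *)
let derived = new cpp_class_type 10
let from_derived = call_f (derived :> cpp_base)
```

The coercion (derived :> cpp_base) is checked by the compiler, unlike an Obj.magic cast on raw pointer types.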
15.2.9.3 -objects -onlyobjects
This flag combination outputs only the object definitions into the .mli
file. It reduces the amount of code emitted to the .mli file in order
to make link info smaller, and to reduce the size of the interface file.
15.2.9.4 -classmod
This flag wraps class, struct, and union code in modules. It may
be used with -objects, but probably doesn't make a lot of sense that way.
The module name will be a capitalization of the class name, as is the
ocaml convention. Note that types which may need to be accessed outside
of the module are defined outside at global scope (such as pointer types)
since ocaml always applies scopes to types. Types such as enums, that
are defined in scope stay there.
15.2.10 Overloaded functions
Overloaded functions are disambiguated according to a simple naming rule
which produces a unique, but not necessarily meaningful name. These
names are always produced in declaration order. If you wish to extract
and rename certain overloaded methods, use the %rename directive.
15.2.11 Operator overloading
Because operators are not polymorphic in Ocaml, operator overloading
as used in C++ is not available in Ocaml. However, needed operators may be
renamed with the %rename directive as above.
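On the Ocaml side, a renamed operator is just an ordinary function, and it can optionally be bound to a custom infix symbol. The sketch below assumes a C++ operator+ on a hypothetical vec class that was renamed to vec_add; the record type stands in for the wrapped object.

```ocaml
(* Hypothetical stand-in for a wrapped C++ class whose operator+ was
   renamed to vec_add. *)
type vec = { vx : float; vy : float }

let vec_add (a : vec) (b : vec) : vec =
  { vx = a.vx +. b.vx; vy = a.vy +. b.vy }

(* Ocaml's built-in operators are monomorphic: + for int, +. for float. *)
let three = 1 + 2
let three_f = 1.0 +. 2.0

(* An optional user-defined infix operator bound to the renamed function. *)
let ( +| ) = vec_add

let v = { vx = 1.0; vy = 2.0 } +| { vx = 0.5; vy = 0.5 }
```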
15.2.12 Ocaml typemaps
The previous section illustrated an "in" typemap for converting Ocaml
objects to C. Basic typemaps are provided for all of the basic Ocaml
types, so string, int, float, etc. can be passed in and out of C functions
without a problem. Note that C++ functions that need a fixed length
buffer may be provided with an Ocaml string. Ocaml keeps the length
of the provided character buffer, so binary data is fine to store in strings.
%typemap(out) int {
$result = Val_int($1);
}
One might wish to do specific typemaps that are beyond the common ones
provided by the ocaml/typemaps.i file provided with the SWIG distribution.
Here are some addenda to the Ocaml document on C interfaces, along
with some information about the SWIG Ocaml language module that will prove
useful to the reader:
- SWIG_MakePtr takes a type descriptor (can be null), a delete function
name (can be null, "_wrap_delete_void", or a valid, custom deleter), and,
of course, a void * to something that Ocaml must hold. The deleter will
be invoked when Ocaml code finalizes the containing custom block, or when
the user frees the block with a delete function that corresponds to its type.
The type of the deleter function must be value -> value, and it must
return Val_unit. This makes it compatible with the SWIG deleter (called
explicitly) as well as the Ocaml finalizer.
- Although it's not extremely clear, CAMLparam... must precede all
other variable declarations, even non-ocaml ones.
- It seems that C code can use enter_blocking_section /
leave_blocking_section to release the mutex that covers ocaml code, then
regain it. This is handy if the intervening code will take a long time.
- SWIG_MustGetPtr takes a value and a type descriptor. The function
returns a void * to the actual C data. Use this to retrieve C data
from value objects.
- This SWIG module currently uses single valued returns as consistent
with C source. If multiple returns are needed, they are constructed
in the same way they would be with C code: a pointer to the target type is
given, and a value is placed there. User typemaps may override this
behavior in cases where tuple, array, list, or other multivalue returns
are needed.
- Arrays and tuples in ocaml are structured blocks containing data
in the Field(v,x) area.
- A list is a set of pairs (tuples with two elements), in which the
first part of the pair points to the data in this list element, and the second
part (the 'tail') contains either a pointer to another block or the literal
Val_unit.
- Enums are represented as polymorphic variants so that they can share
the `int of int member. As such, Ocaml code must create these. For
any enum named a, a pair of functions are produced in C, named a_to_int
and int_to_a. These convert between the polymorphic variant elements
and the integers (the real domain of all enums).
- This backend is made to provide a good balance between runtime performance
and ease-of-use, along with a desire to express C++ code as it is used in
the wild. These factors led me to use Ocaml's type system in the
generated code instead of using some sort of variant (C_int of Int32.int32
| C_ptr of string ...). I may still switch to this other form if there
is a lot of need, but will leave the more efficiently typed system in regardless.
However, typemaps may not be shareable between the two paradigms.
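The notes above on tuples and lists can be checked from Ocaml itself with the Obj module, whose field function mirrors the C macro Field(v,x). This is an illustration of the runtime layout only; Obj is not meant for ordinary application code.

```ocaml
(* A tuple is one structured block with one field per component. *)
let tuple = Obj.repr (1, 2, 3)
let tuple_size = Obj.size tuple
let tuple_second : int = Obj.obj (Obj.field tuple 1)

(* A list cell is a two-field block: the element, then the tail. *)
let cell = Obj.repr [ 10; 20 ]
let cell_size = Obj.size cell
let head : int = Obj.obj (Obj.field cell 0)
let tail = Obj.field cell 1
let tail_head : int = Obj.obj (Obj.field tail 0)

(* The tail of the last cell is an immediate value, not a pointer,
   and it has the same representation as unit. *)
let end_marker = Obj.field tail 1
let end_is_immediate = Obj.is_int end_marker
let end_equals_unit = (end_marker = Obj.repr ())
```

A typemap that builds a list in C follows the same layout: allocate a two-field block per element and terminate the chain with the immediate value.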
15.2.13 Exceptions
Please view the "Raising Exceptions" section of Interfacing C with Objective Caml.