Copyright © 2001 Southern Storm Software, Pty Ltd.
Permission to distribute unmodified copies of this work is hereby granted.
This document discusses various alternatives, with a view to devising a common localization framework for use by both Portable.NET and Mono. This framework may also be submitted to ECMA for consideration.
Note: we explicitly limit ourselves to string localization issues in this document. We do not consider dates, times, localized bitmaps, and the other issues that an application must be aware of.
Each alternative is evaluated based on the following criteria:
Arg_MustBeBoolean
The .NET Framework provides a class called "System.Resources.ResourceManager", which can be used to access the string resources that are embedded in the manifest resource section of an assembly.
Note: this class does not yet appear in the ECMA specifications, but that is probably an oversight rather than a deliberate omission. We expect that the "System.Resources" namespace will eventually be standardised.
To access localized strings, an application creates an instance of "ResourceManager" and then calls the "GetString" method to access strings using their assigned "tag".
The most immediate problem with this API is that it requires that the resource manager be created first. Since creating a new manager for every string access is expensive, Microsoft has adopted a convention in their own code for getting around the API's limitations.
Each of Microsoft's assemblies contains an "internal" class or method that caches the resource manager for that assembly in a static field. After the first string access, all subsequent accesses are relatively efficient. Microsoft's "mscorlib.dll" uses "Environment.GetResourceString" and their other assemblies typically have a class called "SR" within that assembly's namespace.
There is no consistency to Microsoft's approach: every assembly must come up with its own manager caching strategy. None of these classes is accessible to other assemblies, so the code cannot be reused.
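The caching convention described above might look roughly like the following sketch. It is not Microsoft's actual code: the class layout, the resource base name "Example.Strings", and the choice of a single tag field are assumptions made for illustration. Only the "SR" name, the "GetString" method, and the static-field caching idea come from the discussion here.

```csharp
using System;
using System.Resources;

// Hypothetical per-assembly helper in the style described above.
internal static class SR
{
    // Tag names are held in fields so callers do not repeat string literals.
    internal const string Arg_MustBeBoolean = "Arg_MustBeBoolean";

    private static readonly object sync = new object();
    private static ResourceManager manager;

    // Create the manager on first use, then reuse the cached instance.
    internal static ResourceManager Manager
    {
        get
        {
            lock (sync)
            {
                if (manager == null)
                {
                    // "Example.Strings" is an assumed resource base name.
                    manager = new ResourceManager("Example.Strings",
                                                  typeof(SR).Assembly);
                }
                return manager;
            }
        }
    }

    // Look up a string by tag. This only succeeds if matching
    // resources are actually embedded in the assembly.
    internal static string GetString(string tag)
    {
        return Manager.GetString(tag);
    }
}
```

A caller would write something like SR.GetString(SR.Arg_MustBeBoolean). Every assembly that wants this behaviour has to carry its own copy of such a class, which is the duplication noted above.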
Using our evaluation criteria:
"Environment.GetResourceString("Tag")" is not very natural, and quickly becomes burdensome.
"ResourceManager".
A missing string is returned as null, which isn't very useful as a default. This is exacerbated by the previous point, because it is very easy for the programmer to forget to assign a fallback English string in the separate resource file.
The presence of an "SR" class in so many assemblies is quite odd. It may be the result of massive cut-and-paste, or it may be the result of some undocumented feature of Microsoft's tools that automatically generates a class called "SR" in each assembly that requires it.
The "SR" class contains a number of standard methods such as "GetString", "GetObject", etc. It usually also contains hundreds of fields that hold tag names for all of the strings that are used by the assembly.
We cannot find any evidence in the .NET Beta2 Framework documentation that suggests that the C# tools are doing this automatically (perhaps it is a feature of Visual Studio.NET?). But it seems strange that Microsoft's programmers would create classes with that many fields on purpose, and then do it consistently across dozens of assemblies and thousands of classes, unless there was some kind of tool support.
If Microsoft is indeed writing the "SR" classes by hand, then this would seem an obvious place to introduce some automation.
The GNU gettext system is very natural for programmers to use. Strings can be marked for localization as follows:
_("Hello World!")
The C pre-processor takes care of expanding this to an appropriate call to gettext, or dgettext in the case of libraries. On systems without gettext, the '_' macro expands to its argument and the program behaves correctly, albeit only in the original language.
Note: gettext also has the "N_" macro, which is used to mark static strings. Because C# strings are always created dynamically, we do not require an equivalent to this.
Because the overhead of localization is so low, it increases the chance that programmers will use it and use it consistently. Automated tools can scan C source code for uses of the '_' macro and create English resource files ready for translation.
If a translation is not available, the macro's argument provides a meaningful English fallback. Synthetic tag names or null strings can never be displayed to the user by accident.
If the programmer wishes to gateway to a non-gettext system, they need only modify the definition of the macro and recompile. Usually this isn't necessary because GNU gettext already supports most of the popular message catalog formats.
We can create a new namespace within the C# library called "System.I18N", that contains a number of classes that manage resources for any assemblies that use it. The following is a simplified example of implementing a gettext-style system:

    namespace System.I18N
    {

    using System.Collections;
    using System.Reflection;
    using System.Resources;

    public sealed class GetText
    {
        // One cached resource manager per calling assembly.
        private static Hashtable managers = new Hashtable();

        public static String _(String value)
        {
            Assembly assembly = Assembly.GetCallingAssembly();
            ResourceManager mgr = (ResourceManager)(managers[assembly]);
            if(mgr == null)
            {
                mgr = new ResourceManager(assembly.FullName, assembly);
                managers[assembly] = mgr;
            }
            String str = mgr.GetString(value);
            if(str != null)
            {
                return str;
            }
            else
            {
                // No translation: fall back to the original string.
                return value;
            }
        }

    } // class GetText

    } // namespace System.I18N

A real implementation would need additional house-keeping methods, culture support, and thread synchronization. A single resource block per assembly may be insufficient, which would require the introduction of gettext-style "domains". We have omitted these details for brevity.
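The omitted thread synchronization could be added along the following lines. This is a sketch under stated assumptions, not part of the proposal itself: it uses the hypothetical namespace "Example.I18N" so that it compiles outside the system library, it assumes the resources are embedded under the assembly's simple name, and it treats a missing resource block the same way as a missing translation.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Resources;
using System.Runtime.CompilerServices;

namespace Example.I18N   // hypothetical stand-in for "System.I18N"
{
    public static class GetText
    {
        private static readonly Dictionary<Assembly, ResourceManager> managers =
            new Dictionary<Assembly, ResourceManager>();
        private static readonly object sync = new object();

        // Inlining is disabled so that GetCallingAssembly is less likely
        // to report the wrong assembly.
        [MethodImpl(MethodImplOptions.NoInlining)]
        public static string _(string value)
        {
            Assembly assembly = Assembly.GetCallingAssembly();
            ResourceManager mgr;

            // Guard the shared cache; only one thread may update it at a time.
            lock (sync)
            {
                if (!managers.TryGetValue(assembly, out mgr))
                {
                    // Assumption: resources live under the simple assembly name.
                    mgr = new ResourceManager(assembly.GetName().Name, assembly);
                    managers[assembly] = mgr;
                }
            }

            try
            {
                string str = mgr.GetString(value);
                return str != null ? str : value;
            }
            catch (MissingManifestResourceException)
            {
                // No resource block at all: behave like gettext and
                // return the original string.
                return value;
            }
        }
    }
}
```

The lock covers only the cache lookup and update; the string lookup itself runs outside it, so concurrent callers are not serialized on every access.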
The effect of C's '_' macro could be achieved as follows:

    using I = System.I18N.GetText;

    class Hello
    {
        public static void Main()
        {
            System.Console.WriteLine(I._("Hello World!"));
        }
    }

The usage of "I._" is a little ugly. If we were willing to modify the core system library, we could do the following:

    namespace System
    {

    using System.I18N;

    public class Object
    {
        ...

        protected static String _(String value)
        {
            return System.I18N.GetText._(value);
        }

        ...

    } // class Object

    } // namespace System

This would make the definition available to all classes uniformly. However, it wouldn't be backwards compatible with pre-existing C# system libraries.
Using our evaluation criteria:
"System.I18N". However, that namespace could be implemented in a separate assembly that is distributed with applications.
"System.I18N" code, but not to any applications that use it.
"System.I18N" classes.
Placing "System.I18N" into a different assembly will improve interoperability with existing C# libraries. However, it will be inaccessible to our own implementation of "corlib". Thus, we will need a different mechanism for "corlib".
However, this may not be too bad. We can create a specialized "internal" class within our "corlib" that has a similar API to "System.I18N.GetText", and then do the following:

    namespace System
    {

    using I = System.Private.GetText;

    public class FormatException : SystemException
    {
        // Constructors.
        public FormatException()
            : base(I._("The supplied value did not have the correct format")) {}
        public FormatException(String msg)
            : base(msg) {}
        public FormatException(String msg, Exception inner)
            : base(msg, inner) {}

    } // class FormatException

    } // namespace System

The usage of "I._" is consistent with that used by other assemblies, but is redirected to a different class. This would allow automated extraction tools to work consistently on "corlib" and other assemblies.
The main drawback of this approach is that gateways must be implemented in two places within the code. This is still better than Microsoft's approach.
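Because every assembly marks its strings with the same "I._" call, an extraction tool can be quite simple. The following is a rough sketch of the scanning step only, under our own assumptions: the "StringExtractor" class and its "Extract" method are hypothetical names, and the regular expression handles only ordinary string literals with backslash escapes. It does not understand comments, verbatim strings, or concatenated literals, and it returns the literal text exactly as written in the source, with escapes left in place.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper that collects strings marked with _("...").
public static class StringExtractor
{
    // Matches _("...") where '_' is not part of a longer identifier.
    private static readonly Regex Marker =
        new Regex(@"\b_\s*\(\s*""((?:[^""\\]|\\.)*)""\s*\)");

    // Returns each distinct marked string, in order of first appearance.
    public static List<string> Extract(string source)
    {
        List<string> found = new List<string>();
        foreach (Match m in Marker.Matches(source))
        {
            string text = m.Groups[1].Value;
            if (!found.Contains(text))
            {
                found.Add(text);
            }
        }
        return found;
    }
}
```

Run over the "FormatException" source above, this would pick up the single default message. The same scan works on "corlib" and on ordinary assemblies because both use the "I._" spelling.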
A '_' keyword could be added to the language as follows:

    string-literal:
        _ ( string-literal )
        ...

This would be legal anywhere that a string literal is currently permitted, except in serialized attributes. The compiler converts the construct into a call on an appropriate system-supplied library.
The "System.Object" definition trick from the previous section can be used to provide an implementation of '_' for compilers that lack the keyword.
Using our evaluation criteria, everything is the same as the "new library" case, except that we have now altered the C# language. Interoperability with existing C# libraries remains a problem, but could be addressed by once again treating "corlib" as a special case.
If we place "System.I18N" into "corlib", then programs that use it will not run against existing C# libraries, even if they were compiled with a keyword-enhanced compiler.
If we place "System.I18N" into a separate assembly, then all programs that use it will need to redistribute that DLL with their program. And the code will be inaccessible to our own "corlib", which would need to be treated as a special case.
Going back to the example of Microsoft's "SR" class, and assuming that we can modify the compiler, we may be able to do better. The compiler could automatically insert an "internal" class into any assembly that uses the '_' keyword in its source.
This class will be responsible for acquiring a resource manager, caching it in a static field, and handling all resource requests for that assembly. The compiler will cause the '_' keyword to call this internal library rather than one that is assumed to be present elsewhere in the system.
Using our evaluation criteria:
"ResourceManager".
One way to solve the gateway issue may be to provide an "escape hatch" in the internal library. This would use reflection to search for an alternative resource manager before falling back to the standard one. For example:

    private static ResourceManager GetManager(Assembly assembly)
    {
        // Look for a replacement manager supplied by a newer library.
        Type type = Type.GetType
            ("System.I18N.ReplacementResourceManager", false);
        if(type != null)
        {
            return (ResourceManager)
                (type.GetConstructor(Type.EmptyTypes).Invoke(null));
        }

        // Not found: use the standard resource manager.
        return new ResourceManager(assembly.FullName, assembly);
    }

This would allow future versions of the C# library to provide an implementation of "System.I18N.ReplacementResourceManager" that overrides the manager's behaviour with gateway functionality. Existing C# library implementations won't have this class, and so the code will fall back to a standard resource manager.
This "internal library" solution introduces a lot of complexity into the compiler, to get around deficiencies in existing C# libraries. It may not be worth the effort.
"using" construct as a way to shorten the method name to "I._", while avoiding language extensions.