File Structure and Performance

Parts of a NetCDF Classic File

A netCDF classic or 64-bit offset dataset is stored as a single file comprising two parts:

  • a header, containing all the information about dimensions, attributes, and variables except for the variable data;
  • a data part, comprising fixed-size data (the data for variables that don't have an unlimited dimension) and variable-size data (the data for variables that have an unlimited dimension).

Both the header and data parts are represented in a machine-independent form. This form is very similar to XDR (eXternal Data Representation), extended to support efficient storage of arrays of non-byte data.

The header at the beginning of the file contains information about the dimensions, variables, and attributes in the file, including their names, types, and other characteristics. For each fixed-size variable, this includes the offset to the beginning of the variable's data; for each record variable, it includes the variable's relative offset within a record. The header also contains dimension lengths and information needed to map multidimensional indices for each variable to the appropriate offsets.

By default, this header has little usable extra space; it is only as large as it needs to be for the dimensions, variables, and attributes (including all the attribute values) in the netCDF dataset, plus a small amount of extra space from rounding up to the nearest disk block size. This has the advantage that netCDF files are compact, requiring very little overhead to store the ancillary data that makes the datasets self-describing. A disadvantage of this organization is that any operation on a netCDF dataset that requires the header to grow (or, less likely, to shrink), such as adding new dimensions or new variables, requires moving the data by copying it. This expense is incurred when the enddef function (nc_enddef in C, NF_ENDDEF in Fortran) is called after a previous call to the redef function (nc_redef in C, NF_REDEF in Fortran). If you create all necessary dimensions, variables, and attributes before writing data, and avoid later additions and renamings of netCDF components that require more space in the header part of the file, you avoid the cost associated with later changing the header.

Alternatively, you can use a variant of the enddef function, with two underbar characters instead of one, to explicitly reserve extra space in the file header when the file is created: nc__enddef in C (see nc__enddef), NF__ENDDEF in Fortran (see NF__ENDDEF), called after a previous call to the redef function. Reserving enough extra space in the header to accommodate anticipated changes, such as the addition of new attributes or the extension of existing string attributes to hold longer strings, avoids the expense of moving all the data later.
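For example, the following C sketch (file and object names are illustrative; error checking omitted) reserves 4 KiB of free header space at creation time:

#include <netcdf.h>

int ncid, dimid, varid;
nc_create("example.nc", NC_CLOBBER, &ncid);
nc_def_dim(ncid, "x", 1000, &dimid);
nc_def_var(ncid, "var", NC_DOUBLE, 1, &dimid, &varid);

/* Reserve 4096 bytes of free space in the header (h_minfree) so that
   attributes can be added later without moving the data. The remaining
   arguments keep their default values: 4-byte alignment for the data
   sections (v_align, r_align) and no free space at the end of the
   fixed-size data (v_minfree). */
nc__enddef(ncid, 4096, 4, 0, 4);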

When the size of the header is changed, data in the file is moved, and the location of data values in the file changes. If another program is reading the netCDF dataset during redefinition, its view of the file will be based on old, probably incorrect indexes. If netCDF datasets are shared across redefinition, some mechanism external to the netCDF library must be provided that prevents access by readers during redefinition, and causes the readers to call nc_sync/NF_SYNC before any subsequent access.

The fixed-size data part that follows the header contains all the variable data for variables that do not employ an unlimited dimension. The data for each variable is stored contiguously in this part of the file. If there is no unlimited dimension, this is the last part of the netCDF file.

The record-data part that follows the fixed-size data consists of a variable number of fixed-size records, each of which contains data for all the record variables. The record data for each variable is stored contiguously in each record.

The order in which the variable data appears in each data section is the same as the order in which the variables were defined, in increasing numerical order by netCDF variable ID. This knowledge can sometimes be used to enhance data access performance, since the best data access is currently achieved by reading or writing the data in sequential order.
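For example, a reader that wants the best sequential performance can simply loop over variables in ID order; this is a minimal sketch ("data.nc" is a stand-in name; error checking omitted):

#include <netcdf.h>

int ncid, nvars;
nc_open("data.nc", NC_NOWRITE, &ncid);
nc_inq_nvars(ncid, &nvars);
for (int varid = 0; varid < nvars; varid++) {
    /* Variable IDs run from 0 to nvars - 1 in definition order, which
       matches the order of the data on disk, so nc_get_var* calls
       issued in this loop read the file largely sequentially. */
}
nc_close(ncid);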

Parts of a NetCDF-4 HDF5 File

NetCDF-4 files are created with the HDF5 library and are HDF5 files in every way; they can be read without the netCDF-4 interface. (Note that modifying these files with HDF5 directly will almost certainly make them unreadable to netCDF-4.)

Groups in a netCDF-4 file correspond with HDF5 groups (although the netCDF-4 tree is rooted not at the HDF5 root, but in group “_netCDF”).

Variables in netCDF-4 correspond with identically named datasets in HDF5; attributes correspond similarly.

Since a netCDF file carries more metadata than a plain HDF5 file, special datasets are used to hold the netCDF-specific metadata.

The _netcdf_dim_info dataset (in group _netCDF) contains the ids of the shared dimensions, and their length (0 for unlimited dimensions).

The _netcdf_var_info dataset (in group _netCDF) holds an array of compound types which contain the variable ID, and the associated dimension ids.

The Extended XDR Layer

XDR is a standard for describing and encoding data and a library of functions for external data representation, allowing programmers to encode data structures in a machine-independent way. Classic or 64-bit offset netCDF employs an extended form of XDR for representing information in the header part and the data parts. This extended XDR is used to write portable data that can be read on any other machine for which the library has been implemented.

The cost of using a canonical external representation for data varies according to the type of data and whether the external form is the same as the machine's native form for that type.

For some data types on some machines, the time required to convert data to and from external form can be significant. The worst case is reading or writing large arrays of floating-point data on a machine that does not use IEEE floating-point as its native representation.

Large File Support

It is possible to write netCDF files that exceed 2 GiB on platforms that have "Large File Support" (LFS). Such files are portable to other LFS platforms, but trying to open them on a platform without LFS yields a "file too large" error.

Without LFS, no files larger than 2 GiBytes can be used. The rest of this section applies only to systems with LFS.

The original binary format of netCDF (classic format) limits the size of data files by using a signed 32-bit offset within its internal structure. Files larger than 2 GiB can be created, with certain limitations. See Classic Limitations.

In version 3.6.0, netCDF introduced the first variant of the underlying data format. The new format uses 64-bit file offsets in place of the 32-bit offsets. There are still some limits on the sizes of variables, but the new format can store very large datasets. See 64 bit Offset Limitations.

NetCDF-4 variables and files can be any size supported by the underlying file system.

The original data format (netCDF classic) is still the default data format for the netCDF library.

The following table summarizes the size limitations of various permutations of LFS support, netCDF version, and data format. Note that 1 GiB = 2^30 bytes or about 1.07e+9 bytes, 1 EiB = 2^60 bytes or about 1.15e+18 bytes. Note also that all sizes are really 4 bytes less than the ones given below. For example the maximum size of a fixed variable in netCDF 3.6 classic format is really 2 GiB - 4 bytes.

Limit                                        No LFS        v3.5      v3.6/classic   v3.6/64-bit offset   v4.0/netCDF-4
Max File Size                                2 GiB         8 EiB     8 EiB          8 EiB                ??
Max Number of Fixed Vars > 2 GiB             0             1 (last)  1 (last)       2^32                 ??
Max Record Vars w/ Rec Size > 2 GiB          0             1 (last)  1 (last)       2^32                 ??
Max Size of Fixed/Record Size of Record Var  2 GiB         2 GiB     2 GiB          4 GiB                ??
Max Record Size                              2 GiB/nrecs   4 GiB     8 EiB/nrecs    8 EiB/nrecs          ??

For more information about the different file formats of netCDF See Which Format.

NetCDF 64-bit Offset Format Limitations

Although the 64-bit offset format allows the creation of much larger netCDF files than was possible with the classic format, there are still some restrictions on the size of variables.

It's important to note that without Large File Support (LFS) in the operating system, it's impossible to create any file larger than 2 GiBytes. Assuming an operating system with LFS, the following restrictions apply to the netCDF 64-bit offset format.

No fixed-size variable can require more than 2^32 - 4 bytes (i.e. 4 GiB - 4 bytes, or 4,294,967,292 bytes) of storage for its data, unless it is the last fixed-size variable and there are no record variables. When there are no record variables, the last fixed-size variable can be any size supported by the file system, e.g. terabytes.

A 64-bit offset format netCDF file can have up to 2^32 - 1 fixed-size variables, each under 4 GiB in size. If there are no record variables in the file, the last fixed-size variable can be any size.

No record variable can require more than 2^32 - 4 bytes of storage for each record's worth of data, unless it is the last record variable. A 64-bit offset format netCDF file can have up to 2^32 - 1 records, of up to 2^32 - 1 variables, as long as the size of one record's data for each record variable except the last is less than 4 GiB - 4.

Note also that all netCDF variables and records are padded to 4 byte boundaries.
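As an illustration (a hypothetical CDL sketch, not one of the original examples), the following dataset is representable in 64-bit offset format but not in classic format: each variable occupies 20000 * 20000 * 8 bytes = 3.2 GB, under the 4 GiB - 4 per-variable limit, yet variable a exceeds 2 GiB without being the last variable, which classic format does not allow.

netcdf big64 {
dimensions:
    x = 20000 ;
    y = 20000 ;
variables:
    double a(x, y) ;  // 3.2 GB
    double b(x, y) ;  // 3.2 GB
}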

NetCDF Classic Format Limitations

There are important constraints on the structure of large netCDF classic files that result from the 32-bit relative offsets that are part of the netCDF classic file format:

The maximum size of a record in the classic format in versions 3.5.1 and earlier is 2^32 - 4 bytes, or about 4 GiB. In versions 3.6.0 and later, there is no such restriction on total record size for the classic format or 64-bit offset format.

If you don't use the unlimited dimension, only one variable can exceed 2 GiB in size, but it can be as large as the underlying file system permits. It must be the last variable in the dataset, and the offset to the beginning of this variable must be less than about 2 GiB.

The limit is really 2^31 - 4 bytes. If you were to specify a variable size of 2^31 - 3, for example, it would be rounded up to the nearest multiple of 4 bytes, which would be 2^31, which is larger than the largest signed 32-bit integer, 2^31 - 1.

For example, the structure of the data might be something like:

netcdf bigfile1 {
dimensions:
    x = 2000 ;
    y = 5000 ;
    z = 10000 ;
variables:
    double x(x) ;   // coordinate variables
    double y(y) ;
    double z(z) ;
    double var(x, y, z) ;   // 800 Gbytes
}

If you use the unlimited dimension, record variables may exceed 2 GiB in size, as long as the offset of the start of each record variable within a record is less than 2 GiB - 4. For example, the structure of the data in a 2.4 Tbyte file might be something like:

netcdf bigfile2 {
dimensions:
    x = 2000 ;
    y = 5000 ;
    z = 10 ;
    t = UNLIMITED ;   // 1000 records, for example
variables:
    double x(x) ;   // coordinate variables
    double y(y) ;
    double z(z) ;
    double t(t) ;
    // 3 record variables, 2400000000 bytes per record
    double var1(t, x, y, z) ;
    double var2(t, x, y, z) ;
    double var3(t, x, y, z) ;
}

The NetCDF-3 I/O Layer

The following discussion applies only to netCDF classic and 64-bit offset files. For netCDF-4 files, the I/O layer is the HDF5 library.

For netCDF classic and 64-bit offset files, an I/O layer implemented much like the C standard I/O (stdio) library is used by netCDF to read and write portable data to netCDF datasets. Hence an understanding of the standard I/O library provides answers to many questions about multiple processes accessing data concurrently, the use of I/O buffers, and the costs of opening and closing netCDF files. In particular, it is possible to have one process writing a netCDF dataset while other processes read it.

Data reads and writes are no more atomic than calls to stdio fread() and fwrite(). An nc_sync/NF_SYNC call is analogous to the fflush call in the C standard I/O library, writing unwritten buffered data so other processes can read it; the C function nc_sync (see nc_sync), or the Fortran function NF_SYNC (see NF_SYNC), also brings header changes up to date (for example, changes to attribute values). Opening the file with the NC_SHARE flag (in C) or the NF_SHARE flag (in Fortran) is analogous to setting a stdio stream to be unbuffered with the _IONBF flag to setvbuf.
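A minimal sketch of this pattern ("shared.nc" is a stand-in name; error checking omitted):

/* Writer process: flush buffered data and header changes. */
int ncid;
nc_open("shared.nc", NC_WRITE | NC_SHARE, &ncid);
/* ... nc_put_var* calls ... */
nc_sync(ncid);

/* Reader process: refresh its view of the file before reading. */
int ncid2;
nc_open("shared.nc", NC_NOWRITE | NC_SHARE, &ncid2);
nc_sync(ncid2);
/* ... nc_get_var* calls ... */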

As in the stdio library, flushes are also performed when "seeks" occur to a different area of the file. Hence the order of read and write operations can influence I/O performance significantly. Reading data in the same order in which it was written within each record will minimize buffer flushes.

You should not expect netCDF classic or 64-bit offset format data access to work with multiple writers having the same file open for writing simultaneously.

It is possible to tune an implementation of netCDF for some platforms by replacing the I/O layer with a different platform-specific I/O layer. This may change the similarities between netCDF and standard I/O, and hence characteristics related to data sharing, buffering, and the cost of I/O operations.

The distributed netCDF implementation is meant to be portable. Platform-specific ports that further optimize the implementation for better I/O performance are practical in some cases.

Parallel Access with NetCDF-4

Use the special parallel open (or create) calls to open (or create) a file, and then use parallel I/O to read or write that file (see nc_open_par()).

Note that the chunk cache is turned off if a file is opened for parallel I/O in read/write mode. Open the file in read-only mode to engage the chunk cache.
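A minimal sketch of parallel creation, assuming an MPI environment and a netCDF build configured for parallel I/O (names and sizes are illustrative; error checking omitted; this fragment belongs inside main):

#include <mpi.h>
#include <netcdf.h>
#include <netcdf_par.h>

int ncid, dimid, varid;
MPI_Init(&argc, &argv);
nc_create_par("parallel.nc", NC_NETCDF4 | NC_MPIIO,
              MPI_COMM_WORLD, MPI_INFO_NULL, &ncid);
nc_def_dim(ncid, "x", 1024, &dimid);
nc_def_var(ncid, "var", NC_FLOAT, 1, &dimid, &varid);
nc_enddef(ncid);
nc_var_par_access(ncid, varid, NC_COLLECTIVE);  /* collective data access */
/* ... each MPI rank writes its own region with nc_put_vara_float ... */
nc_close(ncid);
MPI_Finalize();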

NetCDF uses the HDF5 parallel programming model for parallel I/O with netCDF-4/HDF5 files. The HDF5 tutorial (http://hdfgroup.org/HDF5/Tutor) is a good reference.

For classic and 64-bit offset files, netCDF uses the parallel-netcdf (formerly pnetcdf) library from Argonne National Laboratory/Northwestern University. For parallel access of classic and 64-bit offset files, netCDF must be configured with the --with-pnetcdf option at build time. See the parallel-netcdf site for more information (http://www.mcs.anl.gov/parallel-netcdf).

Interoperability with HDF5

To create HDF5 files that can be read by netCDF-4, use the latest in the HDF5 1.8.x series.

HDF5 has some features that will not be supported by netCDF-4, and will cause problems for interoperability:

  • HDF5 allows a Group to be both an ancestor and a descendant of another Group, creating cycles in the subgroup graph. HDF5 also permits multiple parents for a Group. In the netCDF-4 data model, Groups form a tree with no cycles, so each Group (except the top-level unnamed Group) has a unique parent.
  • HDF5 supports "references" which are like pointers to objects and data regions within a file. The netCDF-4 data model omits references.
  • HDF5 supports some primitive types that are not included in the netCDF-4 data model, including H5T_TIME and H5T_BITFIELD.
  • HDF5 supports multiple names for data objects like Datasets (netCDF-4 variables) with no distinguished name. The netCDF-4 data model requires that each variable, attribute, dimension, and group have a single distinguished name.
  • HDF5 (like netCDF) supports scalar attributes, but netCDF-4 cannot read scalar HDF5 attributes (unless they are string attributes). This limitation will be removed in a future release of netCDF.

These are fairly easy requirements to meet, but there is one relating to shared dimensions which is a little more challenging. Every HDF5 dataset must have a dimension scale attached to each dimension.

Dimension scales, a new feature in HDF5 1.8, allow the specification of shared dimensions.
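The following HDF5 1.8 sketch (names illustrative; error checking omitted; link with the high-level library, -lhdf5_hl) creates a shared dimension as a dimension scale and attaches it to a variable, which is the layout netCDF-4 expects:

#include <hdf5.h>
#include <hdf5_hl.h>

hsize_t len[1] = {10};
hid_t file  = H5Fcreate("interop.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
hid_t space = H5Screate_simple(1, len, NULL);

/* The shared dimension is itself a dataset, marked as a dimension scale. */
hid_t xdim = H5Dcreate2(file, "x", H5T_NATIVE_INT, space,
                        H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
H5DSset_scale(xdim, "x");

/* Create a data variable and attach the scale to its dimension 0,
   so that netCDF-4 sees a variable temp(x). */
hid_t var = H5Dcreate2(file, "temp", H5T_NATIVE_FLOAT, space,
                       H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
H5DSattach_scale(var, xdim, 0);

H5Dclose(var); H5Dclose(xdim); H5Sclose(space); H5Fclose(file);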

Without creation order in the HDF5 file, the file will still be readable by netCDF-4; netCDF-4 will simply number the variables in alphabetical, rather than creation, order.

Interoperability is a complex task, and all of this is in the alpha release stage. It is tested in libsrc4/tst_interops.c, which contains some examples of how to create HDF5 files, modify them in netCDF-4, and then verify them in HDF5, and vice versa.

DAP Support

Beginning with netCDF version 4.1, optional support is provided for accessing data through OPeNDAP servers using the DAP protocol. Currently, only DAP protocol version 2 is supported; DAP protocol version 4 support is under development.

DAP support is automatically enabled if a usable curl library can be located using the LDFLAGS environment variable (similar to the way that the HDF5 libraries are referenced). DAP support can be forcibly enabled or disabled using the --enable-dap or --disable-dap flag, respectively. If enabled, DAP2 support requires access to the curl library. Refer to the installation manual for details.

DAP uses a data model that is different from that supported by netCDF, either classic or enhanced. Generically, the DAP data model is encoded textually in a DDS (Dataset Descriptor Structure). There is a second data model for DAP attributes, which is encoded textually in a DAS (Dataset Attribute Structure). For detailed information about the DAP DDS and DAS, refer to the OPeNDAP web site http://opendap.org.

OPeNDAP Data

In order to access an OPeNDAP data source through the netCDF API, the file name normally used is replaced with a URL with a specific format. The URL is composed of three parts.

  • URL - this is a standard form URL such as http://remotetest.unidata.ucar.edu/dts/test.01
  • Constraints - these are suffixed to the URL and take the form “?<projections>&<selections>”. The meaning of the terms projection and selection is somewhat complicated, and the OPeNDAP web site, http://www.opendap.org, should be consulted. The interaction of DAP constraints with netCDF is complex and at the moment requires an understanding of how DAP is translated to netCDF.
  • Client parameters - these may be specified in either of two ways. The older, deprecated form prefixes text to the front of the URL and is of the general form [<name>] or [<name>=value]. Examples include [show=fetch] and [noprefetch]. The newer, preferred form appends the parameters to the end of the URL using the semi-standard '#' notation: e.g. http://....#show=fetch&noprefetch.
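From a C program, an OPeNDAP data source is opened exactly like a local file; this minimal sketch uses the test URL above (error checking omitted):

#include <netcdf.h>

int ncid;
/* The OPeNDAP URL takes the place of a file name. */
nc_open("http://remotetest.unidata.ucar.edu/dts/test.01", NC_NOWRITE, &ncid);
/* ... ordinary nc_inq_* and nc_get_var* calls work as usual ... */
nc_close(ncid);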

It is possible to see what the translation does to a particular DAP data source in either of two ways. First, one can examine the DDS source through a web browser and then examine the translation using the ncdump -h command to see the netCDF Classic translation. The ncdump output will actually be the union of the DDS with the DAS, so to see the complete translation, it is necessary to view both.

For example, if a web browser is given the following, the first URL will return the DDS for the specified dataset, and the second URL will return the DAS for the specified dataset.

http://remotetest.unidata.ucar.edu/dts/test.01.dds
http://remotetest.unidata.ucar.edu/dts/test.01.das

Then by using the following ncdump command, it is possible to see the equivalent netCDF Classic translation.

ncdump -h http://remotetest.unidata.ucar.edu/dts/test.01

The DDS output from the web server should look like this.

Dataset {
    Byte b;
    Int32 i32;
    UInt32 ui32;
    Int16 i16;
    UInt16 ui16;
    Float32 f32;
    Float64 f64;
    String s;
    Url u;
} SimpleTypes;

The DAS output from the web server should look like this.

Attributes {
    Facility {
        String PrincipleInvestigator "Mark Abbott", "Ph.D";
        String DataCenter "COAS Environmental Computer Facility";
        String DrifterType "MetOcean WOCE/OCM";
    }
    b {
        String Description "A test byte";
        String units "unknown";
    }
    i32 {
        String Description "A 32 bit test server int";
        String units "unknown";
    }
}

The output from ncdump should look like this.

netcdf test {
dimensions:
    stringdim64 = 64 ;
variables:
    byte b ;
        b:Description = "A test byte" ;
        b:units = "unknown" ;
    int i32 ;
        i32:Description = "A 32 bit test server int" ;
        i32:units = "unknown" ;
    int ui32 ;
    short i16 ;
    short ui16 ;
    float f32 ;
    double f64 ;
    char s(stringdim64) ;
    char u(stringdim64) ;
}

Note that the fields of type String and type URL have suddenly acquired a dimension. This is because strings are translated to arrays of char, which requires adding an extra dimension. The size of the dimension is determined in a variety of ways and can be specified. It defaults to 64 and when read, the underlying string is either padded or truncated to that length.

Also note that the Facility attributes do not appear in the translation because they are neither global nor associated with a variable in the DDS.

Alternately, one can get the text of the DDS as a global attribute by using the client parameters mechanism. In this case, the parameter "show=dds" can be used, and the data retrieved using the following command.

ncdump -h http://remotetest.unidata.ucar.edu/dts/test.01.dds#show=dds

The ncdump -h command will then show both the translation and the original DDS. In the above example, the DDS would appear as the global attribute “_DDS” as follows.

netcdf test {
...
variables:
:_DDS = "Dataset { Byte b; Int32 i32; UInt32 ui32; Int16 i16;
         UInt16 ui16; Float32 f32; Float64 f64;
         String s; Url u; } SimpleTypes;"
byte b ;
...
}

DAP to NetCDF Translation Rules

Currently only one translation is available: DAP 2 Protocol to netCDF-3. There used to be a DAP 2 Protocol to netCDF-4 translation, but it has been removed until the DAP4 protocol is available.

netCDF-3 Translation Rules

The current default translation code translates the OPeNDAP protocol to netCDF-3 (classic). This netCDF-3 translation converts an OPeNDAP DAP protocol version 2 DDS to netCDF-3 and is designed to mimic as closely as possible the translation provided by the libnc-dap system, except that some errors in that older translation have been fixed.

For illustrative purposes, the following example will be used.

Dataset {
    Int32 f1;
    Structure {
        Int32 f11;
        Structure {
            Int32 f1[3];
            Int32 f2;
        } FS2[2];
    } S1;
    Structure {
        Grid {
            Array:
                Float32 temp[lat=2][lon=2];
            Maps:
                Int32 lat[lat=2];
                Int32 lon[lon=2];
        } G1;
    } S2;
    Grid {
        Array:
            Float32 G2[lat=2][lon=2];
        Maps:
            Int32 lat[2];
            Int32 lon[2];
    } G2;
    Int32 lat[lat=2];
    Int32 lon[lon=2];
} D1;

Variable Definition

The set of netCDF variables is derived from the fields with primitive base types as they occur in Sequences, Grids, and Structures. The field names are initially modified to be fully qualified. For the above, the set of variables is as follows. The coordinate variables within grids are left out in order to mimic the behavior of libnc-dap.

f1
S1.f11
S1.FS2.f1
S1.FS2.f2
S2.G1.temp
S2.G2.G2
lat
lon

Variable Dimension Translation

A variable's rank is determined from three sources.

  • The variable has the dimensions associated with the field it represents (e.g. S1.FS2.f1[3] in the above example).
  • The variable inherits the dimensions associated with any containing structure that has a rank greater than zero. These dimensions precede those of case 1. Thus, in our example, S1.FS2.f1 has dimensions [2][3], where the first dimension comes from the containing Structure FS2[2].
  • The variable's set of dimensions are altered if any of its containers is a DAP DDS Sequence. This is discussed more fully below.

If the type of the netCDF variable is char, then an extra string dimension is added as the last dimension.

Dimension Translation

For dimensions, the rules are as follows.

Fields in dimensioned structures inherit the dimension of the structure; thus the above list would have the following dimensioned variables.

S1.FS2.f1 -> S1.FS2.f1[2][3]
S1.FS2.f2 -> S1.FS2.f2[2]
S2.G1.temp -> S2.G1.temp[lat=2][lon=2]
S2.G1.lat -> S2.G1.lat[lat=2]
S2.G1.lon -> S2.G1.lon[lon=2]
S2.G2.G2 -> S2.G2.G2[lat=2][lon=2]
S2.G2.lat -> S2.G2.lat[lat=2]
S2.G2.lon -> S2.G2.lon[lon=2]
lat -> lat[lat=2]
lon -> lon[lon=2]

Collect all of the dimension specifications from the DDS, both named and anonymous (unnamed). For each unique anonymous dimension with value NN, create a netCDF dimension of the form "XX_<i>=NN", where XX is the fully qualified name of the variable and i is the i'th (inherited) dimension of the array in which the anonymous dimension occurs. For our example, this would create the following dimensions.

S1.FS2.f1_0 = 2 ;
S1.FS2.f1_1 = 3 ;
S1.FS2.f2_0 = 2 ;
S2.G2.lat_0 = 2 ;
S2.G2.lon_0 = 2 ;

If, however, the anonymous dimension is the single dimension of a MAP vector in a Grid, then the dimension is given the same name as the map vector. This leads to the following.

S2.G2.lat_0 -> S2.G2.lat
S2.G2.lon_0 -> S2.G2.lon

For each unique named dimension "<name>=NN", create a netCDF dimension of the form "<name>=NN", where name has the qualifications removed. If this leads to duplicates (i.e. same name and same value), then the duplicates are ignored. This produces the following.

S2.G2.lat -> lat
S2.G2.lon -> lon

Note that this produces duplicates that will be ignored later.

At this point the only dimensions left to process should be named dimensions with the same name as some dimension from the named-dimension step above, but with a different value. For those dimensions, create a dimension of the form "<name>M=NN", where M is a counter starting at 1. The example has no instances of this.

Finally, if needed, define a single UNLIMITED dimension named "unlimited" with value zero. The unlimited dimension will be used to handle certain kinds of DAP sequences (see below).

This leads to the following set of dimensions.

dimensions:
    unlimited = UNLIMITED ;
    lat = 2 ;
    lon = 2 ;
    S1.FS2.f1_0 = 2 ;
    S1.FS2.f1_1 = 3 ;
    S1.FS2.f2_0 = 2 ;

Variable Name Translation

The steps for variable name translation are as follows.

Take the set of variables captured above. Thus for the above DDS, the following fields would be collected.

f1
S1.f11
S1.FS2.f1
S1.FS2.f2
S2.G1.temp
S2.G2.G2
lat
lon

All grid array variables are renamed to be the same as the containing grid and the grid prefix is removed. In the above DDS, this results in the following changes.

G1.temp -> G1
G2.G2 -> G2

It is important to note that this process could produce duplicate variables (i.e. with the same name); in that case they are all assumed to have the same content and the duplicates are ignored. If it turns out that the duplicates have different content, then the translation will not detect this. YOU HAVE BEEN WARNED.

The final netCDF-3 schema (minus attributes) is then as follows.

netcdf t {
dimensions:
    unlimited = UNLIMITED ;
    lat = 2 ;
    lon = 2 ;
    S1.FS2.f1_0 = 2 ;
    S1.FS2.f1_1 = 3 ;
    S1.FS2.f2_0 = 2 ;
variables:
    int f1 ;
    int lat(lat) ;
    int lon(lon) ;
    int S1.f11 ;
    int S1.FS2.f1(S1.FS2.f1_0, S1.FS2.f1_1) ;
    int S1.FS2.f2(S1.FS2.f2_0) ;
    float S2.G1(lat, lon) ;
    float G2(lat, lon) ;
}

In actuality, the unlimited dimension is dropped because it is unused.

There are differences from the original libnc-dap here because libnc-dap was technically incorrect. For example, the original would have said this:

int S1.FS2.f1(lat, lat) ;

Note that this is incorrect because it dimensions S1.FS2.f1(2,2) rather than S1.FS2.f1(2,3).

Translating DAP DDS Sequences

Any variable (as determined above) that is contained directly or indirectly by a Sequence is subject to revision of its rank using the following rules.

Let the variable be contained in Sequence Q1, where Q1 is the innermost containing sequence. If Q1 is itself contained (directly or indirectly) in a sequence, or Q1 is contained (again directly or indirectly) in a structure that has rank greater than 0, then the variable will have an initial UNLIMITED dimension. Further, all dimensions coming from "above" and including (in the containment sense) the innermost Sequence, Q1, will be removed and replaced by that single UNLIMITED dimension. The size associated with that UNLIMITED is zero, which means that its contents are inaccessible through the netCDF-3 API. Again, this differs from libnc-dap, which leaves out such variables. Again, however, this difference is backward compatible.

If the variable is contained in a single Sequence (i.e. not nested) and all containing structures have rank 0, then the variable will have an initial dimension whose size is the record count for that Sequence. The name of the new dimension will be the name of the Sequence.

Consider this example.

Dataset {
    Structure {
        Sequence {
            Int32 f1[3];
            Int32 f2;
        } SQ1;
    } S1[2];
    Sequence {
        Structure {
            Int32 x1[7];
        } S2[5];
    } Q2;
} D;

The corresponding netCDF-3 translation is pretty much as follows (the value for dimension Q2 may differ).

dimensions:
    unlimited = UNLIMITED ; // (0 currently)
    S1.SQ1.f1_0 = 2 ;
    S1.SQ1.f1_1 = 3 ;
    S1.SQ1.f2_0 = 2 ;
    Q2.S2.x1_0 = 5 ;
    Q2.S2.x1_1 = 7 ;
    Q2 = 5 ;
variables:
    int S1.SQ1.f1(unlimited, S1.SQ1.f1_1) ;
    int S1.SQ1.f2(unlimited) ;
    int Q2.S2.x1(Q2, Q2.S2.x1_0, Q2.S2.x1_1) ;

Note that for example S1.SQ1.f1_0 is not actually used because it has been folded into the unlimited dimension.

Note that for sequences without a leading unlimited dimension, there is a performance cost because the translation code has to walk the data to determine how many records are associated with the sequence. Since libnc-dap did essentially the same thing, it can be assumed that the cost is not prohibitive.

Caching

In an effort to provide better performance for some access patterns, client-side caching of data is available. The default is no caching, but it may be enabled by prefixing the URL with the parameter "cache".
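For example, using the older prefix form from a C program (a minimal sketch; error checking omitted):

#include <netcdf.h>

int ncid;
/* The [cache] prefix enables client-side caching for this URL. */
nc_open("[cache]http://remotetest.unidata.ucar.edu/dts/test.01",
        NC_NOWRITE, &ncid);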

Caching operates basically as follows.

When a URL is first accessed using nc_open(), netCDF automatically does a pre-fetch of selected variables. These include all variables smaller than a specified (and user definable) size. This allows, for example, quick access to coordinate variables. This can be suppressed with the parameter "noprefetch".

Whenever a request is made using some variant of the nc_get_var() API procedures, the complete variable is fetched and stored in the cache as a new cache entry. Subsequent requests for any part of that variable will access the cache entry to obtain the data.

The cache may become too full, either because it has too many entries or because it is taking up too much disk space. In that case, cache entries are purged until the cache is again within its size limits. The purge algorithm is LRU (least recently used), so variables that are repeatedly referenced tend to stay in the cache.

The cache is completely purged when nc_close() is invoked.

In order to decide if you should enable caching, you will need to have some understanding of the access patterns of your program.

The ncdump program always dumps one or more whole variables, so it turns on caching.

If your program accesses only parts of a number of variables, then caching should probably not be used since fetching whole variables will probably slow down your program for no purpose.

Unfortunately, caching is currently an all or nothing proposition, so for more complex access patterns, the decision to cache or not may not have an obvious answer. Probably a good rule of thumb is to avoid caching initially and later turn it on to see its effect on performance.

Defined Client Parameters

Currently, a limited set of client parameters is recognized. Parameters not listed here are ignored, but no error is signalled.

The recognized parameters are as follows; a combined usage example appears after the list.

  • "log" | "log=<file>" - Turn on logging and send the log output to the specified file. If no file is specified, then output is sent to standard error.
  • "show=... das|dds|url" - This causes information to appear as specific global attributes. The currently recognized tags are "dds" to display the underlying DDS, "das" similarly, and "url" to display the url used to retrieve the data. This parameter may be specified multiple times (e.g. “show=dds&show=url”).
  • "show=fetch" - This parameter causes the netCDF code to log a copy of the complete url for every HTTP get request. If logging is enabled, then this can be helpful in checking to see the access behavior of the netCDF code.
  • "stringlength=NN" - Specify the default string length to use for string dimensions. The default is 64.
  • "stringlength_<var>=NN" - Specify the default string length to use for a string dimension for the specified variable. The default is 64.
  • "cache" - This enables caching.
  • "cachelimit=NN" - Specify the maximum amount of space allowed for the cache.
  • "cachecount=NN" - Specify the maximum number of entries in the cache.
  • "noprefetch" - This disables prefetch of small variables.

Notes on Debugging OPeNDAP Access

The OPeNDAP support makes use of the logging facility of the underlying oc system (see http://www.opendap.org/oc). Note that this is currently separate from the existing netCDF logging facility. Turning on this logging can sometimes give important information. Logging can be enabled by using the client parameter "log" or "log=filename", where the first case will send log output to standard error and the second will send log output to the specified file.

Users should also be aware that if one is accessing data over an NFS mount, one may see some .nfsxxxxx files; those can be ignored.

HTTP Configuration

Limited support for configuring the http connection is provided via parameters in the “.dodsrc” configuration file. The relevant .dodsrc file is located by first looking in the current working directory, and if not found, then looking in the directory specified by the “$HOME” environment variable.

Entries in the .dodsrc file are of the form:

['['<url>']']<key>=<value>

That is, each entry consists of a key/value pair, optionally preceded by a URL enclosed in square brackets.

For given KEY and URL strings, the value chosen is as follows:

If URL is null, then look for the .dodsrc entry that has no url prefix and whose key is same as the KEY for which we are looking.

If the URL is not null, then look for all the .dodsrc entries that have a url, URL1, say, and for which URL1 is a prefix (in the string sense) of URL. For example, if URL = http://x.y/a, then it will match entries of the form

1. [http://x.y/a]KEY=VALUE
2. [http://x.y/a/b]KEY=VALUE

It will not match an entry of the form

[http://x.y/b]KEY=VALUE

because “http://x.y/b” is not a string prefix of “http://x.y/a”. Finally, from the set so constructed, choose the entry with the longest url prefix: [http://x.y/a/b]KEY=VALUE in this case.

Currently, the supported set of keys (with descriptions) are as follows.

    HTTP.VERBOSE
        Type: boolean ("1"/"0")
        Description: Produce verbose output, especially using SSL.
        Related CURL Flags: CURLOPT_VERBOSE
    HTTP.DEFLATE
        Type: boolean ("1"/"0")
        Description: Allow use of compression by the server.
        Related CURL Flags: CURLOPT_ENCODING
    HTTP.COOKIEJAR
        Type: String representing file path
        Description: Specify the name of file into which to store cookies. Defaults to in-memory storage.
        Related CURL Flags:CURLOPT_COOKIEJAR
    HTTP.CREDENTIALS.USER
        Type: String representing user name
        Description: Specify the user name for Digest and Basic authentication.
        Related CURL Flags:
    HTTP.CREDENTIALS.PASSWORD
        Type: String representing password
        Description: Specify the password for Digest and Basic authentication.
        Related CURL Flags:
    HTTP.SSL.CERTIFICATE
        Type: String representing file path
        Description: Path to a file containing a PEM certificate.
        Related CURL Flags: CURLOPT_SSLCERT
    HTTP.SSL.KEY
        Type: String representing file path
        Description: Same as HTTP.SSL.CERTIFICATE, and should usually have the same value.
        Related CURL Flags: CURLOPT_SSLKEY
    HTTP.SSL.KEYPASSWORD
        Type: String representing password
        Description: Password for accessing the HTTP.SSL.KEY/HTTP.SSL.CERTIFICATE
        Related CURL Flags: CURLOPT_KEYPASSWD
    HTTP.SSL.CAPATH
        Type: String representing directory
        Description: Path to a directory containing trusted certificates for validating server certificates.
        Related CURL Flags: CURLOPT_CAPATH
    HTTP.SSL.VALIDATE
        Type: boolean ("1"/"0")
        Description: Cause the client to verify the server's presented certificate.
        Related CURL Flags: CURLOPT_SSL_VERIFYPEER, CURLOPT_SSL_VERIFYHOST
    HTTP.TIMEOUT
        Type: String ("dddddd")
        Description: Specify the maximum time in seconds that you allow the http transfer operation to take.
        Related CURL Flags: CURLOPT_TIMEOUT, CURLOPT_NOSIGNAL
    HTTP.PROXY_SERVER
        Type: String representing the url used to access the proxy (e.g. http://[username:password@]host[:port])
        Description: Specify the needed information for accessing a proxy.
        Related CURL Flags: CURLOPT_PROXY, CURLOPT_PROXYHOST, CURLOPT_PROXYUSERPWD

The related curl flags line indicates the curl flags modified by this key. See the libcurl documentation of the curl_easy_setopt() function for more detail (http://curl.haxx.se/libcurl/c/curl_easy_setopt.html).

For ESG client side key support, the following entries must be specified:

HTTP.SSL.VALIDATE
HTTP.COOKIEJAR
HTTP.SSL.CERTIFICATE
HTTP.SSL.KEY
HTTP.SSL.CAPATH

Additionally, for ESG, the HTTP.SSL.CERTIFICATE and HTTP.SSL.KEY entries should have the same value, which is the file path for the certificate produced by MyProxyLogon. The HTTP.SSL.CAPATH entry should be the path to the "certificates" directory produced by MyProxyLogon.
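A hypothetical .dodsrc fragment for ESG might therefore look like this (all file paths are illustrative):

HTTP.SSL.VALIDATE=1
HTTP.COOKIEJAR=/home/user/.dods_cookies
HTTP.SSL.CERTIFICATE=/home/user/.globus/usercred.pem
HTTP.SSL.KEY=/home/user/.globus/usercred.pem
HTTP.SSL.CAPATH=/home/user/.globus/certificates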

