VTK  9.2.6
# Data Assembly

VTK 10.0 introduces a new mechanism for representing data hierarchies
using `vtkPartitionedDataSetCollection` and `vtkDataAssembly`. This document
describes the design details.

# Data Model

The design is based on three classes:

* `vtkPartitionedDataSet` is a collection of datasets (not to be confused with
  `vtkDataSet`).
* `vtkPartitionedDataSetCollection` is a collection of `vtkPartitionedDataSet`s.
* `vtkDataAssembly` defines the hierarchical relationships between items in a
  `vtkPartitionedDataSetCollection`.

## Partitioned Dataset

`vtkPartitionedDataSet` is simply a collection of datasets that are to be
treated as a logical whole. In data-parallel applications, each dataset may
represent a partition of the complete dataset on the current worker process,
rank, or thread. Each dataset in a `vtkPartitionedDataSet` is called a
**partition**, implying it is only a part of a whole.

All non-null partitions have the same field and attribute arrays. For example,
if a `vtkPartitionedDataSet` consists of `vtkDataSet` subclasses, all of them
have exactly the same number of point-data and cell-data arrays, with the same
names, the same number of components, and the same data types.
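
The consistency rule above can be phrased as a simple check. What follows is a minimal Python sketch under stated assumptions: each partition is modeled only by the metadata of its arrays, a null partition is `None`, and the names `ArrayInfo` and `has_consistent_arrays` are hypothetical, not part of VTK's API. Treating array order as irrelevant is also an assumption; the text above only requires matching counts, names, components, and types.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical toy type: metadata for one point-data or cell-data array.
@dataclass(frozen=True, order=True)
class ArrayInfo:
    name: str
    components: int
    dtype: str


# A partition is modeled only by its array metadata; None is a null partition.
Partition = Optional[List[ArrayInfo]]


def has_consistent_arrays(partitions: List[Partition]) -> bool:
    """Return True if all non-null partitions have exactly the same arrays
    (same names, component counts, and data types), regardless of order."""
    reference = None
    for partition in partitions:
        if partition is None:
            continue  # null partitions are exempt from the rule
        signature = tuple(sorted(partition))
        if reference is None:
            reference = signature
        elif signature != reference:
            return False
    return True


pressure = ArrayInfo("Pressure", 1, "float64")
velocity = ArrayInfo("Velocity", 3, "float32")
print(has_consistent_arrays([[pressure, velocity], None, [velocity, pressure]]))  # True
print(has_consistent_arrays([[pressure, velocity], [pressure]]))  # False
```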

## Partitioned Dataset Collection

`vtkPartitionedDataSetCollection` is a collection of `vtkPartitionedDataSet`
instances; it is simply a mechanism for grouping them together. Since each
`vtkPartitionedDataSet` represents a whole dataset (not to be confused with
`vtkDataSet`), we can refer to each item in a `vtkPartitionedDataSetCollection`
as a **partitioned-dataset**.

Unlike the partitions in a `vtkPartitionedDataSet`, the items (partitioned-datasets)
in a `vtkPartitionedDataSetCollection` are not subject to any consistency
requirements. Thus, in multiblock-dataset parlance, each item in this collection
can be thought of as a block.

## Data Assembly

`vtkDataAssembly` is a means to define a hierarchical organization of items in a
`vtkPartitionedDataSetCollection`. It is literally a tree made up of named nodes.
Each node in the tree can have associated dataset-indices. When a `vtkDataAssembly`
is associated with a `vtkPartitionedDataSetCollection`, each dataset-index is simply
the index of a partitioned-dataset in that collection. A dataset-index can be
associated with multiple nodes in the assembly; however, it cannot be associated
with the same node more than once.
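
To make the node and dataset-index rules concrete, here is a minimal Python sketch of such a tree. `ToyAssembly` and its methods are hypothetical stand-ins, not the `vtkDataAssembly` interface, and the sketch assumes nodes receive sequential integer ids with the root at id 0.

```python
from typing import List, Optional


class ToyAssembly:
    """Hypothetical toy model of a data assembly: a tree of named nodes, each
    holding dataset-indices into a partitioned-dataset collection."""

    def __init__(self, root_name: str = "Root") -> None:
        # Node ids are assigned sequentially; id 0 is always the root.
        self.names: List[str] = [root_name]
        self.parents: List[Optional[int]] = [None]
        self.children: List[List[int]] = [[]]
        self.indices: List[List[int]] = [[]]

    def add_node(self, name: str, parent: int = 0) -> int:
        """Append a named child under `parent` and return its new id."""
        node_id = len(self.names)
        self.names.append(name)
        self.parents.append(parent)
        self.children.append([])
        self.indices.append([])
        self.children[parent].append(node_id)
        return node_id

    def add_dataset_index(self, node_id: int, dataset_index: int) -> bool:
        """Associate a dataset-index with a node. Returns False, leaving the
        node unchanged, if the node already holds that index."""
        if dataset_index in self.indices[node_id]:
            return False
        self.indices[node_id].append(dataset_index)
        return True

    def nodes_with_index(self, dataset_index: int) -> List[int]:
        """Ids of every node associated with the given dataset-index."""
        return [n for n, held in enumerate(self.indices) if dataset_index in held]


asm = ToyAssembly()
walls = asm.add_node("Walls")
heated = asm.add_node("Heated")
print(asm.add_dataset_index(walls, 0))   # True
print(asm.add_dataset_index(heated, 0))  # True: same index, different node
print(asm.add_dataset_index(walls, 0))   # False: already on this node
print(asm.nodes_with_index(0))           # [1, 2]
```

The reverse lookup `nodes_with_index` is a linear scan over all nodes, which is adequate for a sketch but not meant to suggest how a real implementation indexes the tree.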

An assembly makes it possible to define a richer, application-specific view of
the raw data blocks. This is not much different from what could be achieved
with a `vtkMultiBlockDataSet` alone. However, separating storage
(`vtkPartitionedDataSetCollection`) from organization (`vtkDataAssembly`) has
several advantages, which will become clear as we cover different use-cases.

While nodes in the data-assembly have unique ids, public-facing algorithm APIs
should not use them. For example, an extract-block filter that lets users choose
which blocks (or rather, partitioned-datasets) to extract from a
`vtkPartitionedDataSetCollection` can expose an API that accepts path-expressions
identifying nodes in the associated data-assembly by their names.
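
A filter that accepts names instead of node ids has to resolve them to dataset-indices. The Python sketch below assumes a hypothetical `Node` type, a slash-separated path syntax in which `*` matches any single name, and that selecting a node also selects every descendant's dataset-indices; the path-expressions a real VTK filter accepts may be richer or behave differently.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical toy node: a name, its dataset-indices, and child nodes.
@dataclass
class Node:
    name: str
    indices: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def select_nodes(root: Node, path: str) -> List[Node]:
    """Resolve a slash-separated path of node names such as "/Root/Walls".
    A '*' component matches any single node name at that level."""
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] not in ("*", root.name):
        return []
    matches = [root]
    for part in parts[1:]:
        matches = [child for node in matches for child in node.children
                   if part in ("*", child.name)]
    return matches


def selected_dataset_indices(root: Node, paths: List[str]) -> List[int]:
    """Union of the dataset-indices held by the selected nodes and all of
    their descendants, returned in ascending order."""
    result = set()
    stack = [node for path in paths for node in select_nodes(root, path)]
    while stack:
        node = stack.pop()
        result.update(node.indices)
        stack.extend(node.children)
    return sorted(result)


root = Node("Root", children=[
    Node("Walls", children=[Node("North", [0]), Node("South", [1])]),
    Node("Fluid", [2]),
])
print(selected_dataset_indices(root, ["/Root/Walls"]))                   # [0, 1]
print(selected_dataset_indices(root, ["/Root/*/South", "/Root/Fluid"]))  # [1, 2]
```

Because the user-facing input is a list of names, the same query keeps working if node ids change, for instance after the tree is rebuilt.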

Besides querying nodes by name, `vtkDataAssembly` also supports iterating over
all nodes in depth-first or breadth-first order using a *visitor*.
`vtkDataAssemblyVisitor` defines an API that can be implemented to perform a
custom action as each node in the tree is visited.
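
The visitor mechanism described above might be sketched as follows. `NodeVisitor`, `traverse`, and `NameCollector` are hypothetical names, not the actual `vtkDataAssemblyVisitor` interface, and the tree is assumed to be given as a list of node names plus a map from node id to child ids, rooted at id 0. Depth-first order here means pre-order.

```python
from collections import deque
from typing import Dict, List


class NodeVisitor:
    """Hypothetical toy visitor: subclasses override visit()."""

    def visit(self, node_id: int, name: str) -> None:
        raise NotImplementedError


def traverse(names: List[str], children: Dict[int, List[int]],
             visitor: NodeVisitor, depth_first: bool = True) -> None:
    """Call visitor.visit on every node reachable from the root (id 0).

    Depth-first visits a node before its subtrees, children in order;
    breadth-first visits the tree level by level."""
    pending = deque([0])
    while pending:
        node = pending.pop() if depth_first else pending.popleft()
        visitor.visit(node, names[node])
        kids = children.get(node, [])
        # A stack needs children pushed in reverse so the first child pops next.
        pending.extend(reversed(kids) if depth_first else kids)


class NameCollector(NodeVisitor):
    """Example visitor that records node names in visiting order."""

    def __init__(self) -> None:
        self.visited: List[str] = []

    def visit(self, node_id: int, name: str) -> None:
        self.visited.append(name)


names = ["Root", "Walls", "Fluid", "North", "South"]
children = {0: [1, 2], 1: [3, 4]}

dfs = NameCollector()
traverse(names, children, dfs, depth_first=True)
print(dfs.visited)  # ['Root', 'Walls', 'North', 'South', 'Fluid']

bfs = NameCollector()
traverse(names, children, bfs, depth_first=False)
print(bfs.visited)  # ['Root', 'Walls', 'Fluid', 'North', 'South']
```

A single deque serves both orders: used as a stack it yields depth-first order, used as a queue it yields breadth-first order.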

# Design Implications

1. Since the partitions in a `vtkPartitionedDataSet` are simply parts of a whole,
   the number of partitions has no specific significance. In distributed pipelines,
   for example, a `vtkPartitionedDataSet` on each rank can have arbitrarily many
   partitions. Furthermore, filters can add or remove partitions as needed. Since
   the `vtkDataAssembly` never refers to individual partitions, this has no
   implications for filters that use the hierarchical relationships.

2. When constructing a `vtkPartitionedDataSetCollection` in distributed
   data-parallel cases, each rank should have exactly the same number of
   partitioned-datasets. In this case, the `vtkPartitionedDataSet`s at a given
   index across all ranks are together treated as one whole dataset. Similarly,
   the `vtkDataAssembly` on each rank should be identical.

3. When developing filters, it is worth considering whether the filter really is a
   `vtkPartitionedDataSetCollection` filter or simply a `vtkPartitionedDataSet`-aware
   filter that needs to operate on each `vtkPartitionedDataSet` individually. For
   example, typical multiblock-aware filters such as ghost-cell generation and data
   redistribution are simply `vtkPartitionedDataSet` filters. For
   `vtkPartitionedDataSet`-only filters, when the input is a
   `vtkPartitionedDataSetCollection`, the executive takes care of looping over each
   partitioned-dataset in the collection, which simplifies filter development.

4. Filters that don't change the number of partitioned-datasets in a
   `vtkPartitionedDataSetCollection` generally don't affect the relationships
   between the partitioned-datasets, and hence can largely pass the
   `vtkDataAssembly` through unchanged. Only filters like extract-block that
   remove partitioned-datasets need to update the `vtkDataAssembly`. Even then,
   `vtkDataAssembly` provides several convenience methods that make updating the
   tree easy.

5. It is possible to develop a mapper that uses the `vtkDataAssembly`. By
   offering APIs that let users specify rendering properties for various nodes
   with path-queries, the mapper can support use-cases where the input structure
   keeps changing but the relationships remain largely intact.
   Since the same dataset-index can be associated with multiple nodes in a
   `vtkDataAssembly`, the mapper can effectively support scene-graph-like
   capabilities where users can specify transforms and other rendering
   parameters while reusing the heavy datasets. The mapper can easily tell
   whether a dataset has already been uploaded to the rendering pipeline, since
   it has the same id, and is indeed the same instance, even if it is visited
   through different branches of the tree.
108  through different branches in the tree.