###############################################################################
# #
# Trilinos Release 11.8 Release Notes #
# #
###############################################################################
Overview:
The Trilinos Project is an effort to develop algorithms and enabling
technologies within an object-oriented software framework for the solution of
large-scale, complex multi-physics engineering and scientific problems.
Packages:
The Trilinos 11.8 general release contains 54 packages: Amesos, Amesos2,
Anasazi, AztecOO, Belos, CTrilinos, Didasko, Epetra, EpetraExt, FEI,
ForTrilinos, Galeri, GlobiPack, Ifpack, Ifpack2, Intrepid, Isorropia, Kokkos,
Komplex, LOCA, Mesquite, ML, Moertel, MOOCHO, NOX, Optika, OptiPack, Pamgen,
Phalanx, Piro, Pliris, PyTrilinos, RTOp, Rythmos, Sacado, SEACAS, Shards,
ShyLU, STK, Stokhos, Stratimikos, Sundance, Teko, Teuchos, ThreadPool, Thyra,
Tpetra, TriKota, TrilinosCouplings, Trios, Triutils, Xpetra, Zoltan, Zoltan2.
Amesos2
- Amesos2's adapter for Tpetra now caches Import and Export objects, so that it
doesn't have to recreate them on every solve. This fixes Bug 6011 and should
improve performance of solves.
Belos
- More Belos solvers now work with complex Scalar type; more now compile for
complex Scalar type. This includes GCRODR, LSQR, RCG, and BlockGCRODR. Not
all of these solvers are enabled by default; some are still marked
"experimental."
Galeri
- Removed some instances of "using namespace std;" User code that
inadvertently depended on symbols in std being in the global namespace may
now have errors.
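
User code affected by this kind of change can usually be repaired by
qualifying standard library names explicitly. The following is a minimal
sketch; the function printLabel is hypothetical and is not part of any
Trilinos package.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user function. Code that previously wrote "ostream",
// "string", or "endl" without qualification, relying on a library header
// to pull std into the global namespace, can spell out std:: instead.
void printLabel(std::ostream& out, const std::string& name, int rank) {
  assert(rank >= 0);  // illustrative precondition only
  out << name << " on process " << rank << std::endl;
}
```

An alternative is a using-declaration for each needed name (for example,
"using std::string;") in the user's own source file.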
Ifpack
- Removed some instances of "using namespace std;" User code that
inadvertently depended on symbols in std being in the global namespace may
now have errors.
Ifpack2
- RILUK and Krylov may now be used as subdomain solvers in
AdditiveSchwarz.
- We made many improvements to RILUK and LocalFilter. This will move
towards fixes for a number of Ifpack2 bugs, such as 5992 and 5987.
Teuchos
- New mode for TimeMonitor::summarize (27 Mar 2014)
We added a new mode of calculating global statistics to
TimeMonitor::summarize. The new mode ignores contributions from processes
that either do not have a particular timer, or have a hard zero for a timer.
This mode is off by default, meaning that the default summarize behavior is
unchanged.
This new mode is useful in cases where not all processes have the same timers
and/or some timers are zero. This can arise when multiple MPI communicators
are in play. A single call to summarize using a global communicator yields
reasonable statistics for all timers. The cost is an additional
MPI_Allreduce.
Consider this example:
- proc 0 has timers T1=1.0, T2=0.5
- proc 1 has timers T2=1.0, T3=1.0
- proc 2 has timers T2=2.0, T3=0.5
where MCW is a communicator containing 0,1,2, and MC12 is a communicator
containing 1,2.
Calling
TimeMonitor::summarize(MCW, std::cout, false, true, false, Teuchos::Union)
yields
- min(T1)=0.0, avg(T1)=0.33, max(T1)=1.0
- min(T2)=0.5, avg(T2)=1.17, max(T2)=2.0
- min(T3)=0.0, avg(T3)=0.5, max(T3)=1.0
Calling
TimeMonitor::summarize(MC12, std::cout, false, true, false, Teuchos::Union)
yields
- min(T1)=0.0, avg(T1)=0.0, max(T1)=0.0
- min(T2)=1.0, avg(T2)=1.5, max(T2)=2.0
- min(T3)=0.5, avg(T3)=0.75, max(T3)=1.0
While each is technically correct for the communicators given, neither by
itself gives information that one might want, namely, averages over just the
processes that have a timer and mins over the nonzero times.
With the new mode, calling
TimeMonitor::summarize(MCW, std::cout, false, true, false, Teuchos::Union, "",
true) yields
- min(T1)=1.0, avg(T1)=1.0, max(T1)=1.0
- min(T2)=0.5, avg(T2)=1.17, max(T2)=2.0
- min(T3)=0.5, avg(T3)=0.75, max(T3)=1.0
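
The arithmetic behind the two modes can be reproduced with a small
stand-alone sketch. It does not use Teuchos or MPI: the per-process times
are passed in as a plain vector, and the names TimerStats and computeStats
are hypothetical. A value of 0.0 stands for both "timer missing on this
process" and "hard zero".

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical result type; not part of Teuchos.
struct TimerStats {
  double min;
  double avg;
  double max;
};

// times[p] is the elapsed time of one timer on process p.
// If ignoreZeros is true, processes with a zero time are left out of the
// min, avg, and max, which models the new summarize mode.
TimerStats computeStats(const std::vector<double>& times, bool ignoreZeros) {
  assert(!times.empty());
  double sum = 0.0, minVal = 0.0, maxVal = 0.0;
  int count = 0;
  for (double t : times) {
    if (ignoreZeros && t == 0.0) continue;  // skip missing or zero timers
    if (count == 0) {
      minVal = t;
      maxVal = t;
    } else {
      minVal = std::min(minVal, t);
      maxVal = std::max(maxVal, t);
    }
    sum += t;
    ++count;
  }
  TimerStats s;
  s.min = minVal;
  s.max = maxVal;
  s.avg = (count > 0) ? sum / count : 0.0;
  return s;
}
```

For T3 in the example above, computeStats({0.0, 1.0, 0.5}, false) gives the
default statistics over MCW, and passing true gives the statistics of the
new mode.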
- Ptr: Added is_null() method to match RCP (23 Mar 2014)
- MpiComm: Improved duplicate(), split(), and createSubcommunicator()
(27 Feb 2014).
These methods now do MPI_Comm_dup, MPI_Comm_split, resp. MPI_Comm_create, as
one would expect. They also do one less MPI_Bcast than before. This is
because messages in the new MPI_Comm (which MPI_Comm_dup, MPI_Comm_split, and
MPI_Comm_create all create) cannot collide with messages in the old MPI_Comm,
so there is no need for a broadcast to agree on a common tag.
Tpetra
- BACKWARDS INCOMPATIBLE CHANGE: MultiVector and Vector now implement
view semantics.
This means that the copy constructor and assignment operator (operator=) of
both classes now do shallow copies. This change will support gradual porting
to the new ("Kokkos Refactor") version of Tpetra.
We have propagated this change to other Trilinos packages that use Tpetra.
Please use the new createCopy nonmember function to get a new instance of
(Multi)Vector that is a deep copy of an existing (Multi)Vector. Also, please
use the new nonmember function deep_copy to do a deep copy between two
existing compatible (Multi)Vector instances.
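
The difference between a shallow copy and the two deep-copy functions can
be illustrated without Tpetra. In the sketch below, ToyVector is a
hypothetical stand-in with view semantics; the createCopy and deep_copy
functions shown here only mimic the names used above, and their signatures
are assumptions, not the Tpetra interface.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical class with view semantics; not Tpetra::Vector.
class ToyVector {
public:
  explicit ToyVector(std::size_t n)
    : data_(std::make_shared<std::vector<double>>(n, 0.0)) {}
  // The compiler-generated copy constructor and operator= copy only the
  // shared_ptr, so a copy aliases the same storage (shallow copy).
  double& operator[](std::size_t i) { return (*data_)[i]; }
  double operator[](std::size_t i) const { return (*data_)[i]; }
  std::size_t size() const { return data_->size(); }
private:
  std::shared_ptr<std::vector<double>> data_;
};

// Returns a new ToyVector with its own storage holding a copy of src.
ToyVector createCopy(const ToyVector& src) {
  ToyVector dst(src.size());
  for (std::size_t i = 0; i < src.size(); ++i) dst[i] = src[i];
  return dst;
}

// Copies the entries of src into the existing storage of dst.
void deep_copy(ToyVector& dst, const ToyVector& src) {
  assert(dst.size() == src.size());
  for (std::size_t i = 0; i < src.size(); ++i) dst[i] = src[i];
}
```

With these definitions, "ToyVector y = x;" makes y a second view of x's
data, while "ToyVector z = createCopy(x);" gives z independent storage.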
- Kokkos Refactor updates.
Development continues on the Kokkos Refactor version of Tpetra. This is a
partial specialization of some Tpetra classes that uses the new Kokkos
programming model. We plan eventually to switch to this version of Tpetra and
deprecate the old version.
This release adds a Kokkos Refactor version of Map. Its GID->LID and LID->GID
conversion methods are now thread-safe and thread-scalable on the host. It
also has a "device object" that you can use on CUDA devices.
The Kokkos Refactor version of MultiVector now implements "dual view"
semantics. This means that the Tpetra interface lets users mark either host
or device as modified, and synchronize between host and device on demand, if
necessary.
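
The bookkeeping that "dual view" semantics imply can be sketched as
follows. ToyDualView and its method names are hypothetical and are not the
Tpetra or Kokkos interface; both copies live in host memory here, whereas a
real implementation would keep one copy in device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a dual view: two copies of the data plus flags
// recording which copy was modified most recently.
class ToyDualView {
public:
  explicit ToyDualView(std::size_t n) : host_(n, 0.0), device_(n, 0.0) {}

  std::vector<double>& host() { return host_; }
  std::vector<double>& device() { return device_; }

  // The user marks a side as modified after writing to it. This toy
  // forbids marking both sides modified at once.
  void modifyHost() { assert(!deviceModified_); hostModified_ = true; }
  void modifyDevice() { assert(!hostModified_); deviceModified_ = true; }

  bool needSyncToDevice() const { return hostModified_; }
  bool needSyncToHost() const { return deviceModified_; }

  // Synchronization copies data only if the other side was modified.
  void syncToDevice() {
    if (hostModified_) { device_ = host_; hostModified_ = false; }
  }
  void syncToHost() {
    if (deviceModified_) { host_ = device_; deviceModified_ = false; }
  }

private:
  std::vector<double> host_;
  std::vector<double> device_;
  bool hostModified_ = false;
  bool deviceModified_ = false;
};
```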
- Sparse matrix-matrix multiply performance improvements.
This release includes many performance improvements to Tpetra's sparse
matrix-matrix multiply routine, and other supporting routines, such as
explicit transpose, and {im,ex}portAndFillComplete. Tpetra now has a sparse
matrix-matrix multiply variant for implementing Jacobi smoothing of matrices.
This is useful for algebraic multigrid.
- CrsMatrix: "Preserve Local Graph" defaults true (17 Mar 2014)
In CrsMatrix, the undocumented parameter "Preserve Local Graph" now defaults
to true. This makes the following scenario work by default:
1. Create a CrsMatrix A that creates and owns its graph (i.e., don't
use the constructor that takes an RCP<const Tpetra::CrsGraph> or
a local graph)
2. Set an entry in the matrix A, and call fillComplete on it
3. Create a CrsMatrix B using A's graph (obtained via
A.getCrsGraph()), so that B has a const (a.k.a. "static") graph
4. Change a value in B (you can't change its structure), and call
fillComplete on B
Before this commit, the above scenario didn't work by default. This is
because A's first fillComplete call would call fillLocalGraphAndMatrix, which
by default sets the local graph to null. As a result, from that point,
A.getCrsGraph()->getLocalGraph() returns null, which makes B's fillComplete
throw an exception. The only way to make this scenario work was to set A's
"Preserve Local Graph" parameter to true. (It defaulted to false.)
The idea behind this nonintuitive behavior was for the local sparse ops object
to own all the data. This might make sense if it is a third-party library
that takes CSR's three arrays and copies them into its own storage format. In
that case, it might be a good idea to free the original three CSR arrays, in
order to avoid duplicate storage. However, resumeFill never had a way to get
that data back out of the local sparse ops object. Rather than try to
implement that, it's easier just to make "Preserve Local Graph" default to
true.
The possible data duplication mentioned in the previous paragraph can never
happen with the Kokkos Refactor version of CrsMatrix, since it insists on
controlling the matrix representation itself. This makes the code shorter and
easier to read, and also ensures efficient fill. That will in turn make the
option unnecessary.
- Many bug fixes.
- The most important bug fixed is Bug 6069, an error in Distributor, which would
only manifest on MPICH. This bug fix alone is enough reason to upgrade to
Trilinos 11.8.
PyTrilinos
- Various changes to improve the stability and robustness of the build system.
Addresses some instability in PyTrilinos introduced with new 64 bit
capabilities in Epetra. Some compilation warnings eliminated. SWIG version
checks added.
Zoltan
- Revised Scotch TPL specification in Trilinos' CMake environment to link with
all libraries needed by Scotch v6.
- Fixed bug in interface to ParMETIS v4 when multiple vertex weights are used.
- Fixed bug in interface to Scotch when some processor has no vertices.
Zoltan2
- Removed some instances of "using namespace std;" User code that
inadvertently depended on symbols in std being in the global namespace may
now have errors.
- Simplified input Adapter classes for easier implementation by applications.
(This change may break backward compatibility for some users.)
- Some parameter names have changed or have been deleted:
pqParts --> mj_parts
parallel_part_calculation_count --> mj_concurrent_part_count
migration_check_option --> mj_migration_option
migration_imbalance_cut_off --> mj_minimum_migration_imbalance
keep_part_boxes --> mj_keep_part_boxes
recursion_depth --> mj_recursion_depth
migration_processor_assignment_type deleted.
migration_all_to_all_type deleted.
migration_doMigration_type deleted.
- Added ability to associate coordinates with matrix rows and graph vertices
through the MatrixAdapter and GraphAdapter.
- Improved the performance and readability of Multijagged Partitioning.
- Added weights to graph partitioning via Scotch.
- Changed weight specifications in input Adapters; users can no longer provide
NULL weight arrays for uniform weights.
- Added more robust testing.
- Fixed several bugs.
###############################################################################
# #
# Trilinos Release 11.6 Release Notes #
# #
###############################################################################
Overview:
The Trilinos Project is an effort to develop algorithms and enabling
technologies within an object-oriented software framework for the solution of
large-scale, complex multi-physics engineering and scientific problems.
Packages:
The Trilinos 11.6 general release contains 54 packages: Amesos, Amesos2,
Anasazi, AztecOO, Belos, CTrilinos, Didasko, Epetra, EpetraExt, FEI,
ForTrilinos, Galeri, GlobiPack, Ifpack, Ifpack2, Intrepid, Isorropia, Kokkos,
Komplex, LOCA, Mesquite, ML, Moertel, MOOCHO, NOX, Optika, OptiPack, Pamgen,
Phalanx, Piro, Pliris, PyTrilinos, RTOp, Rythmos, Sacado, SEACAS, Shards,
ShyLU, STK, Stokhos, Stratimikos, Sundance, Teko, Teuchos, ThreadPool, Thyra,
Tpetra, TriKota, TrilinosCouplings, Trios, Triutils, Xpetra, Zoltan, Zoltan2.
Framework Release Notes:
- Changed behavior of Trilinos_ENABLE_<PACKAGE>=ON to enable all
subpackages for that package, including when propagating forward dependencies.
See updated <Project>BuildQuickRef.* document.
Amesos2
- Added experimental support for Cholmod, a sparse Cholesky solver.
Epetra
- Removed a few "using std::" statements from Epetra headers. These were for
std::{string, istream, ostream, cerr, cout, endl, flush}. User code that
inadvertently relied on such names being available in global namespace could
see errors. They should explicitly use std::name or place the appropriate
"using std::name;" statement in their code.
Ifpack2
- AdditiveSchwarz interface changes
AdditiveSchwarz implements additive Schwarz domain decomposition.
The class also manages and invokes the solver for each subdomain.
That solver must implement Ifpack2::Preconditioner.
We made two changes to AdditiveSchwarz:
1. Subdomain solver type is now determined entirely at run time
2. Second template parameter (LocalInverseType) is deprecated
These changes are related to each other. It used to be that users
would specify the type of the subdomain solver as the second
template parameter of AdditiveSchwarz, LocalInverseType. The
value of LocalInverseType had to be a concrete subclass of
Preconditioner. This is no longer the case. Users now may and
should omit AdditiveSchwarz's second template parameter. More
importantly, they may now specify the subdomain solver's type at
run time. They may do so either as a run-time parameter in the
input ParameterList of AdditiveSchwarz, or by calling
AdditiveSchwarz's setInnerPreconditioner() method. See
AdditiveSchwarz's public class documentation for details.
AdditiveSchwarz's second template parameter has a default value of
Ifpack2::Preconditioner. For backwards compatibility, if users
specify a known concrete subclass of Ifpack2::Preconditioner for
LocalInverseType, AdditiveSchwarz will implement the previous
behavior of creating a subdomain solver of that specific type. In
the next major Trilinos release, we plan to remove
AdditiveSchwarz's second template parameter entirely.
Kokkos
- Non-backwards compatible change: In the "Kokkos Classic" subpackage
(everything in kokkos/classic), the "Kokkos" namespace has been
changed to "KokkosClassic".
This will facilitate coexistence of Kokkos Classic with both the
new Kokkos programming model and the new Kokkos subpackages that
depend on it. Coexistence will be necessary for our planned port
of Tpetra to use new Kokkos instead of Kokkos Classic. Kokkos
Classic will eventually be deprecated, and Tpetra (and downstream
packages) will use new Kokkos instead.
NOX
- Added a Fixed-point Anderson Acceleration solver. Unit tests exist for
Epetra and Thyra adapters.
RTOp
- Added better runtime support (not dependent on debug-mode builds) for
printing the application of RTOps in parallel with the
RTOpPack::SPMD_apply_op() functions. This makes parallel debugging much
easier (for example, involving Thyra).
Thyra
- Refactored Thyra support software and Thyra/Epetra adapters to support
zero-element processes for vector spaces and maps. Now a
Thyra::SpmdVectorSpaceBase subclass object can have zero elements on a
process and have everything work as it should. The Thyra/Epetra and
Thyra/Tpetra adapters should also be able to take Epetra and Tpetra Map
objects that have zero elements on a process. For most clients and
subclasses, these refactorings should maintain 100% perfect backward
compatibility except now more use cases are supported than before. See the
unit tests and updated class documentation for details.
Tpetra
- Gradual port to use (new) Kokkos
Tpetra will migrate to use the new Kokkos programming model. The
tpetra/src/kokkos_refactor directory contains a preview of this
migration under development. This will include
backwards-incompatible changes. For example, MultiVector and
Vector will have view semantics, instead of their current
container semantics. This means that their copy constructor and
assignment operator (operator=) will make shallow copies, instead
of deep copies. This will make Tpetra's semantics more consistent
with those of Kokkos. In order to provide deep copies, all Tpetra
objects will get the following:
- createCopy() method: returns a deep copy of its *this argument
- deep_copy() nonmember function: copies the contents of one
MultiVector into the contents of another existing MultiVector.
This works like deep_copy() for Kokkos::View objects.
MultiVector already has both of these functions. Thus, in order
to prepare for the backwards incompatible changes to Tpetra, users
must find all uses of the copy constructor and assignment
operator, and replace them with createCopy() resp. deep_copy().
This will affect at least the following packages which have
generic adapters for Tpetra::MultiVector:
- Amesos2 (MultiVecAdapter)
- Anasazi (MultiVecTraits)
- Belos (MultiVecTraits)
- Xpetra (Xpetra::TpetraMultiVector)
- Accepted non-backwards compatible change to KokkosClassic, in which
that subpackage changed its namespace from Kokkos to KokkosClassic.
###############################################################################
# #
# Trilinos Release 11.4 Release Notes #
# #
###############################################################################
Overview:
The Trilinos Project is an effort to develop algorithms and enabling
technologies within an object-oriented software framework for the solution of
large-scale, complex multi-physics engineering and scientific problems.
Packages:
The Trilinos 11.4 general release contains 54 packages: Amesos, Amesos2,
Anasazi, AztecOO, Belos, CTrilinos, Didasko, Epetra, EpetraExt, FEI,
ForTrilinos, Galeri, GlobiPack, Ifpack, Ifpack2, Intrepid, Isorropia, Kokkos,
Komplex, LOCA, Mesquite, ML, Moertel, MOOCHO, NOX, Optika, OptiPack, Pamgen,
Phalanx, Piro, Pliris, PyTrilinos, RTOp, Rythmos, Sacado, SEACAS, Shards,
ShyLU, STK, Stokhos, Stratimikos, Sundance, Teko, Teuchos, ThreadPool, Thyra,
Tpetra, TriKota, TrilinosCouplings, Trios, Triutils, Xpetra, Zoltan, Zoltan2.
Framework Release Notes:
- The following packages have been switched to BSD-compatible licenses:
Didasko, Ifpack, Ifpack2, Moertel, Stokhos, Stratimikos
ForTrilinos
- This release includes 11 modules or classes of the Epetra package.
- This package is still in its experimental stage and is only supported on AIX.
- Sample configure scripts are provided in
Trilinos/sampleScripts/aix-fortrilinos-serial and
Trilinos/sampleScripts/aix-fortrilinos-mpif90 for serial and mpi builds
respectively.
- Because of the object-oriented features used, it requires an XL Fortran
compiler v13.1. The source code can be compiled using the xlf compiler
option.
- Required compiler flags for Fortran include:
-qfixed=72 -qxlines: deals with older Fortran source code in other
Trilinos packages. These flags are used for mpi
builds and must be specified in the configure
script.
-qxlf2003=polymorphic: allows for the use of polymorphism in the source
code.
-qxlf2003=autorealloc: allows the compiler to automatically reallocate the
left hand side with the shape of the right hand side
when using allocatable variables in an assignment.
-qfree=f90: informs the compiler that the source code is free
form and conforms to Fortran 90.
These flags (-qfree=f90 -qxlf2003=polymorphic -qxlf2003=autorealloc) are
hardcoded in Trilinos/packages/ForTrilinos/CMakeLists.txt
- Required compiler flag for xlc++:
-qrtti=all: this flag should be included in the configure
script.
- The project is primarily user-driven; so new interfaces are developed at the
request of Trilinos users.
Ifpack2
- Relaxation: Use precomputed offsets to extract diagonal
As of this release, Tpetra::CrsMatrix has the ability to precompute
offsets of diagonal entries, and use them to accelerate extracting a
copy of the diagonal. Relaxation now exploits this feature to speed up
compute() (which extracts a copy of the diagonal of the input matrix).
The optimization only occurs if the input matrix is a CrsMatrix (not
just a RowMatrix) and if it has a const ("static") graph. The latter
is necessary so that we know that the structure can't change between
calls to compute(). (Otherwise we would have to recompute the offsets
each time, which would be no more efficient than what it was doing
before.)
Kokkos
- Non-backwards compatible change: Default Kokkos/Tpetra Node type is now
Kokkos::SerialNode
User expectation seems to be that the default behavior of Tpetra
is MPI-only. These users are therefore experiencing unexpected
performance when the default node is threaded, as is currently the
case if any of the threading libraries (pthreads, TBB, OpenMP) are
enabled. Therefore, after some discussion among Kokkos/Tpetra
developers, it was decided to change the default Kokkos node (and
therefore, the default node used by Tpetra objects) to
Kokkos::SerialNode. This can be overridden at configure time by
specifying the following option to CMake when configuring
Trilinos:
-D KokkosClassic_DefaultNode:STRING="node_type"
where node_type is one of the official Kokkos nodes:
Kokkos::SerialNode (current default)
Kokkos::TBBNode
Kokkos::TPINode
Kokkos::OpenMPNode
Mesquite
- Added polygon support to allow reading and writing of vtk files containing
polygons and smoothing of meshes containing polygons using the Laplacian
smoother.
- Rewrote the ShapeImprover wrapper to determine whether the mesh to be
optimized is tangled. If the mesh is tangled, the wrapper now uses a
non-barrier metric; otherwise it uses a barrier metric.
- Created a new directory structure underneath meshFiles/3D/vtk and
meshFiles/2D/vtk that arranges the mesh files into subdirectories
based on element type and whether they are tangled or untangled.
- Created new class MeshDomainAssoc to formally associate a Mesh instance
with a Domain instance to verify that the mesh and domain are compatible.
- Productionized the NonGradient solver.
- Added new classes TMetricBarrier and TMetricNonBarrier to TMetric class to
provide a clear division between the barrier and non-barrier target metric
classes.
- Added new classes AWMetricBarrier and AWMetricNonBarrier to AWMetric class
for same reason as the TMetric classes.
- Added a new error code "BARRIER_VIOLATED" to the MsgError class that is
issued when a barrier violation is encountered when using a barrier target
metric class.
- Added warning when MaxTemplate is used with any solver other than
NonGradient.
- Made a number of changes to the Quality Summary output to improve
readability and provide additional information.
PyTrilinos
- Updated the NumPy interface to properly deal with deprecated
code. If PyTrilinos is compiled against an older NumPy, it still works,
but if compiled against newer versions of NumPy, the deprecated
code is avoided, as are the warnings.
Teuchos
- Added optional automatic global reductions of pass/fail to Teuchos Unit
Test Harness: Prior to this feature addition, only the result on the root
process of a parallel unit test would determine pass/fail, even if tests on
other processes failed. This feature makes it easier to write parallel unit
tests and results in more robust test code. For a discussion, see Trilinos
issue #5909. An example can be found in
teuchos/comm/test/UnitTesting/UnitTestHarness_Parallel_UnitTests.cpp (see
the CMakeLists.txt file for how that test is run). NOTE: By default, no
global reductions of pass/fail are done, so as to maintain perfect backward
compatibility.
- Added new feature to TimeMonitor: You may now enable or disable a timer
(instance of Time) by name. Disabled timers ignore start() and stop()
calls; calling these methods on a disabled timer does not change its elapsed
time or call count. Thus, TimeMonitor's constructor and destructor have no
effect on disabled timers. However, the disabled timers still exist, and
TimeMonitor's summarize() and report() class methods will print statistics
for disabled timers (using their elapsed times and call counts while
enabled). Enabling a timer does not reset its elapsed time or call count.
This feature is useful if you want to time only certain invocations of a
particular function that has an internal timer, without modifying the
function's source code. For an example, see
packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp, line 175
("TimeMonitor, enableTimer" unit test).
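
The enable/disable semantics described above can be modeled with a small
stand-alone class. ToyTimer is a hypothetical stand-in and is not
Teuchos::Time; in particular, counting a call in stop() is an assumption of
this sketch.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical timer that can be enabled or disabled.
class ToyTimer {
public:
  // Enabling does not reset the elapsed time or the call count.
  void enable() { enabled_ = true; }
  void disable() {
    assert(!running_);  // this toy only allows disabling a stopped timer
    enabled_ = false;
  }
  void start() {
    if (!enabled_) return;  // disabled timers ignore start()
    running_ = true;
    begin_ = Clock::now();
  }
  void stop() {
    if (!enabled_ || !running_) return;  // disabled timers ignore stop()
    elapsed_ += std::chrono::duration<double>(Clock::now() - begin_).count();
    ++numCalls_;
    running_ = false;
  }
  double elapsedSeconds() const { return elapsed_; }
  int numCalls() const { return numCalls_; }

private:
  using Clock = std::chrono::steady_clock;
  Clock::time_point begin_;
  double elapsed_ = 0.0;
  int numCalls_ = 0;
  bool enabled_ = true;
  bool running_ = false;
};
```

Wrapping only the invocations of interest between enable() and disable()
then restricts the accumulated time and call count to those invocations.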
Thyra
- Fixed explicit template instantiation system in the generation of
Thyra_XXX.hpp files to *not* include Thyra_XXX_def.hpp when explicit
instantiation is turned on. The refactoring of Thyra to use subpackages some
time ago broke the generation of Thyra_XXX.hpp files in that they were
always including Thyra_XXX_def.hpp files. That was bad because it increased
compile time for client code and allowed other includes to get pulled in
silently. Now client code that includes Thyra_XXX.hpp when explicit
instantiation is turned on will *not* get the include of
Thyra_XXX_def.hpp. This might break some downstream client code that was
not properly including the necessary header files and was accidentally
getting them from the Thyra_XXX_def.hpp files that were being silently
included. However, this technically does not break backward compatibility
since client code should have been including the right headers all along.
For example, when GCC cleaned up their standard C++ header files this
required existing C++ code to add a bunch of missing includes that should
have been there the whole time.
Tpetra
- Performance improvements to fillComplete (CrsGraph and CrsMatrix)
- Performance improvements to Map's global-to-local index conversions
- MPI performance optimizations
Methods that perform communication between (MPI) processes do less
communication than before. This should improve performance,
especially for large process counts, of the following operations:
- Creating a Map
- Creating an Import or Export communication plan
- Executing an Import or Export (e.g., in a distributed sparse
matrix-vector multiply, or in global finite element assembly)
- Calling fillComplete() on a CrsGraph or CrsMatrix
- Restrict a Map's communicator to processes with nonzero elements,
and apply the result to a distributed object
Map now has two new methods. The first, removeEmptyProcesses(),
returns a new Map with a new communicator, which contains only those
processes which have a nonzero number of entries in the original Map.
The second method, replaceCommWithSubset(), returns a new Map whose
communicator is an arbitrary subset of processes of the original Map's
communicator. Distributed objects (subclasses of DistObject) also
have a new removeEmptyProcessesInPlace() method, for applying in place
the new Map created by calling removeEmptyProcesses() on the original
Map over which the object was distributed.
These methods are especially useful for algebraic multigrid. At
coarser levels of the multigrid hierarchy, it is helpful for
performance to "rebalance" the matrices at those levels, so that a
subset of processes share the elements. This leaves the remaining
processes without any elements. Excluding them from the communicator
reduces the cost of all-reduces and other communication operations
necessary for creating the coarser levels of the hierarchy.
- CrsMatrix: Native SOR and Gauss-Seidel kernels
These kernels improve the performance of Ifpack2 and MueLu.
Gauss-Seidel is a special case of SOR (Successive Over-Relaxation).
See the documentation of Ifpack2::Relaxation for details on the
algorithm, which is actually a "hybrid" of Jacobi between MPI
processes, and SOR (or Gauss-Seidel) within an MPI process. The
kernels also include the "symmetric" variant (forward and backward
sweeps) of SOR and Gauss-Seidel.
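
The relationship between SOR and Gauss-Seidel can be seen in a minimal
single-process sketch of one forward sweep. It uses a small dense matrix
for brevity and is not the Tpetra kernel; the name sorSweep and the Matrix
alias are hypothetical. Setting the relaxation parameter omega to 1.0
reduces the update to Gauss-Seidel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// One forward SOR sweep for A x = b, updating x in place.
// With omega == 1.0 this is a Gauss-Seidel sweep.
void sorSweep(const Matrix& A, const std::vector<double>& b,
              std::vector<double>& x, double omega) {
  const std::size_t n = A.size();
  for (std::size_t i = 0; i < n; ++i) {
    assert(A[i][i] != 0.0);  // the update divides by the diagonal entry
    double sigma = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
      if (j != i) sigma += A[i][j] * x[j];  // uses already-updated x[j], j < i
    }
    const double gaussSeidelValue = (b[i] - sigma) / A[i][i];
    x[i] = (1.0 - omega) * x[i] + omega * gaussSeidelValue;
  }
}
```

A backward sweep would visit the rows in reverse order; a forward sweep
followed by a backward sweep gives the symmetric variant.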
- CrsMatrix: Precompute and reuse offsets of diagonal entries
The (existing) one-argument version of CrsMatrix's getLocalDiagCopy()
method requires the following operations per row:
1. Convert current local row index to global, using the row Map
2. Convert global index to local column index, using the column Map
3. Search the row for that local column index
Precomputing the offsets of diagonal entries and reusing them skips
all these steps. CrsMatrix has a new method getLocalDiagOffsets() to
precompute the offsets, and a two-argument version of
getLocalDiagCopy() that uses the precomputed offsets. The precomputed
offsets are not meant to be used in any way other than to be given to
the two-argument version of getLocalDiagCopy(). They must be
recomputed whenever the structure of the sparse matrix changes (by
calling insertGlobalValues() or insertLocalValues()) or is optimized
(e.g., by calling fillComplete() for the first time).
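
The idea of precomputing and reusing diagonal offsets can be sketched on a
plain CSR structure. This is not the Tpetra implementation: the CsrMatrix
struct and the functions computeDiagOffsets and extractDiag are
hypothetical, and the sketch assumes a single process where local row and
column indices coincide, so the Map conversions in steps 1 and 2 are
omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR storage; a hypothetical stand-in for a sparse matrix.
struct CsrMatrix {
  std::vector<std::size_t> rowPtr;  // size numRows() + 1
  std::vector<std::size_t> colInd;  // column index of each stored entry
  std::vector<double> values;       // value of each stored entry
  std::size_t numRows() const { return rowPtr.size() - 1; }
};

// Search each row once for its diagonal entry and record its offset from
// the start of the row. Assumes every row stores a diagonal entry.
std::vector<std::size_t> computeDiagOffsets(const CsrMatrix& A) {
  std::vector<std::size_t> offsets(A.numRows());
  for (std::size_t i = 0; i < A.numRows(); ++i) {
    bool found = false;
    for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
      if (A.colInd[k] == i) {
        offsets[i] = k - A.rowPtr[i];
        found = true;
        break;
      }
    }
    assert(found);
  }
  return offsets;
}

// Extract the diagonal with one lookup per row, with no searching.
// The offsets are valid only while the sparsity structure is unchanged.
std::vector<double> extractDiag(const CsrMatrix& A,
                                const std::vector<std::size_t>& offsets) {
  std::vector<double> diag(A.numRows());
  for (std::size_t i = 0; i < A.numRows(); ++i) {
    diag[i] = A.values[A.rowPtr[i] + offsets[i]];
  }
  return diag;
}
```

Changing only the values of the matrix leaves the offsets valid, so
repeated diagonal extractions pay the search cost once.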
- CrsGraph,CrsMatrix: Added "No Nonlocal Changes" parameter to
fillComplete()
The fillComplete() method accepts an optional ParameterList which
controls the behavior of fillComplete(), as opposed to behavior of the
object in general. "No Nonlocal Changes" is a bool parameter which is
false by default. Its value must be the same on all processes in the
graph or matrix's communicator. If the parameter is true, the caller
asserts that no entries were inserted in nonowned rows. This lets
fillComplete() skip the global communication that checks whether any
processes inserted any entries in nonowned rows.
- Default Kokkos/Tpetra Node type is now Kokkos::SerialNode
NOTE: This change breaks backwards compatibility.
Users expect that Tpetra by default uses "MPI only" for parallelism,
rather than "MPI plus threads." These users were therefore
experiencing unexpected performance issues when the default Kokkos
Node type was threaded, as was the case if Trilinos' support for any of
the threading libraries (Pthreads, TBB, OpenMP) was enabled. Trilinos
detects and enables support for Pthreads automatically on many
platforms. Therefore, after some discussion among Kokkos and Tpetra
developers, we decided to change the default Kokkos Node type (and
therefore, the default Node used by Tpetra objects) to
Kokkos::SerialNode. This can be overridden at configure time by
specifying the following option to CMake when configuring Trilinos:
-D KokkosClassic_DefaultNode:STRING="<node-type>"
where <node-type> is any of the official Kokkos Node types, such as the
following:
- Kokkos::SerialNode (current default)
- Kokkos::TBBNode
- Kokkos::TPINode
- Kokkos::OpenMPNode
###############################################################################
# #
# Trilinos Release 11.2 Release Notes #
# #
###############################################################################
Overview:
The Trilinos Project is an effort to develop algorithms and enabling
technologies within an object-oriented software framework for the solution of
large-scale, complex multi-physics engineering and scientific problems.
Packages:
The Trilinos 11.2 general release contains 54 packages: Amesos, Amesos2,
Anasazi, AztecOO, Belos, CTrilinos, Didasko, Epetra, EpetraExt, FEI,
ForTrilinos, Galeri, GlobiPack, Ifpack, Ifpack2, Intrepid, Isorropia, Kokkos,
Komplex, LOCA, Mesquite, ML, Moertel, MOOCHO, NOX, Optika, OptiPack, Pamgen,
Phalanx, Piro, Pliris, PyTrilinos, RTOp, Rythmos, Sacado, SEACAS, Shards,
ShyLU, STK, Stokhos, Stratimikos, Sundance, Teko, Teuchos, ThreadPool, Thyra,
Tpetra, TriKota, TrilinosCouplings, Trios, Triutils, Xpetra, Zoltan, Zoltan2.
AztecOO
- Added support for 64-bit global indices. They can be used if 64-bit based
Epetra maps are used.
Epetra
- IndexBase argument for 64-bit maps is now a "long long". No change to
32-bit maps, where it remains an "int".
NOX
- Added example of user defined preconditioner with a JFNK forward operator to
the thyra support.
- Removed all usage of EpetraExt::ModelEvaluator in favor of a direct
inheritance from the Thyra::ModelEvaluator. The EpetraExt::ModelEvaluator is
being deprecated.
- Added support for the Thyra Group to accept user defined preconditioners and
Jacobian operators.
- Merged the object code for the library noxthyra into the main nox library to
work around a circular dependency for the pseudo-transient solver.
- Added a pseudo-transient solver based on Thyra objects. Still under
development.
PyTrilinos
- General
- Added STK as an optional dependency of PyTrilinos
- Added Pliris as a supported package
- Provide better compatibility with external MPI implementations.
Specifically, if the user were to "import mpi4py" (for example)
prior to importing Teuchos or Epetra, then the Teuchos or Epetra
modules will not take responsibility for calling MPI_Finalize().
- Fixed some build errors
- Epetra module
- Preliminary support for Epetra64. Ultimately, I would like the
default behavior to be using 64-bit methods without referring to
64-bit method names.
- Added PyTrilinos.Epetra.FECrsMatrix InsertGlobalValues method that
had been hidden by a %extend SWIG directive.
- EpetraExt module
- Gave names to EpetraExt template classes. Using the nameless
versions had caused problems with newer versions of SWIG. This
should get rid of the need for a patch distributed with
Archlinux.
- Added the EpetraExt::CrsMatrix_SubCopy class to
PyTrilinos.EpetraExt.
- NOX module
- Improved NOX support, especially NOX.Epetra. This should be
largely invisible to the user, but I used to have to always import
NOX whether the user wanted it or not, due to nested namespace
issues. These issues have been resolved now, and you only import
NOX if you specifically request it.
- Anasazi module
- Added EpetraMultiVecAccessor base class. Anasazi added this base
class, and now the PyTrilinos version supports it as well.
TriUtils
- Added support for 64-bit global indices. They can be used if 64-bit based
Epetra maps are used or equivalent TriUtils functions with suffix "64" are
called.
###############################################################################
# #
# Trilinos Release 11.0 Release Notes #
# #
###############################################################################
Overview:
The Trilinos Project is an effort to develop algorithms and enabling
technologies within an object-oriented software framework for the solution of
large-scale, complex multi-physics engineering and scientific problems.
Packages:
The Trilinos 11.0 general release contains 54 packages: Amesos, Amesos2,
Anasazi, AztecOO, Belos, CTrilinos, Didasko, Epetra, EpetraExt, FEI,
ForTrilinos, Galeri, GlobiPack, Ifpack, Ifpack2, Intrepid, Isorropia, Kokkos,
Komplex, LOCA, Mesquite, ML, Moertel, MOOCHO, NOX, Optika, OptiPack, Pamgen,
Phalanx, Piro, Pliris, PyTrilinos, RTOp, Rythmos, Sacado, SEACAS, Shards,
ShyLU*, STK, Stokhos, Stratimikos, Sundance, Teko, Teuchos, ThreadPool, Thyra,
Tpetra, TriKota, TrilinosCouplings, Trios, Triutils, Xpetra*, Zoltan, Zoltan2*.
(* denotes package is being released externally as a part of Trilinos for the
first time.)
Framework Release Notes:
Transitioning due to removal of deprecated code:
---------------------------------------------------------------
With the update from Trilinos 10.12 to 11.0, several deprecated classes,
functions, macros, and files have been removed from the Trilinos 11.0 sources.
If a client code was using these deprecated features and upgrades to 11.0, the
client code will no longer build. To ease the transition of client code to
Trilinos 11.0, the following procedure is recommended:
1) Get the release tarball for Trilinos 10.12.
2) Do a build from scratch of the client application code against Trilinos
10.12 (making sure that deprecated warnings are enabled). Save the full build
output to a file to be searched for deprecated warnings.
3) Search the build output of the client code for "deprecated" warnings
which give file names and line numbers for deprecated features. For each
deprecated warning or feature:
3.a) Look at the Trilinos source code referenced in the "deprecated"
warning and see why the feature (or file) was deprecated and what instructions
there are for using an alternative implementation. In many cases, the
deprecated function or macro will be calling a non-deprecated variation.
3.b) Change the client source code in small, reasonably sized chunks (looping
back to step #3.a multiple times) to remove deprecated usage and re-build and
re-test incrementally to ensure the client code continues to build and run
correctly.
4) Once all deprecated code has been addressed through various iterations in
step #3, go back to step #2 to make sure that no deprecated warnings are
found.
5) Rebuild the client code against Trilinos 11.0.
For most deprecated features, the above algorithm will cleanly and safely
facilitate the upgrade of client code to Trilinos 11.0. If, however, the
client code fails to build against Trilinos 11.0 in step #5 after removing all
deprecated warnings against 10.12, then a more difficult and risky upgrade
process may be necessary. First, consult the release notes and tests and
examples for the Trilinos package causing the failures. If that is not
helpful, email [email protected] for advice.
To see a discussion of the why and how of the management of deprecated code in
Trilinos, see Section 6.5 "Regulated Backward Compatibility: Details" in
the TriBITS Lifecycle Model technical report:
http://www.ornl.gov/~8vt/TribitsLifecycleModel_v1.0.pdf
Sorry for any inconvenience this transition to Trilinos 11.0 may cause due to
the removal of deprecated features and code.
===============================================================================
Package Release Notes:
-------------------------------------------------------------------------------
Amesos2
- Added support for Pardiso-MKL (multithreaded solver)
- Several bug fixes
Epetra
Added support for 64-bit global indices
Epetra supports 64-bit global indices beginning with Trilinos Release 11.0
by using the "long long" datatype. Epetra still supports 32-bit global
indices and the interface for using them remains the same.
- To construct Epetra objects for 64-bit indices, certain input arguments
must be "long long" instead of "int". For example, compare
32: Epetra_BlockMap(int NumGlobalElements, int NumMyElements,
const int *MyGlobalElements, ...)
64: Epetra_BlockMap(long long NumGlobalElements, int NumMyElements,
const long long *MyGlobalElements, ...)
- New member functions that return a long long value have a suffix "64". For
example, GID64, NumGlobalNonzeros64, MaxAllGID64, etc. These functions work
whether the underlying object is 32-bit or 64-bit based. The older
non-suffixed functions work for 32-bit objects only.
- New classes added for "long long" data: Epetra_LongLongVector,
Epetra_LongLongSerialDenseVector, Epetra_LongLongSerialDenseMatrix.
- To build Epetra and dependent packages without any 64-bit support, turn on
the CMake flag Trilinos_NO_64BIT_GLOBAL_INDICES. Default is off.
- To enforce that a code is truly compatible with 64-bit Epetra,
turn on the CMake flag Trilinos_NO_32BIT_GLOBAL_INDICES, and fix any
compile-time or run-time errors. Default is off.
Epetra 64-bit support FAQ (compile problem)
Epetra-dependent code may not compile if it relied on automatic type
conversion to "int" from non-int types when constructing Epetra objects or
calling certain member functions. This is because now there can be ambiguity
due to overloading. Use explicit conversion to either "int" or "long long".
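The ambiguity and its fix can be illustrated with a small stand-alone sketch.
MockMap below is a hypothetical stand-in with one "int" and one "long long"
constructor, mirroring the overload pattern described above; it is not an
Epetra class. Passing a std::size_t directly would be ambiguous, because the
conversions to "int" and to "long long" rank equally, so the helper functions
convert explicitly.

```cpp
#include <cstddef>

// Hypothetical stand-in for a class with 32-bit and 64-bit overloads.
struct MockMap {
  bool is64;  // records which overload was selected
  MockMap(int /*numGlobalElements*/, int /*numMyElements*/) : is64(false) {}
  MockMap(long long /*numGlobalElements*/, int /*numMyElements*/)
      : is64(true) {}
};

// Ambiguous, does not compile:
//   std::size_t n = 100;
//   MockMap m(n, 10);  // size_t -> int and size_t -> long long tie

// Explicit conversion selects the 32-bit overload.
inline MockMap makeMap32(std::size_t numGlobal, int numMy) {
  return MockMap(static_cast<int>(numGlobal), numMy);
}

// Explicit conversion selects the 64-bit overload.
inline MockMap makeMap64(std::size_t numGlobal, int numMy) {
  return MockMap(static_cast<long long>(numGlobal), numMy);
}
```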
Kokkos
- Non-backwards compatible change: row pointers for CRS objects are no longer
size_t; instead, they are the same Ordinal type as the column indices
- Non-backwards compatible change: construction of Kokkos local graph objects
requires specifying number of columns
- KokkosArray
- Initial release of experimental package for manycore performance-portable
kernels using multidimensional array API to transparently swap between
"array of structures" and "structure of arrays" as per the manycore device
needs.
- Proxy-application examples include hybrid parallel (MPI + KokkosArray)
nonlinear thermal conduction finite elements and explicit dynamics finite
elements. These have been tested with pthreads and Cuda on the Cray XK6.
RTOp
- Dropped deprecated code in Trilinos 10.12 (see general release notes on
dropping deprecated code).
ShyLU
- Initial public release for ShyLU. ShyLU is a hybrid direct-iterative
preconditioner (solver) for general sparse linear systems, based on Schur
complement approximation. It uses a hybrid MPI+threads parallel execution
model.
- Should be used as Ifpack preconditioner (for now).
- ShyLU should be considered *EXPERIMENTAL* code.
Teuchos
- Dropped deprecated code in Trilinos 10.12 (see general release notes on
dropping deprecated code).
- Teuchos reference BLAS implementations have been corrected to mimic the
behavior of their machine-specific counterparts. See bugs 4262 and 5683.
This includes fixing the interface to _GER so that the complex instantiation
of that routine uses _GERU. Also, _ASUM and _IAMAX were corrected to
perform the correct calculations for complex-valued data types.
- Fixed SerialDenseSolver class to correctly handle complex-valued data
types. See bug 5308.
- CommandLineProcessor now properly throws exceptions. See bugs 4668 and
5387. By default this tool throws exceptions and must recognize all the
options it encounters on the command line. This is enforced through the
implementation now. If exception throwing is disabled, then proper error
codes will be returned to the user.
- Filtering timer labels for global statistics and output
The computeGlobalTimerStatistics(), report(), and summarize() class
methods of TimeMonitor now support "filtering" timer labels. See Bug
5301:
https://software.sandia.gov/bugzilla/show_bug.cgi?id=5301
These methods take an optional "filter" string. If nonempty, the
methods only print timers whose labels begin with that string.
This feature could be used to implement "namespaces" for timers.
Trilinos packages may take advantage of this feature by prefixing the
timer name with the package name. For example: "Teuchos: Timer 1".
Users may exploit this feature to reduce the volume of output. The
implementation does not compute global statistics for timers that are
filtered out, so filtering could also reduce computation and
communication.
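The filtering rule described above (keep only labels that begin with the
filter string, and keep everything if the filter is empty) can be sketched in
plain C++ as follows. This is not the TimeMonitor implementation;
labelMatches() and filterLabels() are hypothetical names used only for
illustration.

```cpp
#include <string>
#include <vector>

// True if the filter is empty or the label begins with the filter.
bool labelMatches(const std::string& label, const std::string& filter) {
  return filter.empty() || label.compare(0, filter.size(), filter) == 0;
}

// Return the labels that pass the filter, preserving their order.
std::vector<std::string> filterLabels(const std::vector<std::string>& labels,
                                      const std::string& filter) {
  std::vector<std::string> kept;
  for (const std::string& label : labels) {
    if (labelMatches(label, filter)) {
      kept.push_back(label);
    }
  }
  return kept;
}
```

With the labels "Teuchos: Timer 1", "Belos: Solve", and "Teuchos: Timer 2",
the filter "Teuchos: " keeps the first and third labels, which is the
"namespace" use suggested above.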
- YAML output option for timing results
The report() class method of TimeMonitor now has a YAML output option.
See Bug 5302:
https://software.sandia.gov/bugzilla/show_bug.cgi?id=5302