- Add Audio as a schema domain.
- Relax dependency on Protobuf to include version 5.x
- N/A
- For nested features with N nested levels (N > 1), the statistics counting the number of values in `CommonStatistics` and `WeightedCommonStatistics` will rely on the innermost level.
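
To illustrate what "innermost level" means for a nested value, here is a minimal sketch in plain Python. It is not the library's implementation: the helper name `innermost_lengths` and the sample value are hypothetical, and the sketch only shows how counts taken at the outermost level differ from counts taken at the innermost level.

```python
def innermost_lengths(value):
    """Return the lengths of the innermost lists of a nested list.

    Hypothetical helper for illustration only; it assumes every level
    is a Python list and that leaves are non-list scalars.
    """
    if all(not isinstance(item, list) for item in value):
        # No nested lists below this one, so this is an innermost list.
        return [len(value)]
    lengths = []
    for item in value:
        lengths.extend(innermost_lengths(item))
    return lengths


# One example of a feature with N = 2 nested levels.
example = [[1, 2], [3], [4, 5, 6]]

outer_count = len(example)                   # counted at the outermost level
inner_counts = innermost_lengths(example)    # counted at the innermost level
total_inner = sum(inner_counts)
```

Under this reading, value-count statistics for the example would be derived from `inner_counts` rather than from `outer_count`.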
- N/A
- N/A
- N/A
- Bump the Ubuntu version on which TFMD is tested to 20.04 (previously was 16.04).
- Bumped the minimum bazel version required to build `tfmd` to 6.1.0.
- Depends on `protobuf>=4.25.2,<5` for Python 3.11 and on `protobuf>3.20.3,<4.21` for 3.9 and 3.10.
- Depends on `googleapis-common-protos>=1.56.4,<2` for Python 3.11 and on `googleapis-common-protos>=1.52.0,<2` for 3.9 and 3.10.
- Relax dependency on `absl-py` to include version 2.
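
Per-Python-version dependency ranges like the ones above are commonly expressed with environment markers. The following is a sketch only: the list name `INSTALL_REQUIRES` is hypothetical, and the marker boundaries (`>= "3.11"` versus `< "3.11"`) are an assumption, since the notes only name Python 3.11 and 3.9/3.10.

```python
# Hypothetical requirements list using environment markers.
# The version ranges come from the notes above; the marker boundaries
# are an assumption made for illustration.
INSTALL_REQUIRES = [
    'protobuf>=4.25.2,<5; python_version >= "3.11"',
    'protobuf>3.20.3,<4.21; python_version < "3.11"',
    'googleapis-common-protos>=1.56.4,<2; python_version >= "3.11"',
    'googleapis-common-protos>=1.52.0,<2; python_version < "3.11"',
]

# Split each entry into its requirement and its marker for inspection.
parsed = [tuple(part.strip() for part in req.split(";")) for req in INSTALL_REQUIRES]
```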
- Removed `NaturalLanguageDomain.location_constraint_regex`. It was documented as "please do not use" and never implemented.
- Change to the semantics of min/max/avg/tot num-values for nested features (see above).
- Deprecated Python 3.8 support.
- N/A
- Add `joint_group` to `SequenceMetadata` to specify which group this sequence feature belongs to so that they can be modeled jointly.
- Add `BOOL_TYPE_INVALID_CONFIG` anomaly type.
- Add `embedding_dim` to `FloatDomain` to specify the embedding dimension, which is useful for use cases such as restoring shapes for flattened sequence of embeddings.
- Add `sequence_truncation_limit` to `SequenceMetadata` to specify the maximum sequence length that should be processed.
- Depends on `protobuf>=3.20.3,<4.21`. Upper bound is required to avoid breaking changes.
- Add `embedding_type` to `FloatDomain` to specify the semantic type of the embedding. This is useful for use cases where the embedding dimension is inferred from the embedding type.
- N/A
- N/A
- N/A
- Depends on `protobuf>=3.20.3,<5`.
- N/A
- N/A
- Introduce `Schema.represent_variable_length_as_ragged` knob to automatically generate `RaggedTensor`s for variable length features.
- Introduces a Schema option `HistogramSelection` to allow numeric drift/skew calculations to use QUANTILES histograms, which are more robust to outliers.
- N/A
- N/A
- Deprecated Python 3.7 support.
- N/A
- N/A
- N/A
- N/A
- N/A
- Add a categorical indicator to the schema for `StringDomain`.
- Add ProblemStatement Task.is_auxiliary field to allow specifying auxiliary tasks in multi-task learning problems.
- Add the SequenceMetadata field to the schema to specify if this feature could be treated as a sequence feature.
- Add a `CUSTOM_VALIDATION` Type in anomalies.proto.
- Histogram Buckets include their upper bound instead of their lower bound.
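
The difference between the two bucket conventions can be sketched as follows. This is an illustration in plain Python, not the library's code: the function names are hypothetical, and clamping values that sit on the outermost boundaries into the first or last bucket is an assumption of this sketch.

```python
from bisect import bisect_left, bisect_right


def bucket_upper_inclusive(boundaries, value):
    """Index of the bucket (low, high] that contains value."""
    idx = bisect_left(boundaries, value) - 1
    # Clamp so that values on the outer edges land in a valid bucket.
    return min(max(idx, 0), len(boundaries) - 2)


def bucket_lower_inclusive(boundaries, value):
    """Index of the bucket [low, high) that contains value."""
    idx = bisect_right(boundaries, value) - 1
    return min(max(idx, 0), len(boundaries) - 2)


# Three buckets with edges at 0, 1, 2 and 3.
boundaries = [0.0, 1.0, 2.0, 3.0]

# A value that sits exactly on an interior boundary changes bucket.
new_bucket = bucket_upper_inclusive(boundaries, 1.0)
old_bucket = bucket_lower_inclusive(boundaries, 1.0)
```

Only values that fall exactly on a bucket boundary are affected; values strictly inside a bucket map to the same index under both conventions.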
- N/A
- N/A
- ThresholdConfig.threshold field is made into a oneof.
- Clarifies the meaning of num_non_missing in statistics.proto.
- N/A
- ProblemStatement Task.task_weight and MetaOptimizationTarget.weight are deprecated.
- N/A
- N/A
- N/A
- N/A
- N/A
- Adds experimental support within statistics.proto and schema.proto for marking features that are derived during statistics generation for data exploration or validation, but not actually present in input data.
- Adds an experimental DERIVED_FEATURE_BAD_LIFECYCLE and DERIVED_FEATURE_INVALID_SOURCE anomaly type.
- N/A
- N/A
- N/A
- N/A
- N/A
- N/A
- N/A
- statistics.proto: Includes a field `invalid_utf8_count` in `StringStatistics` to store the number of non-utf8 encoded strings for a feature.
- Depends on `absl-py>=0.9,<2.0.0`.
- Removes deprecated field `objective_function` from ProblemStatement.
- Deprecates `multi_objective` field in ProblemStatement.
- Deprecates several unused PerformanceMetrics.
- N/A
- A `threshold_config` is added to MetaOptimizationTarget to allow for expressing thresholded optimization goals.
- N/A
- N/A
- N/A
- Added a new field to `FloatDomain` in schema to allow expression of categorical floats.
- N/A
- Deprecated Python 3.6 support.
- To maintain version consistency among TFX Family libraries we skipped the 1.3.x release for TFMD library.
- Added `PositiveNegativeSpec` to `ProblemStatement.BinaryClassification` for specifying positive and negative class values.
- N/A
- N/A
- N/A
- N/A
- Depends on `protobuf>=3.13,<4`.
- N/A
- N/A
- Added public Python interface for proto/* in `proto/__init__.py`.
- N/A
- N/A
- N/A
- N/A
- Added new anomaly types: `MULTIPLE_REASONS` and `INVALID_DOMAIN_SPECIFICATION`.
- Added new anomaly type: `STATS_NOT_AVAILABLE`.
- N/A
- N/A
- Adding the ability to specify and detect sequence length issues.
- Depends on `absl-py>=0.9,<0.13`.
- N/A
- N/A
- Added new anomaly type `MAX_IMAGE_BYTE_SIZE_EXCEEDED` for image_domain.
- Added new anomaly type `INVALID_FEATURE_SHAPE`.
- The `RaggedTensor` TensorRepresentation now supports additional partitions.
- N/A
- N/A
- N/A
- Added new anomaly types to AnomalyInfo to report data issues with NL features.
- Added new FloatDomain field and anomaly type to designate and validate features that represent fixed dimensional embeddings.
- N/A
- N/A
- Added new fields to NaturalLanguageDomain message in the schema, including support for specifying vocabularies, constraints on sequence values (SequenceValueConstraints), constraints on vocabulary coverage (FeatureCoverageConstraints), and constraints on token location (location_constraints_regex).
- Added new NaturalLanguageStatistics message to the statistics.proto so that we can compute statistics corresponding to Natural Language features.
- N/A
- N/A
- N/A
- Added new Anomaly and Schema field to support drift and distribution skew detection for numeric features.
- Added a new field in Anomalies proto to report the raw measurements of distribution skew detection.
- From this release TFMD will also be hosting nightly packages on https://pypi-nightly.tensorflow.org. To install the nightly package use the following command: `pip install --extra-index-url https://pypi-nightly.tensorflow.org/simple tensorflow-metadata`
  Note: These nightly packages are unstable and breakages are likely to happen. The fix could often take a week or more depending on the complexity involved for the wheels to be available on the PyPI cloud service. You can always use the stable version of TFMD available on PyPI by running the command `pip install tensorflow-metadata`.
- Added new Anomaly type to describe when a domain is incompatible with the data type.
- Added new Anomaly types for invalid schema configurations (missing name, missing type, etc).
- Added new Anomaly type to describe when type does not match the data.
- Added new LifecycleStage:DISABLED.
- N/A
- N/A
- From this version we will be releasing python 3.8 wheels.
- When installing from source, you don't need any steps other than `pip install` (needs Bazel).
- Labels can be specified as Paths in addition to string names.
- Depends on `absl-py>=0.9,<0.11`.
- Depends on `googleapis-common-protos>=1.52.0,<2`.
- N/A
- Deprecated Python 3.5 support.
- Added disallow_inf to FloatDomain message in schema.proto.
- Added new Anomaly type to describe data that has unexpected Infs / -Infs.
- Added new Anomaly and Schema field for specifying ratio of supported images.
- Added value_counts field to Feature message in schema.proto, which describes the number of values for features that have more than one nestedness level.
- Added new anomaly type VALUE_NESTEDNESS_MISMATCH to describe data that has a nestedness level that does not match the schema.
- Added new Any type value to CustomStatistic.
- Add ProblemStatement and Metric Python proto stubs.
- Use absltest instead of unittest.
- N/A
- Drops Python 2 support.
- Note: We plan to remove Python 3.5 support after this release.
- Added UniqueConstraints to Feature message in schema.proto.
- Added new Anomaly types to describe data that does not conform to UniqueConstraints.
- Added PresenceAndValencyStatistics to CommonStatistics.
- Added RaggedTensor in TensorRepresentation
- Added a new type of Anomaly: DATASET_HIGH_NUM_EXAMPLES
- Added a new field to dataset_constraints: max_examples_count
- Added a multi-label TaskType.
- Removed ProblemStatementNamespace proto
- Removed ProblemStatementReference proto
- Removed field ProblemStatement.implements
- Fixed a compatibility issue with newer bazel versions.
- Started pulling TF 1.15.2 source for building.
- Added support for specifying behavior of rare / OOV multiclass labels.
- Added anomaly types related to weighted features.
- Added support for storing lift stats on weighted examples.
- The removal of `lift_series` from `CategoricalCrossStats` and the change of type of `LiftSeries.LiftValue.lift` from float to double will cause parsing failures for serialized protos written by version 0.21.0 which contained the deleted or changed fields.
- Added protos for categorical cross statistics using lift.
- Added a new type of Anomaly: FLOAT_TYPE_HAS_NAN
- Added a new field to float_domain: disallow_nans
- Added SparseTensor to TensorRepresentation.
- Added a new type of Anomaly
- Add WeightedFeature to schema.
- Add min_examples_count to DatasetConstraints and DATASET_LOW_NUM_EXAMPLES anomaly type.
- Add TimeOfDay domain and UNIX_DAY granularity for TimeDomain in schema.
- Added TensorRepresentation to schema.
No significant changes. Upgrading to keep version alignment.
- Adding CustomMetric to PerformanceMetric.
- Added an Any field to Schema Feature, for storing arbitrary structured data.
- Refactoring ProblemStatement and related protos. At present, these are not stable.
- Added ProblemStatement.
- Add support for declaring sparse features.
- Add support for schema diff regions.
- Adding functionality for handling structured data.
- StructStatistics.common_statistics changed to StructStatistics.common_stats to agree with Facets.
- The change from StructStatistics.common_statistics to StructStatistics.common_stats may break code that had this field set and was serializing to some text format. The wire format should be fine.
- Use the same version of protobuf as tensorflow.
- Added support for structural statistics.
- Added new error types.
- Removed DiffRegion.
- Added RankHistogram to CustomStatistics.
- Removed DiffRegion.
- Established tf.Metadata as a standalone package.
- Moved tf.Metadata code out of TF-Transform code tree, requiring package dependency updates and import updates.