Aggregate Settings

AtScale supports the following global-level settings for aggregates.

aggregate.batch.cube.gracePeriodOverrides.enabled

  • Default: false
  • Restart: No

Allow specifying grace period overrides when building an aggregate batch for a model. For more information, see Rebuilding Aggregates Using the REST API.

Important

Enabling this functionality can potentially cause your system to become strained by expensive aggregate rebuilds.

aggregate.batch.gracePeriodOverrides.enabled

  • Default: false
  • Restart: No

Allow specifying grace period overrides when building an aggregate batch. Requires aggregate.batch.cube.gracePeriodOverrides.enabled to be enabled on the models you want to run incremental builds for. For more information, see Rebuilding Aggregates Using the REST API.

Important

Enabling this functionality can potentially cause your system to become strained by expensive aggregate rebuilds.

aggregates.batch.max.failures

  • Default: 0
  • Restart: No

The maximum number of failures allowed in a batch build before the whole batch fails.

aggregates.batch.retry.maxAttemptsPerAggregate

  • Default: 3
  • Restart: No

The maximum number of reattempts to build a single aggregate during a single batch build. This number cannot exceed the value of AGGREGATES.BATCH.RETRY.MAXATTEMPTSPERBATCH.

aggregates.batch.retry.maxAttemptsPerBatch

  • Default: 5
  • Restart: No

The maximum number of reattempts to build aggregates during a single batch build.

The following example illustrates how this setting interacts with aggregates.batch.retry.maxAttemptsPerAggregate:

| Step in the build process | Restarts for Aggregate A | Restarts for Aggregate B | Restarts for Aggregate C | Total restarts for the batch |
|---|---|---|---|---|
| A batch build of the aggregates in a model starts. | 0 | 0 | 0 | 0 |
| The build for Agg A fails and restarts. | 1 | 0 | 0 | 1 |
| The build for Agg A again fails and restarts. | 2 | 0 | 0 | 2 |
| The build for Agg B fails and restarts. | 2 | 1 | 0 | 3 |
| The build for Agg C fails and restarts. | 2 | 1 | 1 | 4 |
| The build for Agg C again fails and restarts. | 2 | 1 | 2 | 5 |
| The build for Agg C fails again. The batch build fails as a whole because the max number of retries has been reached. | 2 | 1 | 2 | 5 |

Moreover, if the value of AGGREGATES.BATCH.RETRY.MAXATTEMPTSPERAGGREGATE is reached before the value of AGGREGATES.BATCH.RETRY.MAXATTEMPTSPERBATCH during a batch aggregate build, the build fails and ends.

Whenever a new build starts, the counters for both settings are reset to 0. To use the example in the table above, when a new batch build starts, the counter for the number of restarts for each aggregate table is set to 0. The counter for the total number of restarts for the batch is also reset to 0.
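The interaction between the two retry limits can be sketched in a few lines of Python. This is an illustrative model only, not AtScale's implementation: the function name `simulate_batch` and its parameters are hypothetical, and it assumes that a failure ends the batch as soon as either limit has no restarts left.

```python
from collections import Counter

def simulate_batch(failures, max_per_aggregate=3, max_per_batch=5):
    """Replay a sequence of aggregate build failures (hypothetical model).

    failures: aggregate names, one entry per failed build, in order.
    Returns (restarts per aggregate, total restarts, batch_failed).
    """
    per_aggregate = Counter()  # both counters start at 0 for every new batch build
    total = 0
    for name in failures:
        # Assumption: a failure can be retried only while both limits have room.
        if per_aggregate[name] >= max_per_aggregate or total >= max_per_batch:
            return dict(per_aggregate), total, True
        per_aggregate[name] += 1
        total += 1
    return dict(per_aggregate), total, False

# The failure sequence from the example: A, A, B, C, C, then C once more.
counts, total, failed = simulate_batch(["A", "A", "B", "C", "C", "C"])
print(counts, total, failed)
```

With the default limits, the sixth failure finds the batch counter already at 5, so the batch fails. Calling `simulate_batch` again starts from fresh counters, mirroring the reset described above.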

aggregates.create.buildFromExisting

  • Default: true
  • Restart: No

Enables or disables all features relating to building aggregates from other aggregates, as opposed to base tables.

The AtScale engine continuously assesses the quality of the aggregate-table definitions that it has generated. If it determines that a new definition is needed, by default the first instance of that definition is built from a query against raw data, even if that definition is based on a current aggregate-table definition.

Use this setting to allow the first instance of a new definition to be built from the data that is already in an instance of another definition. Allowing the first instance to be built in this way speeds up the build process.

For example, suppose that the engine decides to supersede the aggregate-table definition AggDef1 by creating the new definition AggDef2, which is based on AggDef1. If this setting is set to True, the build of the first instance of AggDef2 will include data from the current instance of AggDef1. If the instance requires data that is not in the current instance of AggDef1, the engine queries raw data to gather it.

Non-incremental aggregate tables can be built only from non-incremental aggregate tables, while incremental aggregate tables can be built only from incremental aggregate tables.

aggregates.create.compression.threshold

  • Default: 3.0
  • Restart: No

Specify the compression factor that aggregates proposed by the engine must meet or exceed. This factor is a measure of the quality of a proposed aggregate. It is calculated as the number of rows in the fact table divided by the estimated number of rows in a proposed aggregate.
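As a worked illustration of the calculation, the following Python sketch computes the compression factor and compares it with the threshold. The function names are hypothetical and the row counts are made-up inputs; the engine's own row estimates are not modeled here.

```python
def compression_factor(fact_rows, estimated_aggregate_rows):
    """Rows in the fact table divided by estimated rows in the proposed aggregate."""
    return fact_rows / estimated_aggregate_rows

def meets_threshold(fact_rows, estimated_aggregate_rows, threshold=3.0):
    """True when the proposed aggregate meets or exceeds the threshold."""
    return compression_factor(fact_rows, estimated_aggregate_rows) >= threshold

# A 1,000,000-row fact table and a proposed aggregate of about 250,000 rows.
print(compression_factor(1_000_000, 250_000))  # 4.0
print(meets_threshold(1_000_000, 250_000))     # True
print(meets_threshold(1_000_000, 500_000))     # False, the factor is only 2.0
```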

aggregates.create.firstTime.buildFromExisting

  • Default: true
  • Restart: No

Set to True to allow only the first build of a new aggregate table to be created from data that is in an existing aggregate table. This option does not affect rebuilds, which are still created from unaggregated data.

aggregates.create.higherOrder.enabled

  • Default: true
  • Restart: No

If set to True, allows the aggregate system to build aggregates at a higher level when the compression score at the lowest level is not met.

aggregates.create.includeHigherLevels.enabled

  • Default: true
  • Restart: No

Set to True to enable the addition of higher levels to system-defined aggregates (without causing additional joins). For example, if an aggregate has the Day level, AtScale automatically adds Month and Year.

aggregates.create.iris.nojourn.enabled

  • Default: false
  • Restart: No

Applies only to InterSystems Iris data warehouses, version 2022.2 or newer. Specifies whether to use the %NOJOURN parameter when creating aggregate tables.

aggregate.create.joins.allowPreventIncremental.enabled

  • Default: true
  • Restart: No

Whether to consider joining to a dataset that is not safe for incremental updates when that join would prevent the aggregate from being an incremental aggregate.

aggregates.create.joins.compression

  • Default: 100.0
  • Restart: No

Specify the minimum compression ratio for any proposed join. This ratio is calculated as the cardinality of the join key in the fact table (or in the dimension table if that is not available) to the cardinality of the grouped dimension values (i.e. #(Key Cardinality) / #(Dim Table grouped by Dim Value)). Joins for which the compression ratio is below this minimum will not be used.

aggregates.create.joins.enabled

  • Default: true
  • Restart: No

Set to True to allow the AtScale engine to use joins when defining aggregates. This setting must be set to True for the other AGGREGATES.CREATE.JOINS.* settings to have an effect.

aggregates.create.joins.maximumKeyCardinality

  • Default: 10000000
  • Restart: No

Specify the maximum cardinality that the AtScale engine will allow in join keys when the engine is determining whether to use a join in the definition of an aggregate. Higher cardinalities will cause the engine not to use a join.

aggregates.create.joins.maximumDepth

  • Default: 3
  • Restart: No

Specify the maximum number of dimensions that can be traversed in a join path.
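The join settings above can be read together as a set of checks on a proposed join. The Python sketch below is one hypothetical way to combine them; the class `JoinSettings`, the function `join_allowed`, and the order of the checks are assumptions for illustration, not the engine's actual logic.

```python
from dataclasses import dataclass

@dataclass
class JoinSettings:
    enabled: bool = True                   # aggregates.create.joins.enabled
    min_compression: float = 100.0         # aggregates.create.joins.compression
    max_key_cardinality: int = 10_000_000  # aggregates.create.joins.maximumKeyCardinality
    max_depth: int = 3                     # aggregates.create.joins.maximumDepth

def join_allowed(key_cardinality, grouped_value_count, depth, settings=None):
    """Return True if a proposed join passes every check (hypothetical model)."""
    settings = settings or JoinSettings()
    if not settings.enabled:
        return False  # the other join settings have no effect when disabled
    if key_cardinality > settings.max_key_cardinality:
        return False  # join key cardinality is too high
    if depth > settings.max_depth:
        return False  # the join path traverses too many dimensions
    # Compression ratio: key cardinality / number of grouped dimension values.
    return key_cardinality / grouped_value_count >= settings.min_compression

print(join_allowed(1_000_000, 5_000, depth=2))   # True, ratio is 200
print(join_allowed(1_000_000, 50_000, depth=2))  # False, ratio is 20
```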

aggregates.create.joins.prime.compression

  • Default: 0.99
  • Restart: No

Specify the minimum compression ratio for a proposed join from a prime query part (where aggregates cannot be stored anywhere except in preferred storage).

aggregates.create.narrowing.enabled

  • Default: true
  • Restart: No

Set to True to allow the engine to define new aggregates as narrower versions of existing aggregates when the compression factor is met or exceeded by the new aggregates. Narrower aggregates contain fewer dimensions than their predecessors.

You can set the compression factor with the setting AGGREGATES.CREATE.COMPRESSION.THRESHOLD.

aggregates.create.nonIncremental.fallback.enabled

  • Default: false
  • Restart: No

Set to True to allow the engine to build an aggregate from another aggregate, even if it could be built incrementally from the fact table, when the tables' row count ratio exceeds AGGREGATES.CREATE.NONINCREMENTAL.FALLBACK.RATIO.

aggregates.create.nonIncremental.fallback.ratio

  • Default: 3.0
  • Restart: No

The ratio of fact table row count / aggregate row count that determines whether AtScale builds an aggregate from an aggregate, or incrementally from the fact table. If the ratio is greater than this value, AtScale builds the aggregate from an aggregate.
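The fallback decision can be sketched as a single comparison. In this hypothetical Python example, `build_source` and its return values are illustrative names, and `aggregate_rows` is assumed to be the row count of the aggregate that would serve as the build source.

```python
def build_source(fact_rows, aggregate_rows, fallback_enabled=False, ratio=3.0):
    """Pick the build source for an aggregate (hypothetical model).

    fallback_enabled: aggregates.create.nonIncremental.fallback.enabled
    ratio:            aggregates.create.nonIncremental.fallback.ratio
    """
    # The fallback applies only when the ratio is strictly greater than the setting.
    if fallback_enabled and fact_rows / aggregate_rows > ratio:
        return "aggregate"
    return "fact_table_incremental"

print(build_source(9_000_000, 1_000_000, fallback_enabled=True))   # aggregate
print(build_source(9_000_000, 1_000_000, fallback_enabled=False))  # fact_table_incremental
```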

aggregates.create.partition.hintedAggregate.enabled

  • Default: true
  • Restart: No

Whether to partition hinted aggregates from Query Data Sets using the model's partition key list. For this setting to have an effect, the setting TABLES.CREATE.PARTITIONS.ENABLED must be set to True.

aggregates.create.partition.systemDefinedAggregate.enabled

  • Default: true
  • Restart: No

Set to True to enable the AtScale engine to partition system-defined aggregates. For this setting to have an effect, the setting TABLES.CREATE.PARTITIONS.ENABLED must be set to True.

aggregates.create.partition.systemDefinedAggregate.threshold

  • Default: 50000.0
  • Restart: No

Specify the minimum number of rows per partition. The AtScale engine divides the estimated cardinality of a proposed system-defined aggregate table by the estimated number of partitions. If the estimated number of rows per partition does not meet or exceed this threshold, the engine will not partition the aggregate table. This value prevents the engine from creating too many partitions per aggregate table, as query processing times can increase if the number of partitions becomes too high. It also prevents the engine from creating too few partitions per aggregate table, as a small number of very large partitions can also cause query processing times to increase. In both cases, the advantages of partitioning are negated.
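The rows-per-partition check described above amounts to one division and one comparison. The following Python sketch uses a hypothetical function name and made-up estimates to show it:

```python
def should_partition(estimated_rows, estimated_partitions, threshold=50000.0):
    """True when the estimated rows per partition meet or exceed the threshold."""
    rows_per_partition = estimated_rows / estimated_partitions
    return rows_per_partition >= threshold

# 10,000,000 estimated rows split across 100 partitions: 100,000 rows each.
print(should_partition(10_000_000, 100))    # True
# The same table split across 1,000 partitions: only 10,000 rows each.
print(should_partition(10_000_000, 1_000))  # False
```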

aggregates.create.partition.userDefinedAggregate.enabled

  • Default: true
  • Restart: No

Set to True to enable the AtScale engine to partition user-defined aggregates. For this setting to have an effect, the setting TABLES.CREATE.PARTITIONS.ENABLED must be set to True.

aggregates.create.preferredStorage.enabled

  • Default: true
  • Restart: No

Enables the potential placement of new aggregates in preferred storage.

aggregates.create.threshold.enabled

  • Default: true
  • Restart: No

Set to True to turn on the setting AGGREGATES.CREATE.COMPRESSION.THRESHOLD.

aggregates.creation.timeout

  • Default: 4 hours
  • Restart: No

Specify the maximum length of time to allow per DDL statement that the engine uses to create an aggregate instance. Aggregates that are refreshed with full builds require one DDL statement. Aggregates that are refreshed with incremental builds require one DDL statement per partition.

aggregates.create.widening.enabled

  • Default: true
  • Restart: No

Set to True to allow the engine to define new aggregates as wider versions of existing aggregates. Wider aggregates contain more metrics than their predecessors.

aggregates.create.widening.measure.limit

  • Default: 20
  • Restart: No

Specify the maximum number of metrics that can be added when widening. This setting requires AGGREGATES.CREATE.WIDENING.ENABLED to be set to True.

aggregates.create.withoutCompressionEstimate.enabled

  • Default: false
  • Restart: No

Allow new aggregates to be created without estimated compression ratios (i.e. when statistics are not available).

aggregates.dimensional.build

  • Default: true
  • Restart: No

Set to True to allow the engine to create aggregates that contain dimensional attributes only. Such aggregates can be useful in Tableau for queries against fact tables that contain degenerate dimensions.

aggregates.dimensionalModifications.retentionLimit

  • Default: 30
  • Restart: No

The number of active instances of Dimensionally Modified Aggregates retained per model.

aggregate.incrementalUpdate.allFragmentMaterializations.duration

  • Default: 1 day
  • Restart: No

Specify the maximum length of time to allow for an incremental build of an aggregate table.

aggregate.incrementalUpdate.enabled

  • Default: true
  • Restart: No

Set to True to use incremental builds for all of the aggregates for a model when the fact dataset uses an incremental indicator. Full builds are still done for user-defined aggregates that are joins or unions of two or more tables.

aggregate.incrementalUpdates.immutable.enabled

  • Default: true
  • Restart: No

Set to True to enable incremental builds of aggregates that use joins on rarely changing dimensions.

aggregate.incrementalUpdate.maxConsecutiveStaticFragments

  • Default: 5
  • Restart: No

Specify the maximum number of fragments to allow for each incrementally built aggregate. When this threshold is exceeded, the fragments are consolidated. Lower values relative to the default result in slower consolidations and faster queries. Higher values result in faster consolidation and slower queries.

aggregates.maintenance.fusion.enabled

  • Default: false
  • Restart: No

Set to True to enable aggregate fusion.

aggregate.maintenance.job.invalid-physical-plans.checkUnionedUDA.enabled

  • Default: false
  • Restart: Yes

Whether or not to compare physical plans of User Defined Aggregates with Unions.

aggregates.orc.compress

  • Default:
  • Restart: No

Specify which compression method to use. This setting is applicable only if you set the value of AGGREGATES.TABLECONFIG.PREFERREDSTORAGEFORMAT to orc.

Supported values:

  • NONE
  • ZLIB
  • SNAPPY

aggregates.parquet.compression.enabled

  • Default: false
  • Restart: No

Specify whether compression should be used. This setting is applicable only if you set the value of AGGREGATES.TABLECONFIG.PREFERREDSTORAGEFORMAT to parquet.

aggregates.parquet.compression.type

  • Default: snappy
  • Restart: No

Specify which compression method to use. This setting is applicable only if you set the value of AGGREGATES.TABLECONFIG.PREFERREDSTORAGEFORMAT to "parquet".

Supported values:

  • SNAPPY
  • GZIP
  • UNCOMPRESSED

aggregates.query.effectiveness.look.back

  • Default: 7 days
  • Restart: No

The maximum length of time that the engine looks back when searching for query execution information to compare the effectiveness of queries with and without aggregates.

aggregates.query.effectiveness.sampling.rate

  • Default: 0.0
  • Restart: No

Query Effectiveness Sampling Rate: A value between 0 and 1. The value represents the probability threshold of performing the sampling given a uniform random distribution.

For each query that is received, AtScale computes a uniformly random value between 0 and 1. If the variable value <= Sampling Rate, then AtScale performs the Query Effectiveness computation for the inbound query.

Query Effectiveness Sampling Rate = 0 effectively disables sampling. Query Effectiveness Sampling Rate = 1 will sample every query.
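The sampling decision can be sketched in Python as follows. The function `should_sample` is a hypothetical name, and the standard library's `random.random()` stands in for whatever random source AtScale actually uses.

```python
import random

def should_sample(sampling_rate, rng=random):
    """Decide whether to run the Query Effectiveness computation for one query.

    rng is any object with a random() method returning a value in [0, 1).
    """
    return rng.random() <= sampling_rate

# Rough check: with a rate of 0.25, about a quarter of queries are sampled.
rng = random.Random(42)  # seeded only to make the illustration repeatable
sampled = sum(should_sample(0.25, rng) for _ in range(10_000))
print(sampled / 10_000)
```

Because `random.random()` never returns 1.0, a rate of 1 samples every query, while a rate of 0 samples a query only in the vanishingly rare case that the draw is exactly 0.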

aggregate.queue.priority.mode

  • Default: 1
  • Restart: Yes

Specifies the algorithm for prioritizing aggregate instance builds.

This setting accepts integer values 0-2:

  • 0: New instances are prioritized over instances that are part of a batch build.
  • 1: Batch instances are prioritized.
  • 2: Instances are built in the order in which the requests are received.

aggregate.queue.priority.userDefined.enabled

  • Default: true
  • Restart: Yes

Set to True to prioritize the building of user-defined aggregates over the building of system-defined aggregates.

aggregates.slowBuild.cutoff

  • Default: 4 seconds
  • Restart: No

The duration cutoff for a completed agg build query to emit a SlowAggEvent.

aggregate.snowflake.table.names.uppercase

  • Default: false
  • Restart: Yes

If set to True, uppercase letters are used in aggregate table names on Snowflake.

aggregate.speculative.allmember.enabled

  • Default: true
  • Restart: No

Set to True to enable the AtScale engine to create, for each fact table, a speculative aggregate that contains only the metrics in the corresponding fact table.

aggregate.speculative.dimensional.enabled

  • Default: true
  • Restart: No

Set to True to enable the AtScale engine to define dimension-only speculative aggregates, which are used to populate filters in BI client software.

aggregate.speculative.dimensional.minCompressionRatio

  • Default: 10
  • Restart: No

Specify the ratio as the number of rows in the full dimension dataset divided by the number of rows in the proposed aggregate for a level in the dimensional hierarchy.

For example, for a Date dimension, the lowest level in the hierarchy might be Day. The number of rows in an aggregate defined on Day would be the same number of rows in the dataset overall. There would be no aggregation. However, an aggregate defined on a higher level in the hierarchy, such as Quarter, would aggregate the data and therefore have a compression ratio. The level Year would aggregate further and have a higher compression ratio.
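The Date example can be made concrete with a short Python sketch. The row counts are made up (ten years of daily data), the function names are hypothetical, and the sketch assumes that a level qualifies when its ratio meets or exceeds the setting, which the description above does not state explicitly.

```python
def level_compression_ratios(dimension_rows, rows_per_level):
    """Rows in the full dimension dataset divided by rows in each level's aggregate."""
    return {level: dimension_rows / rows for level, rows in rows_per_level.items()}

def qualifying_levels(dimension_rows, rows_per_level, min_ratio=10):
    """Levels whose ratio meets or exceeds min_ratio (assumed comparison)."""
    ratios = level_compression_ratios(dimension_rows, rows_per_level)
    return [level for level, ratio in ratios.items() if ratio >= min_ratio]

# Hypothetical Date dimension: 3,650 days, 40 quarters, 10 years.
rows_per_level = {"Day": 3650, "Quarter": 40, "Year": 10}
print(level_compression_ratios(3650, rows_per_level))
# {'Day': 1.0, 'Quarter': 91.25, 'Year': 365.0}
print(qualifying_levels(3650, rows_per_level))
# ['Quarter', 'Year']
```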

aggregate.speculative.enabled

  • Default: true
  • Restart: No

Set to True to activate the other aggregate.speculative.* settings.

aggregate.speculative.superAggregate.compression

  • Default: 2.0
  • Restart: No

The compression that a super aggregate must achieve relative to the dataset it is created from. The value you specify must be a Double.

When the AtScale engine considers whether to define a super aggregate table for a fact dataset, it divides the number of rows in the dataset by the value of this setting. If the estimated number of rows in the super aggregate table is less than or equal to the resulting quotient, then the engine defines the super aggregate table.
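The check described above can be written out as a short Python sketch. The function name and the row counts are hypothetical and serve only to illustrate the arithmetic.

```python
def defines_super_aggregate(dataset_rows, estimated_super_rows, compression=2.0):
    """True when the estimated super aggregate is small enough to be defined."""
    quotient = dataset_rows / compression  # largest row count that still qualifies
    return estimated_super_rows <= quotient

# With the default of 2.0, a 1,000,000-row dataset allows up to 500,000 rows.
print(defines_super_aggregate(1_000_000, 400_000))  # True
print(defines_super_aggregate(1_000_000, 600_000))  # False
```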

aggregates.systemGenerated.activeInstance.extraAllowance

  • Default: 10
  • Restart: No

The maximum number of additional system-defined aggregates temporarily permitted per model when the retention limit is reached. Model-specific values can be set via model settings.

Note

Setting this value too high will cause long aggregate batch build times and may impact data warehouse workloads.

aggregates.systemGenerated.activeInstance.retentionLimit

  • Default: 100
  • Restart: No

The maximum number of system-defined aggregates retained per model. Model-specific values can be set via model settings.

Note

Setting this value too high will cause long aggregate batch build times and may impact data warehouse workloads.

aggregates.tableConfig.preferredStorageFormat

  • Default: none
  • Restart: No

Specify the storage format for data in aggregate tables, if you have a preference. Specify none to allow the engine to decide which format to use.

Supported values:

  • orc
  • parquet
  • rcfile
  • textfile
  • none