Skip to content

[SQL] Support for TIMESTAMP WITH TIME ZONE#6116

Open
mihaibudiu wants to merge 2 commits into
feldera:mainfrom
mihaibudiu:issue4146
Open

[SQL] Support for TIMESTAMP WITH TIME ZONE#6116
mihaibudiu wants to merge 2 commits into
feldera:mainfrom
mihaibudiu:issue4146

Conversation

@mihaibudiu
Copy link
Copy Markdown
Contributor

@mihaibudiu mihaibudiu commented Apr 24, 2026

Fixes #4146

This PR finally adds support for the last important missing SQL type TIMESTAMP WITH TIME ZONE
There are multiple ways to implement this; I have chosen a version which provides similar capabilities with most databases: timestamps with time zones are stored at runtime as simply UTC timestamps. Thus they are comparable to each other. No time zone information is preserved at runtime, though.

I will mark this PR as a draft for two reasons:

The Rust serialization code will need to be audited for compatibility with other standard formats (e.g., Debezium).

Copy link
Copy Markdown

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several blockers; please mark this as draft as the description suggests, or address before merge.

Comment thread crates/sqllib/src/timestamp.rs Outdated
Comment thread crates/sqllib/src/timestamp.rs Outdated
Comment thread crates/sqllib/src/casts.rs
Comment thread crates/sqllib/src/casts.rs Outdated
Comment thread crates/sqllib/src/timestamp.rs Outdated
Comment thread docs.feldera.com/docs/sql/datetime.md Outdated
Comment thread docs.feldera.com/docs/sql/types.md Outdated
JsonFlavor::DebeziumMySql => Self {
time_format: TimeFormat::Micros,
date_format: DateFormat::DaysSinceEpoch,
// Why is this missing fractions of second?
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stray // Why is this missing fractions of second? comment with a question to nobody. Either answer it (Debezium MySQL really does emit second-precision timestamps in this layout) or open an issue and reference it. Don't ship a TODO disguised as a question.

@mihaibudiu mihaibudiu force-pushed the issue4146 branch 2 times, most recently from 48d602b to 4e74e03 Compare April 25, 2026 05:46
@mihaibudiu mihaibudiu marked this pull request as draft April 28, 2026 05:28
@mihaibudiu mihaibudiu marked this pull request as draft April 28, 2026 05:28
@mihaibudiu mihaibudiu force-pushed the issue4146 branch 2 times, most recently from add429a to 7edb1df Compare April 29, 2026 04:24
Copy link
Copy Markdown

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review on tip 7edb1df (force-pushed since the prior review on 8b53059). Several of my prior blockers were silently fixed without a reply (good); a few are not yet addressed; and the new force-push introduced regressions in cast_to_TimestampTz_s and a Python test file that won't even parse. PR is now draft, which is the right call.

Prior blockers — status

  • timestamp.rs SerializeWithContext using timestamp_format instead of timestamp_tz_format — fixed.
  • serialize_table_record!(TestStruct[3] {...}) field count — fixed (now [4]).
  • DBSPTypeTimestampTz getMaxValue 8 nines — fixed (6 nines).
  • CastTests Americas/New_York — fixed.
  • TypeCompiler TZ branch warning text + precision constant — fixed (note: timestampPrecisionsWarned is still shared between the two branches; minor).
  • TimestampTz // Is this right for negative timestamps? TODO + Rustdoc copy-paste — gone (the methods were removed; if/when they come back, please don't reintroduce the bug).
  • TZ section docs typos / heading — fixed.
  • Non-actionable cast error — only partially: type name corrected to TIMESTAMP WITH TIME ZONE, but still no format hint as PR_REVIEW_RULES.md asks (see inline).
  • Magic-number Objects.hash(..., 14) — number changed to 16, still a bare magic constant (see inline).

Still open / not addressed

  • Test coverage. Same two integration tests as before (issue4146, issue4146a). Your own description still says "I need to implement many more tests - at least one for each function". This is the headline blocker for the PR.
  • Plain-TIMESTAMP doc typos data time / asssumed (datetime.md L196-197) — flagged previously to fix in the same PR; still present.
  • Stray rhetorical // Why is this missing fractions of second? comment in serde_config.rs:204.

New issues introduced by this force-push

  • cast_to_TimestampTz_s now uses only DateTime::parse_from_rfc3339. That's a regression: the previous version at least accepted YYYY-MM-DD HH:MM:SS[.f] and YYYY-MM-DD. RFC3339 also doesn't accept IANA zone names, even though both the docs and DBSPTimestampTzLiteral advertise IANA support. Net effect: literals accept '2020-01-01 10:10:10 America/New_York' but CAST('2020-01-01 10:10:10 America/New_York' AS TIMESTAMP WITH TIME ZONE) will now fail at runtime.
  • python/tests/runtime_aggtest/aggregate_tests2/test_timestamp_tbl.py — the new TZ test data is broken in three ways (syntax error + two malformed strings). See inline; this file won't even import.
  • DBSPTimestampTzLiteral.getMicros has a negative-modulo footgun (see inline) — pre-1970 TZ literals will throw from LocalDateTime.ofEpochSecond because nanoOfSecond must be non-negative.

Keeping verdict at REQUEST_CHANGES until the test coverage gap and the regressed cast/python paths are addressed.

Comment thread crates/sqllib/src/casts.rs Outdated

#[doc(hidden)]
pub fn cast_to_TimestampTz_s(value: SqlString) -> SqlResult<TimestampTz> {
if let Ok(v) = DateTime::parse_from_rfc3339(value.str()) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two regressions vs. the previous implementation, both visible to users:

  1. No IANA zone names. DateTime::parse_from_rfc3339 only accepts numeric offsets. But DBSPTimestampTzLiteral and docs/sql/datetime.md both promise IANA names (America/New_York, etc.). So TIMESTAMP WITH TIME ZONE '2020-01-01 10:10:10 America/New_York' works as a literal but CAST('2020-01-01 10:10:10 America/New_York' AS TIMESTAMP WITH TIME ZONE) fails. Please use chrono_tz (or equivalent) to handle IANA names, then fall back to parse_from_rfc3339 for numeric offsets.
  2. Date-only and space-separated forms dropped. The previous version of this function accepted YYYY-MM-DD HH:MM:SS[.f] (no offset, treated as UTC) and YYYY-MM-DD. RFC3339 requires the T separator and an offset, so both of those casts now fail. At minimum, please keep the %Y-%m-%d %H:%M:%S%.f and %Y-%m-%d fallbacks (UTC-assumed, matching the literal grammar's "assumed to be in UTC").

A test that exercises each accepted format end-to-end (numeric offset, IANA name, date-only, no-offset) belongs with this fix.

Comment thread crates/sqllib/src/casts.rs Outdated
Comment thread python/tests/runtime_aggtest/aggregate_tests2/test_timestamp_tbl.py Outdated
Comment thread python/tests/runtime_aggtest/aggregate_tests2/test_timestamp_tbl.py Outdated
Comment thread python/tests/runtime_aggtest/aggregate_tests2/test_timestamp_tbl.py Outdated
JsonFlavor::DebeziumMySql => Self {
time_format: TimeFormat::Micros,
date_format: DateFormat::DaysSinceEpoch,
// Why is this missing fractions of second?
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still here from prior review. Either answer the question (does Debezium MySQL really emit second-precision timestamps in this layout? — if so, say so) or open an issue and reference it. Don't ship a TODO disguised as a question.

@mihaibudiu mihaibudiu marked this pull request as ready for review May 13, 2026 22:16
@mihaibudiu
Copy link
Copy Markdown
Contributor Author

Most of the Python tests are validated because they have the same expected shape and results as tests for TIMESTAMP.

@mihaibudiu mihaibudiu requested a review from snkas May 13, 2026 22:43
@mihaibudiu
Copy link
Copy Markdown
Contributor Author

@snkas this is a big PR, but I think most scrutiny is needed in the Rust code dealing with connectors and pipeline manager, which is probably < 100 lines of code. Most of the code is in tests; see the above comment why the Python tests are probably correct.

@mihaibudiu mihaibudiu force-pushed the issue4146 branch 6 times, most recently from dbdd9a6 to 8091d80 Compare May 14, 2026 22:30
Copy link
Copy Markdown
Contributor

@snkas snkas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From pipeline-manager perspective, it's a new type added to the OpenAPI schema, that part looks good.

For the connectors: the input and output format of this new data type is still MicrosSinceEpoch, so an integer. When would one use TIMESTAMP instead of TIMESTAMP WITH TIME ZONE, and vice versa? It seems they are effectively the same in terms of the queries you can do on them (e.g., it's no possible to query "give me all the tuples with the timestamp in time zone +01:00" or so)?

@mihaibudiu
Copy link
Copy Markdown
Contributor Author

Our implementation with TIMESTAMP WITH TIME ZONE aligns with most major databases. For one, some datasets already include timestamps with time zones, so this will enable us to load them. Then there are functions to convert timezones. CONVERT_TIMEZONE is an example we already have, but there is a function (not yet implemented) AT TIME ZONE, which takes a timestamp at a given time zone and converts it to UDT. Indeed, we do not carry the actual time zone; time zones are relatively complex objects, which do not make sense without a TIMESTAMP, and whose definition evolves every year, they are not simply numeric offsets.

@mihaibudiu
Copy link
Copy Markdown
Contributor Author

@snkas any actionable feedback? Can we merge this and refine?
There's documentation about the semantics of the new type in the PR.

Copy link
Copy Markdown

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big blockers from my prior two reviews are addressed: IANA zones now parse, the error message is actionable with the expected format, getMaxValue() is back to 6 fractional digits, DBSPTimestampTzLiteral.getMicros uses Math.floorDiv/Math.floorMod (no more negative-modulo footgun), the broken test_timestamp_tbl.py row is now valid Python with a clean +05:00 offset, and test coverage has expanded substantially (issue4146a with ten in-line cases, testCaseTz, testCompareTz, plus the new test_interval_*, test_cmp_operators.py, test_date_time_fn.py and TZ rows in test_timestamp_tbl.py). Approving. Two tiny leftovers inline.


@Override
public int hashCode() {
return Objects.hash(this.mayBeNull, 16);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still a bare magic number — 14 -> 16 doesn't really address the original concern; the literal carries no meaning at the call site. A one-line // Discriminator for hash-disambiguation across DBSPType subclasses (or whatever the convention here is) would make this self-explanatory. Not a blocker.

JsonFlavor::DebeziumMySql => Self {
time_format: TimeFormat::Micros,
date_format: DateFormat::DaysSinceEpoch,
// Why is this missing fractions of second?
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rhetorical TODO comment is still here from my first review. Either answer it (does Debezium MySQL truly emit second-precision timestamps in this flavor? — confirm and replace with a single-sentence explanation citing the upstream behaviour) or delete it. Floating questions to nobody are noise. Not a blocker.

Copy link
Copy Markdown
Contributor

@snkas snkas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so this PR has to benefits:

(1) It allows us to ingest data from sources that provide a column that has timestamp+timezone as its type. Would this not already be possible by adding an extra mapping to Timestamp? Given we don't store the timezone in any case.
(2) It opens up the possibility for SQL timezone-related functions. But if we don't store the timezone in this new type, wouldn't CHAR just still be the output type for both AT TIMESTAMP and CONVERT_TIMESTAMP? Documentation: https://docs.feldera.com/sql/datetime/#convert_timezone

As the PR description is not "Add type TIMESTAMP WITH TIMEZONE as an alias for TIMESTAMP", which would be my expectation so far based on our back-and-forth. How does the type differ?

// hardwire that for now. We can make it configurable in the future.
// FIXME: Also, the timezone should be `None`, but that gets compiled into `timezone_ntz`
// in the JSON schema, which Databricks doesn't fully support yet.
SqlType::Timestamp => DataType::Timestamp(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is this special case for Timestamp, wouldn't this also apply to TimestampTz? @ryzhyk added the comment here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope that this will work - the special case was None not being supported

@mihaibudiu
Copy link
Copy Markdown
Contributor Author

In other databases an important difference between TIMESTAMP and TIMESTAMP WITH TIME ZONE is that the latter is usually adjusted for I/O based on the local timezone of the database, whereas the former is absolute. Now, Feldera databases always run in the UTC time zone, which has the nice benefit of making things deterministic. So these two do look very similar.

Think of the former as a NaiveDateTime and the latter as DateTime<Utc>. Both these types exist in Rust, and although they both have the same representation in Rust, they typecheck differently, and they do have different methods. The set of functions in Rust overlaps to a significant degree, but we will add more functions which are specific to only one of the two types.

@mihaibudiu mihaibudiu force-pushed the issue4146 branch 2 times, most recently from 18a7640 to ae3223e Compare May 19, 2026 03:58
@mihaibudiu mihaibudiu enabled auto-merge May 19, 2026 03:59
@mihaibudiu mihaibudiu disabled auto-merge May 19, 2026 03:59
@swanandx swanandx added this pull request to the merge queue May 19, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch May 19, 2026
@mihaibudiu mihaibudiu enabled auto-merge May 19, 2026 15:59
@mihaibudiu mihaibudiu added this pull request to the merge queue May 19, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 19, 2026
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
@mihaibudiu mihaibudiu enabled auto-merge May 19, 2026 20:17
@mihaibudiu mihaibudiu added this pull request to the merge queue May 19, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 19, 2026
@mihaibudiu
Copy link
Copy Markdown
Contributor Author

Merging this is blocked on #6133 - without this it requires a platform upgrade

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for 'timestamp with time zone'

3 participants