Test Driving GPT-5.2

Published: January 13, 2026

Shivani Tripathi

OpenAI released GPT-5.2 in December 2025, promising improvements in reasoning, instruction following, and complex query handling over previous versions such as GPT-5.1 and GPT-4.1. We test-drove GPT-5.2 on Tursio’s structured data search and summarize our findings in this post.

Given the growing adoption of LLMs for SQL generation and data migration workflows, we evaluated how these claims translate into real-world performance. Specifically, we benchmarked GPT-4.1, GPT-5.1, and GPT-5.2 on their ability to predict key SQL operators using the Tursio Migration Suite.

The Tursio Migration Suite focuses on structured SQL reasoning and operator inference. We evaluated model performance on three critical intermediate tasks in SQL generation: Filters, Group BYs, and Order & Limit. For each task, we measured Exact Match Accuracy, i.e., whether the predicted operator(s) exactly matched the expected ground truth.
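
To make the metric concrete, here is a minimal sketch of how Exact Match Accuracy could be computed. It is not the Tursio Migration Suite's implementation: the function names are hypothetical, and the choices to ignore operator order and to normalize whitespace and case are assumptions made for illustration.

```python
from typing import Iterable, Sequence


def normalize(operator: str) -> str:
    """Collapse whitespace and lowercase (an assumed normalization)."""
    return " ".join(operator.split()).lower()


def exact_match(predicted: Iterable[str], expected: Iterable[str]) -> bool:
    """True only if the two operator sets are identical after normalization."""
    return {normalize(p) for p in predicted} == {normalize(e) for e in expected}


def exact_match_accuracy(
    predictions: Sequence[Iterable[str]],
    ground_truth: Sequence[Iterable[str]],
) -> float:
    """Fraction of questions whose predicted operators exactly match the truth."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, g) for p, g in zip(predictions, ground_truth))
    return hits / len(predictions)


# Two hypothetical questions: the first prediction matches, the second is partial.
predictions = [["housing > 100", "AVG(housing) > 500"], ["housing > 10"]]
ground_truth = [
    ["AVG(housing) > 500", "housing  >  100"],
    ["housing > 100", "AVG(housing) > 500"],
]
score = exact_match_accuracy(predictions, ground_truth)
```

Under this definition a partially correct prediction earns no credit, which is why exact-match scores can sit well below the accuracy of the queries that are finally executed.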

Note:

Tursio's overall accuracy is well into the 90s, because Tursio overcomes exact-match errors by resolving similar matches with well-crafted query operator parsers. The exact-match figures below therefore understate end-to-end accuracy.

See our previous post, "Navigating LLMs: Is newer the better for data tasks?", where we compared GPT-5.1 and GPT-4.1. This post extends the analysis to GPT-5.2.

SQL Operator Prediction Accuracy

Task	GPT-4.1	GPT-5.1	GPT-5.2
Filters	45%	46%	48%
Group BYs	60%	58%	60%
Order / Limit	70%	67%	66%

Key Observations

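GPT-5.2 shows a marginal improvement in filter prediction (48%, versus 45% for GPT-4.1 and 46% for GPT-5.1).
Group BY accuracy is level with GPT-4.1 (60%) and two points above GPT-5.1, while Order / Limit accuracy falls to 66%, from 70% for GPT-4.1 and 67% for GPT-5.1.
Older models remain competitive despite the advances claimed for GPT-5.2.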
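
The accuracy table can be reduced to percentage-point differences with a few lines of Python. This is only a convenience sketch: the numbers are copied from the table, and the `delta_vs` helper is a hypothetical name introduced here.

```python
# Exact Match Accuracy (%) copied from the table above.
accuracy = {
    "Filters": {"GPT-4.1": 45, "GPT-5.1": 46, "GPT-5.2": 48},
    "Group BYs": {"GPT-4.1": 60, "GPT-5.1": 58, "GPT-5.2": 60},
    "Order / Limit": {"GPT-4.1": 70, "GPT-5.1": 67, "GPT-5.2": 66},
}


def delta_vs(baseline: str, candidate: str = "GPT-5.2") -> dict:
    """Percentage-point change of `candidate` relative to `baseline`, per task."""
    return {
        task: scores[candidate] - scores[baseline]
        for task, scores in accuracy.items()
    }


vs_gpt41 = delta_vs("GPT-4.1")  # {'Filters': 3, 'Group BYs': 0, 'Order / Limit': -4}
vs_gpt51 = delta_vs("GPT-5.1")  # {'Filters': 2, 'Group BYs': 2, 'Order / Limit': -1}
```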

Detailed Analysis

Filters: Incremental Gains with Complex Conditions

GPT-5.2 outperformed GPT-4.1 on 4 of the 140 questions, especially those involving HAVING clauses and multi-condition filters.

Question:
List default category having average housing more than 500 for housing greater than 100.

GPT-5.2 (Correct):
housing > 100 AND AVG(housing) > 500

GPT-4.1 (Partial):
housing > 10
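
The difference between the two predictions is easier to see when they are run against data. The sketch below uses Python's built-in `sqlite3` module with a made-up table: the table name `loans`, the column name `default_category`, and the sample rows are all assumptions for illustration, not the benchmark's schema. The row-level condition goes into WHERE and the aggregate condition into HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data, not the benchmark's actual table.
conn.execute("CREATE TABLE loans (default_category TEXT, housing REAL)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?)",
    [
        ("yes", 50), ("yes", 400), ("yes", 800),  # rows above 100 average 600
        ("no", 150), ("no", 300), ("no", 20),     # rows above 100 average 225
    ],
)

# Both filters from the GPT-5.2 prediction: row-level WHERE plus aggregate HAVING.
full_query = """
    SELECT default_category
    FROM loans
    WHERE housing > 100
    GROUP BY default_category
    HAVING AVG(housing) > 500
"""

# Only the partial row-level filter from the GPT-4.1 prediction.
partial_query = """
    SELECT DISTINCT default_category
    FROM loans
    WHERE housing > 10
"""

full_result = sorted(conn.execute(full_query).fetchall())
partial_result = sorted(conn.execute(partial_query).fetchall())
conn.close()
```

On this toy data the full filter keeps only the `yes` category, while the partial filter returns both categories, so a missing aggregate condition changes the answer rather than just the form of the query.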

Group BYs: Over-Generation of Attributes

GPT-5.2 frequently introduces unnecessary grouping attributes, leading to incorrect aggregation.

Question:
List top 5 payment details where payment amount has decreased in 2024.

GPT-5.2 (Incorrect):
GROUP BY TID, MID, PAYMENT_STORE_CODE, Payee_Name, City, R4G_State, QR_Sector, Category

GPT-4.1 (Correct):
GROUP BY TID, PAYMENT_DATE
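
A quick way to quantify this kind of mismatch is to diff the predicted grouping columns against the expected ones. The helpers below are a hypothetical sketch (plain comma splitting, case-insensitive comparison), not the operator parsers Tursio uses.

```python
def parse_group_by(clause: str) -> list[str]:
    """Split a 'GROUP BY a, b, c' clause into column names (naive comma split)."""
    body = clause.strip()
    if body.lower().startswith("group by"):
        body = body[len("group by"):]
    return [col.strip() for col in body.split(",") if col.strip()]


def diff_group_by(predicted: str, expected: str) -> dict[str, list[str]]:
    """Report columns the prediction added or omitted, compared case-insensitively."""
    pred_cols = parse_group_by(predicted)
    exp_cols = parse_group_by(expected)
    pred_keys = {c.lower() for c in pred_cols}
    exp_keys = {c.lower() for c in exp_cols}
    return {
        "extra": [c for c in pred_cols if c.lower() not in exp_keys],
        "missing": [c for c in exp_cols if c.lower() not in pred_keys],
    }


gpt_52 = ("GROUP BY TID, MID, PAYMENT_STORE_CODE, Payee_Name, City, "
          "R4G_State, QR_Sector, Category")
gpt_41 = "GROUP BY TID, PAYMENT_DATE"
report = diff_group_by(gpt_52, gpt_41)
```

For the example above, the diff lists seven extra columns in the GPT-5.2 prediction and one missing column, `PAYMENT_DATE`.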

Order & Limit: Unnecessary Sorting

GPT-5.2 often introduces ORDER BY clauses even when query semantics do not require ranking.

Question:
List all borrowers of different ages whose car was repossessed.

GPT-5.2 (Incorrect):
ORDER BY Age ASC
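
An unnecessary ORDER BY fails an operator-level exact match, but it does not change which rows come back. The sketch below shows one way an execution-level comparison could ignore row order when the expected query has no ordering. This is an assumption for illustration, not a description of how Tursio scores execution accuracy, and the sample rows are made up.

```python
from collections import Counter
from typing import Sequence, Tuple

Row = Tuple[object, ...]


def results_match(
    predicted: Sequence[Row], expected: Sequence[Row], order_matters: bool
) -> bool:
    """Compare result sets; ignore row order unless the expected query ranks rows."""
    if order_matters:
        return list(predicted) == list(expected)
    # Counter keeps duplicate rows, so this is a multiset comparison.
    return Counter(predicted) == Counter(expected)


# Hypothetical (age, borrower_id) rows for the repossession question.
expected_rows = [(42, "B1"), (29, "B2"), (35, "B3")]
predicted_rows = sorted(expected_rows)  # effect of an extra ORDER BY Age ASC

same_rows = results_match(predicted_rows, expected_rows, order_matters=False)
same_order = results_match(predicted_rows, expected_rows, order_matters=True)
```

Here `same_rows` is true and `same_order` is false: the extra sort only matters when the question actually asks for a ranking.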

Impact on Final SQL Execution Accuracy

Since filters, group bys, and order bys are intermediate steps in SQL generation, we also analyzed how operator-level errors propagate to the final SQL query execution accuracy.

Error Type	GPT-5.2	GPT-4.1
Incorrect Filters	25%	25%
Incorrect Group BYs	37%	34%
Incorrect ORDER BYs	24%	19%

Compared with GPT-4.1, incorrectly predicted Group BYs and ORDER BYs have a slightly higher impact on GPT-5.2's final SQL execution accuracy (37% versus 34%, and 24% versus 19%).

Even though GPT-5.2 performed marginally better in filter prediction, the impact of incorrect filters on final SQL accuracy is the same for both models (25%).

This analysis indicates that although the models struggle with these specific operator predictions, the overall SQL generation process has some resilience to these errors, particularly for filters and ORDER BYs.

Conclusion

While GPT-5.2 improves handling of complex filter logic, it does not consistently outperform GPT-4.1 across all SQL operator prediction tasks in the Tursio Migration Suite.