Problems with sending SQL Strings to Databases (and partial solutions)

Max Heiber
5 min readAug 13, 2016

--

From the Oracle JDBC docs

Suppose you want to ask a relational database a question, then do something with the result. Most relational DBs make this complicated because the only way to talk to them from code is by sending strings of SQL. You code can’t “see into” or manipulate the SQL as code, instead you’re stuck with string concatenation, like in example above.

Problems with Sending SQL Strings Include:

  1. No syntax checking—If you accidentally write “SELECT NAME FRO” instead of “SELECT NAME FROM,” you won’t find out the mistake until runtime.
  2. No type safety—The database has a rich schema, but most ways of talking to relational databases totally ignore the schema. If the DBA changes an integer field into a string and doesn’t tell you, the program will blow up at runtime.
  3. Lack of composability: string concatenation does not compose well, but function calls and data compose much better.

In the rest of this post, I will discuss some solutions to the problems above. The conclusion is that (1) and (2) are solvable without any changes to how databases work, but that (3) won’t have a good solution without big changes to the APIs for talking to relational databases.

Solutions to the Problems with Sending SQL as a String

How to Get Syntax Checking

The most straightforward solution for syntax checking is also the least common: to use a SQL parser to parse SQL! That’s the approach taken by rust-postgres-macros:

Using rust-postgres-macros to check the syntax of a query
rust-postgres-macros catching a syntax error

rust-postgres-macros seems like it would work well for compile-time checks, but something similar could work even for a dynamically-typed language like JavaScript, as long as calls to the parser are cached.

The most common solution is to use a query builder, but I’ll address later in this article why query builders are problematic.

How to Get Type Safety

If you’re using a statically-typed language, it seems wasteful and wrong to just throw away the wealth of type information that resides in a relational database.

LINQ to SQL and Slick provide type safety by enabling you to store a representation of the database schema, then using that schema to type-check queries at compile time. Of course, this type-checking is only as good as the representation of the schema. If code and DB fall out of sync, all bets are off at compile time.

The author of the Diesel library found an innovative solution to keeping the code and the database schema in sync: Diesel connects to the database at compile time so it can do type-checking against a live schema. This sounded like crazy tight-coupling to me at first, but then I realized that code and DB are already tightly-coupled: Diesel is just a lot more honest about it, and has an advanced, DRY way of exploiting the coupling.

How to Get Composability

Composability is important for reusing code and for changing queries based on user input. For example, if an API enables a user to filter by name and specify the number of results, the programmer must find a way to make the WHERE and LIMIT clauses dynamic.

Query builders are the most promising tool we have for making SQL queries composable. I’ll go into the good and the bad.

Query Builders: the Good

Query builders are libraries that enable programmers to compose SQL by chaining method calls rather than concatenating strings. Here’s an example of adding a LIMIT clause and a WHERE clause to a query using a query builder:

a Knex.js query

Building up a query without a query builder is much more awkward, since it’s just string concatenation, with question marks used as placeholders:

Querying using JDBC (not a query builder)

Query Builders: The Bad

My main problem with query builders is that they violate Max’s Tech Advantage Principle:

A good tool makes hard problems easier and does not make easy problems harder.

For example, the following query is easy to write in SQL, but difficult to write using query builders:

Not even sure if this is valid SQL, but it’s easy to check (see my next point).

Query builders also get in the way of exploratory coding: to verify that a query produces the right sort of result, it’s helpful to just start a database connection on the command line or in a workbench and start typing. That’s easy with SQL but harder with query builders.

Finally, query builders are usually written so that they work with multiple database systems. That gets in the way of giving programmers access to the best features, like PostgreSQL’s programmer-friendly datatypes, such as arrays, JSON, and geolocation stuff. And I don’t actually think cross-database portability is very important after hearing Sandi Metz’s point that frameworks come and go much more frequently than databases do.

Conclusion

  • Sending SQL strings to databases is awkward and unsafe.
  • There are really good solutions out there for getting syntax-checking and type-checking. But they are not mainstream and there’s no single great tool.
  • There’s no tool (that I know of) that enables programmers to write composable SQL queries AND doesn’t introduce new problems. Query builders are promising, but don’t scale to large queries.
  • There’s hope for something better(see below)

Ultimately, the best solution to these problems would be for relational databases to provide better ways to interact with them than sending SQL strings.

There are several good alternatives to sending SQL strings:

  • Interacting with the database by sending queries as structured data, such as hash maps and arrays. This is unusual for relational databases but common for NoSQL databases like MongoDB, CouchDB and ElasticSearch. The only relational database I know of which lets you send queries as structured data is Datomic.
  • Interacting with the database through an API with functions and methods. RethinkDB’s ReQL is the only example of this that I know of.

Here’s hoping that mainstream relational databases will evolve beyond accepting raw SQL strings as input.

Update

On reflection, I realize I’m a bit confused about this topic.

I wrote about problems with sending a language (SQL) over the wire as a string. However, everything we send over the wire will be serialized somehow. The real problem is when quoted code is the best level of abstraction for interacting with databases.

So my reasons for trepidation about query builders only apply to query builders that are inexpressive or are leaky abstractions.

--

--

No responses yet