Left to Right Programming
graic.net435 points by graic 3 days ago
435 points by graic 3 days ago
SQL shows it's age by having exactly the same problem.
Queries should start by the `FROM` clause, that way which entities are involved can be quickly resolved and a smart editor can aid you in writing a sensible query faster.
The order should be FROM -> SELECT -> WHERE, since SELECT commonly gives names to columns, which WHERE will reference.
You could even avoid crap like `SELECT * FROM table`, and just write `FROM table` and have the select clause implied.
Never mind me, I'm just an old man with a grudge, I'll go back to my cave...
PSQL (and PRQL) use this ordering, and a similar pipe/arrow notation has recently been added to BigQuery.
Check out the DuckDB community extensions:
[0]: https://duckdb.org/community_extensions/extensions/psql.html
[1]: https://duckdb.org/community_extensions/extensions/prql.html
In duckdb you can also start with `FROM .. SELECT ..` without using the PSQL extension.
But I haven't found a good editor plugin that is actually able to use that information to do completions :/ If anyone knows I'd be happy to hear it
The duckdb local ui does that I believe:
duckdb -ui
More here: https://duckdb.org/2025/03/12/duckdb-ui.html
And in elixir land we've had similar pipes and composition through ecto, for the major rdbms
It's written that way because it stems from relational algebra, in which the projection is typically (always?) written first.
>The order should be FROM -> SELECT -> WHERE, since SELECT commonly gives names to columns, which WHERE will reference.
Per the SQL standard, you can't use column aliases in WHERE clauses, because the selection (again, relational algebra) occurs before the projection.
> You could even avoid crap like `SELECT * FROM table`, and just write `FROM table` and have the select clause implied.
Tbf, in MySQL 8 you can use `TABLE <table>`, which is an alias for `SELECT * FROM <table>`.
> It's written that way because it stems from relational algebra, in which the projection is typically (always?) written first.
It's inspired by a mish-mash of both relational algebra and relational calculus, but the reason why SELECT comes first is because authors wanted it to read like English (it was originally called Structured English Query Language).
You can write the relational algebra operators in any order you want to get the result you want.
> You can write the relational algebra operators in any order you want
Ultimately, yes, you can express relational algebra in any notation that gets the point across, but the parent is right that
π₁(R)
is what is commonly used. (R)π₁
not so much. Even Codd himself used the former notation style in his papers, even though he settled on putting the relation first in his query language.My impression was the parent poster was talking about order of operations like projection and selection, where you might more commonly write:
Π(σ(R)) instead of σ(Π(R))
and not about whether relational algebra uses prefix or postfix notation:
Π(σ(R)) vs. R > σ > Π
SQL's WHERE statement (and others) works totally differently from SELECT in that regard, so it doesn't make much sense to say that "SELECT comes first because relational algebra".
Interestingly, the inventor of relational algebra for database management put the "FROM" first in his query language: https://dl.acm.org/doi/pdf/10.1145/1734714.1734718
What section did you see this? I see GET (i.e. SELECT) first.
Section 3.3. "SELECT" only comes first if you omit "FROM". In SQL, if it allowed, that would be something like: `SELECT table.column`.
Tuple Variables and their Range?
I don’t think I see what you see. From 3.3:
RANGE PART P
GET W (P.P#,P.PNAME,P.QOH):(P.QOH<25)
Which approximately translates to:
SELECT p.pnum, p.pname, p.qoh FROM part p WHERE p.qoh < 25
But a more direct translation would look something like: FROM part p SELECT p.pnum, p.pname, p.qoh WHERE p.qoh < 25
Even closer would be: FROM part p; SELECT p.pnum, p.pname, p.qoh WHERE p.qoh < 25
Of course they are different languages so we can only take things so far, but the pertinent bit is that the range specifier is declared first in ALPHA, whereas it is declared later, after the projection is defined, in SQL.> It's written that way because it stems from relational algebra,
A common misconception (that SQL is a realization of RA instead of barely based on it).
In RA, is in fact `Relation > Operator`
Unless I am grossly misunderstanding your notation, I have always seen RA written with the operator first. Some examples:
https://cs186berkeley.net/notes/note6/
https://web.wlu.ca/science/physcomp/ikotsireas/CP465/W1-Intr...
> It's written that way because it stems from relational algebra
More likely because this order is closer to typical English sentence structure. SQL was designed to look like English, not relational algebra.
> Per the SQL standard, you can't use column aliases in WHERE clauses, because the selection (again, relational algebra) occurs before the projection.
Except this works in most major vendor SQL implementations. And they all support relation aliases in SELECT... Seems the standards have long fell behind actual implementations.
To clarify, I mean you can’t do this:
SELECT id AS foo
FROM MyTable
WHERE foo = 1;
Similarly, you can’t do this: SELECT id
FROM MyTable
WHERE id = MAX(id);
Because in both cases, when the WHERE predicate is being executed, the engine doesn’t yet know what you’re asking it to find - for the former, SELECT hasn’t yet been called, so the alias isn’t applied; for the latter, the aggregation happens after filtering, so it can’t know what that maximum value is.You can of course alias columns for SELECT however you’d like, and can also use those aliases in ORDER BY. You can also trivially refactor the above examples to use subqueries or CTEs to accomplish the goal.
You absolutely can in many engines, for instance in Snowflake... with the small exception that in all supporting engines you actually need to use HAVING instead of WHERE in your second example (because it compares an aggregation, otherwise WHERE is fine).
You can also use "correlated column aliases" (I can't recall the proper name) i.e.
SELECT
id AS foo,
foo || '_1' as foo_n,
right(foo_n, 1) as foo_ordinal
FROM MyTable
WHERE foo = 1;
Again, if this isn't all part of SQL standards, the reality is that a lot of engines have semi-standard (sometimes very proprietary too) ways of handling these now common patterns. For real-world use cases, the standards are unfortunately becoming increasingly irrelevant. I think it would be better in the long term to use standards, but if they can't keep up with actual usage then they will just get ignored.You apparently can in SQLIte though: https://sqlfiddle.com/sqlite/online-compiler and query:
SELECT LOWER(Name) as LName FROM Product where LName like '%frame%';
>Per the SQL standard, you can't use column aliases in WHERE clauses, because the selection (again, relational algebra) occurs before the projection.
Don't blame the math for sloppy implementation. Nothing in the math suggest where aliases should be specified. Don't get me wrong it is a coinvent, logical place to put them, but as you say it rather limits their use, and could have been done better.
> It's written that way because it stems from relational algebra, in which the projection is typically (always?) written first.
Okay, so what?
We're not obligated to emulate the notational norms of our source material, and it is often bad to do so when context changes.
I was explaining why it is the way that it is. If you'd like your own version of a parser, here's Postgres' [0]. Personally, I really like SQL's syntax and find that it makes sense when reading it.
[0]: https://github.com/postgres/postgres/tree/master/src/backend...
There was not argument about how much sense it makes. There was an argument for improving readability by placing the table names first.
Lots of people “like” things because they are familiar with them. And that’s a fine enough reason. But if you step out of your zone of familiarity, can you find improvements? Are you willing to forgo any prejudice you may possess to evaluate other suggestions?
Just a little willingness to see another perspective is all anyone asks.
> If you'd like your own version of a parser, here's Postgres' [0].
Funnily enough, if you pull up a version of Postgres' parser prior to the 1995 release, you'll find that it puts the relation first.
> I was explaining why it is the way that it is.
No one was confused what you were doing.
I'm saying the historical reason for why it is the way it is, doesn't matter. I would hope we design our languages to be maximally clear and useful, not to be maximally full of historical cruft.
There's no accounting for taste, so you're welcome to like whatever you like, but I don't think you liking a backward syntax is particularly persuasive. It sounds more like you're just used to it than that you see any actual benefits to it.
Yes, C#'s DSL that compiles to SQL (LINQ-to-SQL) does the same thing, `from` before the other clauses, for the same reason that it allows the IDE code completion to offer fields while typing the other clauses.
Kusto, the Azure query language for data analysis uses that form with piping as well.
https://learn.microsoft.com/en-us/kusto/query/?view=microsof...
Also the LINQ approach in .NET.
I do agree, that is about time that SQL could have a variant starting with FROM, and it shouldn't be that hard to support that, it feels like unwillingness to improve the experience.
Kusto is so much vetter than it has any right to be! Normally I'd run a mile at a cloud provider specific programming language that can't ve used elsewhere, but it really is quite nice! (there are some wierd quirks, but a tonne less than I'd have thought)
This was a historical decision because SQL is a declarative language. I was confused for too long, I want to admit, about the SQL order: FROM/JOIN → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
As a self-taught developer, I didn't know what I was missing, but now the mechanics seem clear, and if somebody really needs to handle SELECT with given names, then he should probably use CTE:
WITH src AS (SELECT * FROM sales), proj AS (SELECT customer_id, total_price AS total FROM src), filt AS (SELECT * FROM proj WHERE total > 100) SELECT * FROM filt;
> This was a historical decision because SQL is a declarative language
It would be equally declarative if FROM came first.
> This was a historical decision because SQL is a declarative language.
What do you mean? Both ALPHA (the godfather declarative database language created by Codd himself) and QUEL, both of which inspired SQL, put "FROM" first. Fun fact: SQL was originally known as SEQUEL, which was intended to be a wordplay on being a followup to QUEL.
Another commenter makes a good case that SQL ended up that way because it was trying to capture relational algebra notation (i.e. π₁(R), where π~select, ₁~column, R~table). Yet relational algebra is procedural, not declarative. Relational calculus is the declarative branch.
Although the most likely explanation remains simply that SEQUEL, in addition to being fun wordplay, also stood for Structured English Query Language. In English, we're more likely to say "select the bottle from the fridge", rather than "from the fridge, select the bottle". Neither form precludes declarative use.
It's true. I call authoring SQL "holographic programming" because any change I want to make frequently implies a change to the top and bottom of the query too; there's almost never a localized change.
C++ has this issue too due to the split between header declarations and implementations. Change a function name? You're updating it in the implementation file, and the header file, and then you can start wondering if there are callers that need to be updated also. Then you add in templates and the situation becomes even more fun (does this code live in a .cc file? An .h file? Oh, is your firm one of the ones that does .hh files and/or .hpp files also? Have fun with that).
> The order should be FROM -> SELECT -> WHERE, since SELECT commonly gives names to columns, which WHERE will reference.
Internally, most SQL engines actually process the clauses in the order FROM -> WHERE -> SELECT. This is why column aliases (defined in SELECT) work in the GROUP BY, HAVING and ORDER BY clauses, but not in the WHERE clause.The order should be starting on FROM, followed by any sequences of whatever clauses (except for FROM), always creating an intermediary result-set.
FROM table -- equivalent to today's select * from table
SELECT a, 1 as b, c, d -- equivalent to select ... from table
WHERE a in (1, 2, 3) -- the above with the where
GROUP BY c -- the above with the group by
WHERE sum(d) > 100 -- the above with having sum(d) > 100
SELECT count(a distinct) qt_a, sum(b) as count, sum(d) total_d -- the above being a sub-query this selects from
How would this approach support SQL constructs such as the HAVING[0] clause? Or is that what you meant by:
WHERE sum(d) > 100 -- the above with having sum(d) > 100
0 - https://www.w3schools.com/sql/sql_having.aspMy other big dream would be allowing multiple WHERE clauses that would be semantically ANDed together because that's what would happen if you filtered a result set twice.
Yes, by whatever clauses I also meant repeating ones.
SQL has a conceptual issue with repeating group by clauses, so maybe not that one (or maybe we should fix the conceptual issues). But any other, including the limiting and offset ones.
> always creating an intermediary result-set
You want the DB to first run SELECT * FROM <table>, and then start operating on that?
Obviously not, and bringing this up as if it's a gotcha just shows you aren't keeping up with the conversation. Try less to correct people and more to understand people.
It's about how humans think about it, not about how the computer executes it.
How else do you read what they wrote?
I read it as describing their preferred mental model for declaring a result set, which is different from describing their preferred behavior to produce it. This seems clear to me in wording and context; it’s also broadly consistent with how SQL is understood, and how it is generally implemented.
Presumably, the intermediate result sets would necessary be materialized, only logical.
DuckDB accepts just that: https://duckdb.org/docs/stable/sql/query_syntax/from.html#fr...
Off topic, but I really wish DuckDB's FTS extension could add to the index as the table is added to. It's the only thing keeping me on sqlite for a project.
I've always liked the `select...from` order because it helps me understand the goal before reading the logic. In other words, I want to end up with this, and here's how I want to go about getting it.
While I agree, some SQL editors do provide code completion for the SELECT clause if you type the FROM clause first.
That is my trick as well, but feels backwards, by now there could be a variant of SQL supporting FROM first.
You can always start with the FROM clause and then add the SELECT clause above it if it's a out auto-,completion.
I usually start with: ``` select * from <table> as <alias> limit 5 ```
It doesn't seem like it would be all that difficult to allow that form in a backwards compatible way. Why hasn't this happened?
Because SQL is a whole language family of separate languages grudgingly adopting new features introduced by the notoriously slow standartization committee.
BigQuery SQL and Spark SQL (and probably some others) have adopted pipelined syntax, DuckDB SQL simply allows you to write the query FROM-first.
Not sure. It seems a fairly basic syntax change and it could even be easily made backwards compatible.
Now we need to get the ANSI SQL committee to standardize it ANSI SQL 2027 or some such.
My main caveat here, is that often the person starting a select knows what they want to select before they know where to select it from. To that end, having autocomplete for the sources of columns is far far more useful than autocomplete for columns from a source.
I will also hazard a guess that the total number of columns most people would need autocomplete for are rather limited? Such that you can almost certainly just tab complete for all columns, if that is what you really want/need. The few of us that are working with large databases probably have a set of views that should encompass most of what we would reasonably be able to get from the database in a query.
Seems nonsensical. Column names have no meaning without the table. Table is the object, columns are the properties. What language goes Property.Object? All popular languages have this wrong?
Anytime you store a single property for all objects in an array or hash table, so, Fortran, BASIC, APL, Numpy, R, awk, and sometimes Perl. Parallel arrays are out of style but certainly not unheard of. They're the hot new thing in games programming, under the name "entity-component-system".
Even C when `property` is computed rather than materialized. And CLOS even uses that syntax for method invocation.
The only language I can think of that uses field(record) syntax for record field access is AT&T-syntax assembly.
When languages use order property, object - it's usually because they treat the properties as functions.
name(object) makes as much sense as object.name
I'm assuming you are largely just not thinking this one through? We are not modeling the domain, we are describing some data we want to select. Without knowing how it is modeled, I can give a brief "top line" for expected select statements on many data models. "I want average_weather, year, zip_code", "I want year, college_name, degree_name, median_graduate_salary, mean_graduate_salary", "..."
I don't think it is tough to describe many many reports of data that you would want in this way. Is it enough for you to flat out get the answer? No, of course not. But nor is it enough to just start all queries at what possible tables you could use as a starting point.
How would you even know it's possible get the data before you've chosen a table?
Your example is also not a complete select statement, you would need to go back and add the actual aggregate functions, and oops the zip_code column was actually called zip, so we need to remap that as well. You can almost never finish a select statement before you have inspected the tables, so why not just start there immediately?
Often times you don't know it is possible to get some data without consulting the table? Worse, you probably need to consult the views that have been made by any supporting DB team before you go off and build your own rollups. Unless you like not taking advantage of any aggregate jobs that are running nightly for reporting.
And to be clear, I'm all for complaining about the order of a select statement, to a very large degree. It is done in such a way that you have to jump back and forth as you are constructing the full query. That can feel a bit painful.
However, I don't know that it is just the SELECT before FROM that causes this jumping back and forth and fully expect that you would jump around a fair bit even with the FROM first. More, if I am ever reworking a query that the system is running, I treat the SELECT as the contract to the application, not the FROM.
There is a bit of "you should know the database before you can expect to send off a good query", but that really cuts to any side of this debate? How do you know the tables you want so well, but you don't know the columns you are going to ask for?
> How do you know the tables you want so well, but you don't know the columns you are going to ask for?
Simply because there are a lot more columns than there are tables.
Of course, I sometimes forget the exact table name as well. However, this is mostly not an issue as the IDE knows all table names before anything has been entered. By simply entering `FROM user`, the autocomplete will list all tables that contain `user`, and I can complete it from there. I cannot do the same with the column selection unless I first write the FROM part. And even if I do know exactly which columns and tables I want I would still want autocomplete to work when creating the SELECT statement. Rather than typing `zip_code`, I can most likely just type `z<TAB>`.
That is why 99% of my select queries start as `SELECT * FROM mytable`, and then I go back to fill in the select statement. And it's not just me, all colleagues I've worked with does exact same thing.
For larger, more complicated queries, you'll have to go back and forth a lot as you join tables together, that is unavoidable, but 80-90% of my queries could be finished in one go if the FROM part came first.
But "more columns than there are tables" is also why I put the thing about, "the total number of columns most people would need autocomplete for are rather limited?" I could see an argument on how this mattered back at the origin of SQL, but today? It is entirely conceivable that you can autocomplete the column names with it updating the from as necessary for the columns you have selected. You could then tab to the from and pick alternative tables with natural joins prefilled between the tables. With very minimum effort.
I actually seem to recall this was done in some of the early dedicated SQL tools. It is rather amusing how much we handicap ourselves by not using some things older tools built.
How did you know it was zip_code and not ZipCode?
Maybe in this case you know the same naming conventions are enforced across all tables. But in general it’s difficult to know the exact column name without looking it up first.
> How did you know it was zip_code and not ZipCode?
I think you're missing the point; you start off with your goal, regardless of the column name. No one has the goal "we want columsn from this specific table", it's always "we want these columns in the final output".
Right, but which columns? The goal absolutely IS to select columns from a table. ie. my goal is not to select created_at, my goal is to select Foo's created_at.
> The goal absolutely IS to select columns from a table.
No. The goal is to have an output column $FOO, where $FOO is meaningful and might not even be in the database in the first place.
> my goal is not to select created_at, my goal is to select Foo's created_at.
Then your goal is to get a value out of a specific table, namely `Foo`, presumably because the end-user wants to see some value in the results.
The end-user getting the value has neither an interest in nor knowledge of your schema. `Foo.created_at` is no more meaningful in the result sets than an unadorned `created_at`.
For this specific example, the end-user might want a column in the output called `age` (if it came from an inventory table`, or `duration` (if it came from a metrics tables), or perhaps `expiry` (if it is a table containing perishable stock). These are all a column in a Foo table, but the request that caused you to write the query in the first place does not mention table at all.
You get their requirement as either `age`, `duration`, `expiry` or similar. That is what you are starting from: "a need for a specific piece of data". You are not starting from "a need from a specific table", because the information needed is coming from someone who neither knows nor cares what your schema looks like.
I can’t think of an example where I knew the columns I wanted to select before I knew which table I wanted to select them from.
Did you ever not do things like
select name, min(price) from product join product_price on product.id = product_price.product_id
where you start with "I need a product name and minimum price" before thinking about where they come from?The more one uses SQL to filter for, the more you'll think about what you want to achieve vs how (which comes after): in a sense, the SELECT is your function return type declaration, and I frequently start my functions with the declaration.
> where you start with "I need a product name and minimum price" before thinking about where they come from?
Sure, and pretty much every time the names I wrote up were not the ones in the table so that was a complete waste of time.
> the SELECT is your function return type declaration
That might be true if “select” only contained aliases, bur that’s not the case at all, so what it is is complete nonsense.
This presumes that you never pick the wrong table, as well? :D
My argument would be largely that you are doing a search of all of the tables for the columns that you want. With having to know the way to join the necessary tables along the way.
And I want to be clear, I don't think this is necessarily "the way." I more think that it is almost certainly an iterative process that can be started at either place, and will require bouncing between the two quite often.
For instance, yes, you can autocomplete column names from tables that you have put in the from. What if you can't find the column you are anticipating there? Go back and rescan the table names hoping you can guess which table should have the column you want? Then go back to autocomplete for columns to see if expected value appears? Or go to a bit more of a global search for columns?
Either way should be fine.
Can you help me understand this situation? Almost always what I want to select is downstream of what I am selecting from, I can't write anything in SELECT until i understand the structure/columns available in FROM.
"I want 'first_name, last_name, and average(grade)' for my report." That is easy enough to state without knowing anything about how the data is normalized.
Back when I worked to support a data science team, I actually remember taking some of their queries and stripping everything but the select so that I could see what they were trying to do and I could add in the correct parts of the rest.
Even assuming that this is plausible way a user could think about it (that can be understood) it shows why is bad
Think: I have 20 tables with the column `id`
"I want 'id,id,id'"
is bad UX, and is what here is being argued, then when the syntax guide you: "I want 'FROM a: id'" is better
I don't think anyone would ever say they want "id, id, id"? They would say they want "customer_id, order_id, item_id" or some such. You would then probably say, "we just use 'id' for the id of every item..." but many places that I've worked actually explicitly didn't do that for essentially this reason. Natural joins on "foo_id" "just work" if you don't give every table the same "id" column.
That is what the syntax allow. And is the user that need to "patch" the meaning using other things like alias, renames or hopefully, proper names for fields (by the way "id, id, id" happens a lot in the wild!)
Is like "to control mutation I only need to append "mut" to the name!
Current SELECT syntax does allow one to "SELECT user.id as user_id, product.id as product_id..." which can then even autocomplete the FROM for you from your query "declaration" (a-la function declaration, in particular its return type).
This assumes I have all the column memorized but not all the tables?
Even in your example, first and last could refer to student or teacher. But presumably you know you're looking for student data before knowing the exact columns.
No, this assumes you know what you want for a query. Which, seems largely fair?
Like, how would you send this question to someone? Or how would you expect it to be sent to you? If your boss doesn't tell you from what table they want some data, do you just not answer?
And sure, there could be ambiguities here. But these are not really fixed by listing the sources first? You would almost certainly need to augment the select to give the columns names that disambiguate for their uses. For example, when you want the teacher and the student names, both.
And this is ignoring more complications you get from normalized data that is just flat out hard to deal with. I want a teacher/student combination, but only for this specific class and term, as an example. Whether you start the from at the student, the class, or the term rosters feels somewhat immaterial to how you want to pull all of that data together for why ever you are querying it.
If my boss asks me for a zip code, I'm going to ask "for what?"
If they ask for "address for a customer" I can go to the customer table and look up what FKs are relevant and collect all possible data and then narrow down from there.
I'd assume they would ask for "aggregate sales by month to zip codes," or some such. Which, you'd probably get from a reporting table, and not bother trying to do the manual aggregate over transactional data. (That is, OLAP versus OLTP is a very real divide that you can't wave away.)
Realistically, I strongly suspect you could take this argument either direction. If you have someone making a query where they are having to ask "what all tables could I start from?" you are in for some pain. Often the same data is reachable from many tables, and you almost certainly have reasons for taking certain paths to get to it. Similarly, if they should want "person_name", heaven help them.
Such that, can you contrive scenarios where it makes sense to start the query from the from clause? Sure. My point is more that you almost certainly have the entire query conceptualized in your mind as you start. You might not remember all of the details on how some things are named, but the overall picture is there. Question then comes down to if one way is more efficient than the other? I have some caveats that this is really a thing hindered by the order of the query. We don't have data, of course, and are arguing based on some ideas that we have brought with us.
So, would I be upset if the order was reversed? Not at all. I just don't expect that would actually help much. My memory is using query builders in the past where I would search for "all tables that have customer_id and order_id in them" and then, "which table has customer_id and customer_address" and then... It was rarely (ever?) the name of the table that helped me know which one to use. Rather, I needed the ones with the columns I was interested in.
Your approach only works with massive assumptions about the structure of the data or a very simplistic data structure.
SELECT statments don't just use table names, they can use aliases for those table names, views, subqueries, etc.
The FROM / JOIN blocks are where the structure of the data your are selecting from is defined. You should not assume you understand what a SELECT statement means until you have read those blocks.