Use Prolog to improve LLM's reasoning

shchegrikovich.substack.com

202 points by shchegrikovich 4 days ago


z5h - 6 hours ago

I've come to appreciate, over the past 2 years of heavy Prolog use, that all coding should (eventually) be done in Prolog.

It's one of the few languages that is simultaneously a standalone logical formalism and a standalone representation of computation. (With caveats and exceptions, I know.) So a Prolog program can stand in as a document of all facts, rules and relations that a person/organization understands/declares to be true. Even if AI writes code for us, we should expect to have it presented and manipulated as a logical formalism.

Now if someone cares to argue that some other language/compiler is better at generating more performant code on certain architectures, then that person can declare their arguments in a logical formalism (Prolog) and we can use Prolog to translate between language representations, compile, optimize, etc.

gorkempacaci - 4 hours ago

The generated programs are only technically Prolog programs. They use CLPFD, which makes these constraint programs. Prolog programs are quite a bit more tricky with termination issues. I wouldn’t have nitpicked if it wasn’t in the title.

Also, the experiment method has some flaws. Problems are hand-picked out of a random subset of the full set. Why not run the full set?

fsndz - 5 hours ago

This is basically the LLM-Modulo approach recommended by Prof. Subbarao Kambhampati. Interesting, but it mostly works only for problems that have some math or first-order logic puzzle at their heart. It will fail at improving performance on ARC-AGI, for example... It's difficult to mimic reasoning by basic trial and error and then hoping for the best: https://www.lycee.ai/blog/why-sam-altman-is-wrong

pjmlp - 7 hours ago

So we are back to the Japanese Fifth Generation plan from the 1980s. :)

luke_galea - 38 minutes ago

Super cool. I dig generating rules from within the LLM, but I'm not sure Prolog is the right choice in 2024.

I love Prolog and had the opportunity to use it "in anger" years ago to handle temporal logic in a scheduling app. Great experience, but I've found that more modern rules engines like Drools (anything using the Rete algorithm) are a MUCH better fit for most use cases these days.

If you are into this stuff, you might like the talk I gave on rules engines, Prolog, and how it led to Erlang & Elixir. https://www.youtube.com/watch?v=mDnntrhk-8g&t=1s

a1j9o94 - 7 hours ago

I tried an experiment with this using a Prolog interpreter with GPT-4 to try to answer complex logic questions. I found that it was really difficult because the model didn't seem to know Prolog well enough to write a description of any complexity.

It seems like you used an interpreter in the loop, which is likely to help. I'd also be interested to see how o1 would do in a task like this, or if it even makes sense to use something like Prolog if the models can backtrack during the "thinking" phase.
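To make "interpreter in the loop" concrete, here is a minimal sketch of the retry pattern, not the article's actual pipeline. The names `solve_with_feedback`, `fake_generate` and `fake_run` are hypothetical; the two stubs stand in for a real LLM call and a real Prolog interpreter, and only the control flow is the point.

```python
def solve_with_feedback(question, generate, run, max_attempts=3):
    """Ask `generate` for a program, execute it with `run`, and feed any
    error text back into the next generation attempt."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        program = generate(question, feedback)
        ok, output = run(program)
        if ok:
            return output, attempt
        feedback = output  # the error message becomes part of the next prompt
    return None, max_attempts


# Stub standing in for an LLM (assumption): the first draft is broken,
# and the draft produced after seeing feedback is fixed.
def fake_generate(question, feedback):
    return "X is 2 + 3" if feedback is None else "X is 2 + 3."


# Stub standing in for a Prolog interpreter (assumption): it only checks
# that the clause ends with a full stop, then returns a canned answer.
def fake_run(program):
    if not program.endswith("."):
        return False, "syntax error: missing full stop"
    return True, "X = 5"


result = solve_with_feedback("What is 2 + 3?", fake_generate, fake_run)
print(result)
```

With real components, `run` would shell out to an interpreter and `generate` would include the previous error in the prompt; the loop itself stays the same.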

UniverseHacker - 5 hours ago

I think this general idea is going to be the key to really making LLMs widely useful for solving real problems.

I’ve been playing with using GPT-4 together with the Wolfram Alpha plugin. Working together, the two can reliably solve difficult quantitative problems that neither can solve alone, much like a human using a calculator.

DeborahWrites - 5 hours ago

You're telling me the seemingly arbitrary 6 weeks of Prolog on my comp sci course 11yrs ago is suddenly about to be relevant? I did not see this one coming . . .

nonamepcbrand1 - 6 hours ago

Is this why GitHub CodeQL and Copilot assistance is working better for everyone? Basically, CodeQL uses a variant of Prolog (Datalog) to query source code and generate better results.

baq - 7 hours ago

Patiently waiting for z3-guided generation, but this is a welcome, if obvious, development. Results are a bit surprising and sound too optimistic, though.

de6u99er - 5 hours ago

I always thought that Prolog is great for reasoning in the semantic web. It doesn't surprise me that LLM people stumble on it.

ianbicking - 4 hours ago

I made a pipeline using Z3 (another prover language) to get LLMs to solve very specific puzzle problems: https://youtu.be/UjSf0rA1blc (and a presentation: https://youtu.be/TUAmfi8Ws1g)

Some thoughts:

1. Getting an LLM to model a problem accurately is a significant prompting exercise. Bridging casual logical statements and formal logic is difficult. E.g., "or" statements in English usually mean "xor" in logic.

2. Domains usually have their own language expectations. I was doing Zebra puzzles (https://en.wikipedia.org/wiki/Zebra_Puzzle) and they have a very specific pattern and language. I don't think it's fair to really call it intuitive or even entirely unambiguous, it's something you have to learn. The LLM has to learn it too. They have seen this kind of puzzle (and I think most can reproduce the original Zebra puzzle from memory), but they lack a really firm familiarity.

3. Arguably some of the familiarity is about contextualizing the problem, which is itself a prompting task. People don't naturally solve Zebra puzzles that we find organically, it's something we encounter in specific contexts (like a puzzle book) which is not so dissimilar from prompting.

4. Incidentally Claude Sonnet 3.5 has a substantial lead. And GPT o1 is not much better than GPT 4o. In some sense I think o1 is a kind of self-prompting, an attempt to create its own context; so if you already have a well-worded prompt with instructions then o1 isn't that good at improving performance over 4o.

5. A lot of the prompting is really intended to slow down the LLM, to keep it from jumping to conclusions or solving a task too quickly (and incorrectly). Which again is a case of the prompt doing what o1 tries to do generally.

6. I'm not sure what tasks call for this kind of logical reasoning. Not that I don't think they exist, I just don't know how to recognize them. Planning tasks? Highly formalized and artificially constructed problems don't seem all that interesting... and the whole point of adding an LLM to the process is to formalize the informal.

7. Perhaps it's hard to see because real-world problems seldom have conveniently exact solutions. But that's not a blocker... Prolog (and Z3) can take constraints as a form of elimination, providing lists of possible answers, and maybe just reducing the search space is enough to move forward on some kinds of problems.

8. For instance, when I give my pipeline really hard Zebra problems it usually doesn't succeed; one bug in one rule will kill the whole thing. Also, I think the LLMs have a hard time keeping track of large problems; a context size problem, even though the problems don't approach their formal context limits. But I can imagine building the pipeline so it also tries to mark low-confidence rules. Given that, I can imagine removing those rules, sampling the resulting (non-unique, sometimes incorrect) answers and using that to revisit and perhaps correct some of those rules.
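Points 7 and 8 can be illustrated with a toy sketch, assuming a made-up three-house puzzle and made-up confidence scores (this is not the Z3 pipeline from the videos). Each rule eliminates candidates by brute force; one deliberately wrong rule wipes out every answer, and dropping the low-confidence rules brings a solution back.

```python
from itertools import permutations

COLORS = ("red", "green", "blue")
PETS = ("cat", "dog", "fish")

# Each rule is (description, confidence, predicate). colors[i] and pets[i]
# describe house i, counted from the left. Confidence values are invented.
RULES = [
    ("red house is leftmost", 0.9,
     lambda c, p: c[0] == "red"),
    ("dog lives in the green house", 0.9,
     lambda c, p: p[c.index("green")] == "dog"),
    ("cat is immediately left of the fish", 0.9,
     lambda c, p: p.index("cat") + 1 == p.index("fish")),
    # Deliberately wrong, standing in for a mistranslated clue.
    ("blue house is rightmost", 0.3,
     lambda c, p: c[2] == "blue"),
]


def survivors(rules):
    """Return every (colors, pets) assignment that satisfies all rules."""
    return [
        (c, p)
        for c in permutations(COLORS)
        for p in permutations(PETS)
        if all(check(c, p) for _, _, check in rules)
    ]


confident = [rule for rule in RULES if rule[1] >= 0.5]

print(len(survivors([])))         # 36 candidates before any rule
print(len(survivors(RULES[:1])))  # 12 left after the first rule alone
print(len(survivors(RULES)))      # 0: the one bad rule kills everything
print(survivors(confident))       # one assignment once it is dropped
```

A real solver would propagate constraints instead of enumerating, but the shape is the same: rules shrink the candidate set, and a confidence tag gives the pipeline something to retract when the set comes back empty.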

Really I'd be most interested to hear thoughts on where this logic programming might actually be applied... artificial puzzles are an interesting exercise, but I can't really motivate myself to go too deep.

sgt101 - 7 hours ago

Building on this idea, people have grounded LLM-generated reasoning logic with perceptual information from other networks: https://web.stanford.edu/~joycj/projects/left_neurips_2023

arjun_khamkar - 5 hours ago

Would creating a Prolog dataset be beneficial, so that future LLMs can be trained on it and then be able to output Prolog code?

mise_en_place - 5 hours ago

I really enjoyed tinkering with languages like Prolog and Coq. Interactive theorem proving with LLMs would be awesome to try out, if possible.

bytebach - 4 hours ago

An application I am developing for a customer needed to read constraints around clinical trials and essentially build a query from them. Constraints involve prior treatments, biomarkers, type of disease (cancers) etc.

Using just an LLM did not produce reliable queries, despite trying many many prompts, so being an old Prolog hacker I wondered if using it might impose more 'logic' on the LLM. So we precede the textual description of the constraints with the following prompt:

-------------

Now consider the following Prolog predicates:

biomarker(Name, Status) where Status will be one of the following integers -

Wildtype = 0, Mutated = 1, Methylated = 2, Unmethylated = 3, Amplified = 4, Deleted = 5, Positive = 6, Negative = 7

tumor(Name, Status) where Status will be one of the following integers if known, else left unbound -

Newly diagnosed = 1, Recurrence = 2, Metastasized = 3, Progression = 4

chemo(Name)

surgery(Name) Where Name may be an unbound variable

other_treatment(Name)

radiation(Name) Where Name may be an unbound variable

Assume you are given a predicate atMost(T, N) where T is a compound term and N is an integer. It will return true if the number of 'occurrences' of T is less than or equal to N, else it will fail.

Assume you are given a predicate atLeastOneOf(L) where L is a list of compound terms. It will succeed if at least one of the compound terms, when executed as a predicate returns true.

Assume you are given a predicate age(Min, Max) which will return true if the patient's age is in between Min and Max.

Assume you have a predicate not(T) which returns true if predicate T evaluates false and vice versa. i.e. rather than '\\+ A' use not(A).

Do not implement the above helper functions.

VERY IMPORTANT: Use 'atLeastOneOf()' whenever you would otherwise use ';' to represent 'OR'. i.e. rather than 'A ; B' use atLeastOneOf([A, B]).

EXAMPLE INPUT: Patient must have recurrent GBM, methylated MGMT and wildtype EGFR. Patient must not have mutated KRAS.

EXAMPLE OUTPUT: tumor('gbm', 2), biomarker('MGMT', 2), biomarker('EGFR', 0), not(biomarker('KRAS', 1))

------------------

The Prolog predicates, when evaluated, generate the required underlying query (of course the Prolog is itself a form of query).

Anyway - the upshot was a vast improvement in the accuracy of the generated query (I've yet to see a bad one). Somewhere in its bowels, being told to generate Prolog 'focused' the LLM. Perhaps LLMs are happier with declarative languages rather than imperative ones (I know I am :) ).
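For readers who want to see what evaluating such output could look like, here is a minimal sketch in Python rather than Prolog. It is not the application's implementation: the `Patient` record, the tuple encoding of terms, and the use of `None` for an unbound variable are all assumptions made for illustration. Only the predicate names and status codes come from the prompt above.

```python
from dataclasses import dataclass, field


def term(functor, *args):
    """Build a compound term as (functor, args); None marks an unbound variable."""
    return (functor, args)


@dataclass
class Patient:
    # Hypothetical record layout, not the real application's schema.
    age: int
    facts: set = field(default_factory=set)


def count(patient, t):
    """Count the stored facts that match term t (None matches anything)."""
    functor, args = t
    return sum(
        1
        for f, fargs in patient.facts
        if f == functor
        and len(fargs) == len(args)
        and all(a is None or a == b for a, b in zip(args, fargs))
    )


def solve(patient, goal):
    """Evaluate one goal, handling the helper predicates named in the prompt."""
    functor, args = goal
    if functor == "not":
        return not solve(patient, args[0])
    if functor == "atLeastOneOf":
        return any(solve(patient, g) for g in args[0])
    if functor == "atMost":
        return count(patient, args[0]) <= args[1]
    if functor == "age":
        return args[0] <= patient.age <= args[1]
    return count(patient, goal) > 0  # plain fact lookup


def eligible(patient, goals):
    """A comma-separated Prolog body is a conjunction: every goal must hold."""
    return all(solve(patient, g) for g in goals)


# The EXAMPLE OUTPUT from the prompt, transcribed into the tuple encoding.
QUERY = [
    term("tumor", "gbm", 2),
    term("biomarker", "MGMT", 2),
    term("biomarker", "EGFR", 0),
    term("not", term("biomarker", "KRAS", 1)),
]

patient = Patient(age=54, facts={
    term("tumor", "gbm", 2),
    term("biomarker", "MGMT", 2),
    term("biomarker", "EGFR", 0),
})
print(eligible(patient, QUERY))  # prints True
```

Adding `term("biomarker", "KRAS", 1)` to the patient's facts flips the result to False, which is the behaviour the "must not have mutated KRAS" clause asks for.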

anthk - 6 hours ago

Use Constraint Satisfaction Problem Solvers. It comes up with Common Lisp with ease.

YeGoblynQueenne - 5 hours ago

That's not going to work. Garbage in - Garbage out is success-set equivalent to Garbage in - Prolog out.

Garbage is garbage and failure to reason is failure to reason, no matter the language. If your LLM can't translate your problem to a Prolog program that solves your problem, Prolog can't solve your problem.