How University Students Use Claude
anthropic.com | 399 points by pseudolus 9 days ago
> A common question is: “how much are students using AI to cheat?” That’s hard to answer, especially as we don’t know the specific educational context where each of Claude’s responses is being used.
I built a popular product that helps teachers with this problem.
Yes, it's "hard to answer", but let's be honest... it's a very very widespread problem. I've talked to hundreds of teachers about this and it's a ubiquitous issue. For many students, it's literally "let me paste the assignment into ChatGPT and see what it spits out, change a few words and submit that".
I think the issue is that it's so tempting to lean on AI. I remember long nights struggling to implement complex data structures in CS classes. I'd work on something for an hour before I'd have an epiphany and figure out what was wrong. But that struggling was ultimately necessary to really learn the concepts. With AI, I can simply copy/paste my code and say "hey, what's wrong with this code?" and it'll often spot it (never mind the fact that I can just ask ChatGPT "create a b-tree in C" and it'll do it). That's amazing in a sense, but also hurts the learning process.
> it's literally "let me paste the assignment into ChatGPT and see what it spits out, change a few words and submit that".
My wife is an accounting professor. For many years her battle was with students using Chegg and the like. They would submit roughly correct answers, but because she would rotate the underlying numbers, their answers would always be wrong in a way that provably indicated cheating. This made up 5-8% of her students.
Now she receives a parade of absolutely insane answers to questions from a much larger proportion of her students (she is working on some research around this but it's definitely more than 30%). When she asks students to recreate how they got to these pretty wild answers they never have any ability to articulate what happened. They are simply throwing her questions at LLMs and submitting the output. It's not great.
ChatGPT is laughably terrible at double entry accounting. A few weeks ago I was trying to use it to figure out a reasonable way to structure accounts for a project given the different business requirements I had. It kept disappearing money when giving examples. Pointing it out didn’t help either, it just apologized and went on to make the same mistake in a different way.
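The invariant it kept breaking is the most basic one in the book: every transaction posts equal debits and credits, so the books always balance. A minimal sketch of that check in Python (my own illustration; the accounts and amounts are made up):

    from decimal import Decimal

    # Each entry is (account, debit, credit); a transaction is a list of entries.
    def is_balanced(transaction):
        debits = sum(Decimal(d) for _, d, _ in transaction)
        credits = sum(Decimal(c) for _, _, c in transaction)
        return debits == credits

    # A $100 cash sale: debit Cash, credit Revenue.
    sale = [("Cash", "100.00", "0.00"), ("Revenue", "0.00", "100.00")]
    assert is_balanced(sale)

The "disappearing money" in its examples was exactly this kind of failure: a debit posted somewhere with no matching credit anywhere else.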
Using a system based on randomness for a process that must occur deterministically is probably the wrong solution.
I'm running into similar issues trying to use LLMs for logic and reasoning.
They can do it (surprisingly well, once you disable the friendliness that prevents it), but you get a different random subset of correct answers every time.
I don't know if setting temperature to 0 would help. You'd get the same output every time, but it would be the same incomplete / wrong output.
Probably a better solution is a multi phase thing, where you generate a bunch of outputs and then collect and filter them.
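Something like self-consistency sampling: ask the same question several times at nonzero temperature and keep the majority answer. A rough sketch, where ask_model(prompt) is a hypothetical stand-in for whatever API wrapper you use:

    from collections import Counter

    def majority_answer(prompt, ask_model, n=5):
        # Sample n independent completions of the same prompt.
        answers = [ask_model(prompt) for _ in range(n)]
        # Keep the most common answer, but only if it's a real majority.
        winner, votes = Counter(answers).most_common(1)[0]
        return winner if votes > n // 2 else None  # None = no consensus

Voting like this only works when answers are short enough to compare exactly; for longer outputs you'd need a second phase that judges or merges the candidates.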
> They can do it (surprisingly well, once you disable the friendliness that prevents it) ...
Interesting! :D Do you mind sharing the prompt(s) that you use to do that?
Thanks!!
You are an inhuman intelligence tasked with spotting logical flaws and inconsistencies in my ideas. Never agree with me unless my reasoning is watertight. Never use friendly or encouraging language. If I’m being vague, demand clarification. Your goal is not to help me feel good — it’s to help me think better.
Keep your responses short and to the point. Use the Socratic method when appropriate.
When enumerating assumptions, put them in a numbered list. Make the list items very short: full sentences not needed there.
---
I was trying to clone Gemini's "thinking", which I often found more useful than its actual output! I failed, but the result is interesting, and somewhat useful.
GPT 4o came up with the prompt. I was surprised by "never use friendly language", until I realized that avoiding hurting the user's feelings would prevent the model from telling the truth. So it seems to be necessary...
It's quite unpleasant to interact with, though. Gemini solves this problem by doing the "thinking" in a hidden box, and then presenting it to the user in soft language.
Have you tried Deepseek-R1?
I run it locally and read the raw thought process; I find it very useful (it can be ruthless at times) to see this before it tacks on the friendliness.
Then you can see its planning process for adding the warmth/friendliness: "but the user seems proud of... so I need to acknowledge..."
I don't think Gemini's "thoughts" are the raw CoT process; they're summarized / cleaned up by a small model before being returned to you (same as with OpenAI models).
That's fascinating. I've been trying to get other models to mimic Gemini 2.5 Pro's thought process, but even with examples, they don't do it very well. Which surprised me, because I think even the original (no RLHF) GPT-3 was pretty good at following formats like that! But maybe there's not enough training data in that format for it to "click".
It does seem similar in structure to Gemini 2.0's output format with the nested bullets though, so I have to assume they trained on synthetic examples.
>Pointing it out didn’t help either, it just apologized and went on to make the same mistake in a different way.
They really should modify it to take out that whole loop where it apologizes, claims to recognize its mistake, and then continues to make the mistake that it claimed to recognize.
> me, just submitted my taxes for last year with a lot of help from ChatGPT: :eyes:
I guess these students don't pass, do they? I don't think that's a particularly hard concern. It may take a bit longer, but they'll learn the lesson (or drop out).
I'm more worried about those who will learn to solve problems with the help of an LLM, but can't do anything without one. Those will go under the radar, unnoticed, and the question is: how bad is that, actually? I would say very, but then I realize I'm a pretty useless driver without a GPS (once I get out of my hometown). That's the hard question, IMO.
As someone already said, parents used to be concerned that kids wouldn't be able to solve maths problems without a calculator, and it's the same problem, but there's a difference between solving problems _with_ LLMs and having LLMs solve them _for you_.
I don't see the former as that much of a problem.
Well, the extent is much broader for an LLM than for a calculator. Why should I hire you if an agent can do it? If the LLM is to every job what the calculator is to arithmetic, then every job can be replaced. Spotify CEO stated on X that before asking for more headcount they have to justify not being able to do the job with an agent. So for all the students who let the LLM do their assignments and learn basically nothing, what's their value to a company that would hire them? The company will just use the agent as well …
> Why should I hire you if an agent can do it?
An agent can't do it. It can help you like a calculator can help you, but it can't do it alone. So that means you've become the programmer. If you want to be the programmer, you always could have been. If that is what you want to be, why would you consider hiring anyone else to do it in the first place?
> Spotify CEO stated on X that before asking for more headcount they have to justify not being able to do the job with an agent.
It was Shopify, but that's just a roundabout way to say that there is a hiring freeze due to low sales (no doubt because of tariff nonsense seizing up the market). An agent, like a calculator, can only increase the productivity of a programmer. As always, you still need more programmers to perform more work than a single programmer can handle. So all they are saying is that "we can't afford to do more".
> The company will and is just using the agent as well …
In which case wouldn't they want to hire those who are experts in using agents? If they, like Shopify, have become too poor to hire people – well, you're screwed either way, aren't you? So that is moot.
So, arguably, when people were not using calculators they made calculations by hand, and there were rooms full of people who did calculations. That's gone now, thanks to calculators. But here the analogy goes an order of magnitude further: now fewer people can "do" the job of many, so there may be less hiring, and not just for "doing calculations by hand" but in almost all fields where the use of software is required.
Where will all those new students find a job if:
- they did not learn much because the LLM did the work for them
- there are no new jobs because we are more productive?
> now fewer people can "do" the job of many
Never in the history of humans have we been content with stagnation. The people who used to do manual calculations soon joined the ranks of people using calculators and we lapped up everything they could create.
This time around is no exception. We still have an infinite number of goals we can envision a desire for. If you could afford an infinite number of people you would still hire them. But Shopify especially is not in the greatest place right now. They've just come off the COVID wind-down and now tariffs are beating down their market further. They have to be very careful with their resources for the time being.
> - they did not learn much because the LLM did the work for them
If companies are using LLMs as suggested earlier, they will find jobs operating LLMs. They're well poised for it, being the utmost experts in using them.
> - there are no new jobs because we are more productive?
More productivity means more jobs are required. But we are entering an age where productivity is bound to be on the decline. A recession was likely inevitable anyway and the political sphere is making it all but a certainty. That is going to make finding a job hard. But for what scant few jobs remain, won't they be using LLMs?
> Spotify CEO stated on X that before asking for more headcount they have to justify not being able to do the job with an agent.
Spotify CEO is channeling The Two Bobs from Office Space: "What are you actually doing here?" Just in a nastier way, with a kind of prisoner's dilemma on top. If you can get by with an agent, fine, you won't bother him. If you can't, why can't you? Should we replace you with someone who can, or thinks they can?
Spotify CEO is not his employees' friend.
> Why should I hire you if an agent can do it?
You as the employer are liable. A human has real reasoning abilities and real fears about messing up; the likelihood of them doing something absurd, like telling a customer that a product is 70% off, and not losing their job over it is effectively nil. What are you going to do with the LLM, fire it?
Data scientists and people deeply familiar with LLMs, to the point that they could fine-tune a model to your use case, cost significantly more than a low-skilled employee; depending on liability, just running the LLM may still be cheaper.
As for an accounting firm (one example from above): as far as I know, in most jurisdictions the accountant doing the work is personally liable. Who would be liable in the case of the LLM?
There is absolutely a market for LLM-augmented workforces; I don't see any viable future, even with SOTA models right now, for flat-out replacing a workforce with them.
I fully agree with you about liability. I was advocating for the other point of view.
Some people argue that it doesn't matter if there are mistakes (it depends which, actually) and that with time it will cost nothing.
I argue that if we give up learning and let the LLM do the assignments, then what is the extent of my knowledge, and my value to be hired in the first place?
We hired a developer and he did everything with ChatGPT: all the code and documentation he wrote. At first it was all bad, because out of the infinity of possible answers, ChatGPT doesn't pinpoint the best one in every case. But does he have enough knowledge to understand that what he did was bad? We need people with experience who have confronted hard problems themselves and found their way out. How else can we confront and critique an LLM's answer?
I feel students' value is diluted, leaving them at the mercy of the companies providing the LLMs, and we might lose some critical knowledge / critical thinking in the process.
I agree entirely with your take regarding education. I feel like there is a place where LLMs are useful without impacting learning, but it's definitely not in the "discovery" phase of learning.
However, I really don't need to implement some weird algorithm myself every time (ideally I'm using a well-tested library). The point is that you learn so that you're able to, and also able to modify or compose the algorithm in ways the LLM couldn't easily do.
Why did you hire someone who produced bad code and docs? Did he manage to pass the interview without an AI?
>As someone already said, parents used to be concerned that kids wouldn't be able to solve maths problems without a calculator
Were they wrong? People who rely too much on a calculator don't develop strong math muscles that can be used in more advanced math. Identifying patterns in numbers and seeing when certain tricks can be used to solve a problem (versus when they just make a problem worse) is a skill that ends up being beyond their ability to develop.
Yes, they were wrong. Many young kids who are bad at mental calculations are later competent at higher mathematics and able to use it. I don't understand what patterns and tricks you're referring to, but if they are important for problems outside of mental calculations, then you can also learn about them by solving these problems directly.
>Were they wrong? People who rely too much on a calculator don't develop strong math muscles that can be used in more advanced math.
Yes. People who rely too much on a calculator weren't going to be doing advanced math anyway.
Almost none of the cheaters appear to be solving problems with LLMs. All my faculty friends are getting large portions of their class clearly turning in "just copied directly from ChatGPT" responses.
It's an issue in grad school as well. You'll have an online discussion where someone submits 4 paragraphs of not-quite-eloquent prose with that AI "stink" on it. You can't be sure but it definitely makes your spidey sense tingle a bit.
Then they're on a video call and their vocabulary is wildly different, or they're very clearly a recent immigrant who struggles with basic sentence structure, such that there is absolutely zero chance their discussion-forum persona is actually who they are.
This has happened at least once in every class, and invariably the best classes in terms of discussion and learning from other students are the ones where the people using AI to generate their answers are failed or drop the course.
> there's a difference between solving problems _with_ LLMs, and having LLMs solve it _for you_.
If there is a difference, then fundamentally LLMs cannot solve problems for you. They can only apply transformations using already known operators. No different than a calculator, except with exponentially more built-in functions.
But I'm not sure that there is a difference. A problem is only a problem if you recognize it, and once you recognize a problem then anything else that is involved along the way towards finding a solution is merely helping you solve it. If a "problem" is solved for you, it was never a problem. So, for each statement to have any practical meaning, they must be interpreted with equivalency.
There is a difference between thinking about the context of a problem and "critical thinking" about the problem or its possible solutions.
There is a measurable decrease in critical thinking skills when people consistently offload the thinking about a problem to an LLM. This is where the primary difference is between solving problems with an LLM vs having it solved for you with an LLM. And, that is cause for concern.
Two studies on impact of LLMs and generative AI on critical thinking:
https://www.mdpi.com/2075-4698/15/1/6
https://slejournal.springeropen.com/articles/10.1186/s40561-...
How many people are "good drivers" outside their home town? I am not that old, but old enough to remember all adults taking wrong turns trying to find new destinations for the first time.
>How many people are "good drivers" outside their home town?
My wife is surprisingly good at remembering routes: she'll use the GPS the first time, but generally remembers the route after that. She still isn't good at knowing which direction is east vs west or north vs south, but neither am I.
I'm like that too, but I don't think it transfers particularly well to LLMs. The problem is that you can just skip straight to the answer and ignore the explanation (if it even produces one).
It would be pretty neat if there was an LLM that guides you towards the right answer without giving it to you. Asking questions and possibly giving small hints along the way.
>It would be pretty neat if there was an LLM that guides you towards the right answer without giving it to you. Asking questions and possibly giving small hints along the way.
I think you can prompt them to do that, but that doesn't solve the issue of people not being willing to learn vs just jump to the answer, unless they made a school approved one that forced it to do that.
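For what it's worth, a system prompt along these lines gets most of the way there (hand-rolled; I make no claims beyond casual use):

You are a tutor. Never state the final answer. Respond only with questions and small hints that move me one step closer to it. If I ask for the answer directly, refuse and ask what I have tried so far. When I get a step right, confirm it briefly and move on to the next question.

But as you say, nothing stops a student who just wants the answer from opening a second tab with a vanilla chatbot.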
If your GPS fails, at worst you can follow directions road sign by road sign. For a job, without the core knowledge, what's the point of hiring a qualified person vs an unqualified one doing just prompts, or worse, hiring no one and letting agents do the prompting?
All tech becomes a crutch. People can't wash their clothes without a machine. People can't cook without a microwave. Tech is both a gift and a curse.
Back in my day they worried about kids not being able to solve problems without a calculator, because you won't always have a calculator in your pocket.
...But then.
Not being able to solve basic math problems in your mind (without a calculator) is still a problem. "Because you won't always have a calculator with you" just was the wrong argument.
You'll acquire advanced knowledge and skills much, much faster (and sometimes only) if you have the base knowledge and skills readily available in your mind. If you're learning about linear algebra but you have to type in every simple multiplication of numbers into a calculator...
> if you have the base knowledge and skills readily available in your mind.
I have the base knowledge and skill readily available to perform basic arithmetic, but I still can't do it in my mind in any practical way because I, for lack of a better description, run out of memory.
I expect most everyone eventually "runs out of memory" if the values are sufficiently large, but I hit the wall when the values are exceptionally small. And not for lack of trying – the "you won't always have a calculator" message was heard.
It wasn't skill and knowledge that was the concern, though. It was very much about execution. We were tested on execution.
> If you're learning about linear algebra but you have to type in every simple multiplication of numbers into a calculator...
I can't imagine anyone is still using a four-function calculator, certainly not in an application like learning linear algebra. Modern calculators are decidedly designed for linear algebra; they need to be, given the rise of things like machine learning that depend heavily on it.
This is now reality -- fighting to change the students is a losing battle. Besides, in terms of normalizing grade distributions, this is not that complicated to solve.
Target the cheaters with pop quizzes: the prof can randomly choose 3 questions from the assignments. If students can't score enough marks on 2 of the 3, they are dealt a huge penalty. Students who actually work through the problems will have no trouble scoring enough marks on 2 of 3 questions; students who lean irresponsibly on LLMs will lose their marks.
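The mechanics are trivial to administer; a toy sketch of the draw in Python (the question IDs are made up):

    import random

    def draw_pop_quiz(assignment_question_ids, k=3):
        # Resample k questions the student already submitted answers for.
        return random.sample(assignment_question_ids, k)

    quiz = draw_pop_quiz(["a1-q2", "a2-q1", "a2-q4", "a3-q3"])
    # Pass rule: keep full assignment marks if the student scores on at
    # least 2 of the 3 drawn questions; otherwise apply the heavy penalty.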
Why not just grade solely based on live performance? (quizzes and tests)
Homework would still be assigned as a learning tool, but has no impact on your grade.
That's exactly how scientific courses were in my experience at a university in the US. Curriculum was centered around a textbook. You were expected to do all end of chapter problems and ask questions if you had difficulty. It wasn't graded. No one checked. You just failed the exam if you didn't.
My high school English teacher's book reports were like this. One by one, you come up, hand over your book, and the teacher randomly picks a couple of passages and reads them aloud and asks what had just happened prior and what happens after. Then a couple opinion questions and boom, pass or fail. Fantastic to not write a paper on it; paper writing was a more dedicated topic.
I've heard that's how studying is done in Oxford/Cambridge: https://en.wikipedia.org/wiki/Tutorial_system
That's also how it's done in almost all French engineering schools. You get open-book tests with a small number of relatively difficult questions, and you have 3-4 hours to complete them.
In some of the CS tests, coding by hand sucks a bit, but to be honest they're OK with pseudocode as long as you show you understand the concepts.
The European mind cannot comprehend take-home exams.
There is no European mind when it comes to education, hell, there is barely a national mind for those countries with federated education systems (e.g. Germany).
Well, take-home exams are not very useful nowadays with AI. And yeah, the other commenters are right when they say there's no European mind when it comes to education; each country does its own thing.
In France I got a bunch of equivalent take-home tests, between high school and graduate level, mostly in math and science. The teacher would give us exercises equivalent to what we'd get in our exams, we'd have one week to complete them (sometimes in pairs), and they'd be graded as part of that semester.
I don't recall take-home exams being at all common in US undergrad. Open book or one-page formula sheets, more so.
Certainly with maths you’re marked almost totally on written exams, but even if that weren’t true you’re also required to go over example sheets (hard homework questions that don’t form part of the final mark) with a tutor in two-student sessions so it’d be completely obvious if you were relying on AI.
In Italy there's an oral component in most exams. In math exams you're asked for proofs of theorems (ones that were part of the course).
I really like oral exams on top of regular exams. The teacher can ask questions and dive into specific areas - it'll be obvious who is just using LLMs to answer the questions vs those who use LLMs to tutor them.
Of course, the reason they do quizzes is to optimize the process (you need fewer tutors/examiners) and to remove bias (any tutor holds biases one way or the other).
The tutorial system is just for teaching, not grading. It does keep students honest with themselves about their progress when they’re personally put on the spot once a week in front of one or two of their peers.
The biggest contrast for me between Oxbridge and another red brick was the Oxbridge tutors aren't shy of saying "You've not done the homework, go away and stop wasting my time", whereas the red brick approach was to pick you up and carry you over the finishing line (at least until the hour was up).
Yes, my undergrad degree grade was determined solely by my performance on 8 three-hour exams at the end of the final year.
Funnily enough, the best use of AI in education is to serve as exactly this kind of tutor. This is the future of education.
The promise of the expansion of this kind of tutorial teaching to everyone via AI is great. The problem is keeping students honest with themselves.
At the end of the day you can't force people to learn if they don't want to.
As a society we need to be okay with failing people who deserve to fail and not drag people across the finish line at the expense of diluting the degrees of everyone else who actually put in effort.
I'm not sure why we care about the degree. Employers care about the degree, but they aren't paying for my education.
The students who want to learn, will learn. For the students who just want the paper so they can apply for jobs, we ought to give them their diploma on the first day of class, so they can stop wasting everybody's time.