A CUDA core is one of the tiny math units inside an NVIDIA GPU that does the grunt work for graphics and parallel compute. Each core lives inside a larger block called a Streaming Multiprocessor (SM), and on modern GeForce “Blackwell” GPUs each SM contains 128 CUDA cores. That’s why you’ll see total counts like 21,760 CUDA cores on an RTX 5090. The chip simply has many SMs, each packed with those cores.
CUDA (NVIDIA’s parallel computing platform) is the software side of the story: it lets apps and frameworks send massively parallel work (rendering, AI, simulation) to those cores efficiently.
Think of a GPU like a factory designed for bulk jobs. CUDA cores handle work in warps: groups of 32 threads that execute the same instruction on different data (an execution model NVIDIA calls SIMT, single instruction, multiple threads). This is how GPUs chew through thousands of operations at once. Each SM has schedulers that keep many warps in flight to hide memory latency and keep those cores busy.
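To make the warp math concrete, here’s a tiny sketch in plain Python (not GPU code); `warps_needed` is a hypothetical helper for illustration, not a CUDA API:

```python
import math

WARP_SIZE = 32  # fixed warp width on NVIDIA GPUs

def warps_needed(num_threads: int) -> int:
    """How many 32-thread warps the hardware schedules for a launch,
    rounding up because a partial warp still occupies a full warp slot."""
    return math.ceil(num_threads / WARP_SIZE)

# Launching 1,000 threads takes 32 warps (1,024 thread slots);
# the last warp runs with only 8 of its 32 lanes active.
print(warps_needed(1000))  # -> 32
```

This rounding is why thread-block sizes that are multiples of 32 tend to waste the least hardware.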
A useful mental picture: the GPU is the factory, each SM is an assembly line, and the CUDA cores are the workers on that line. Modern GPUs also include specialized units, such as Tensor Cores for AI math and RT Cores for ray tracing. These offload specific tasks so CUDA cores can focus on shading/compute.
Usually, but not by themselves. Architecture matters a lot. For example, NVIDIA’s Ampere generation doubled FP32 throughput per SM versus Turing, so “per-core” performance changed between generations. Ada also greatly expanded caches (notably L2), which boosts many workloads without changing core counts. In short: comparing CUDA-core counts across different generations isn’t apples-to-apples.
Other big swing factors: clock speeds, memory bandwidth, cache sizes, and specialized units like Tensor and RT Cores.
A friendly rule of thumb: compare CUDA-core counts only within a single generation; across generations, trust benchmarks for your workloads over the spec sheet.
If you want a quick sanity check on scale, RTX 5090 lists 21,760 CUDA cores, showing how NVIDIA tallies per-SM cores across many SMs. But again, performance gains come from the total design, not the count alone.
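That headline number is just per-SM arithmetic. A quick Python sketch, assuming the 170-SM figure implied by the numbers above (21,760 ÷ 128):

```python
CORES_PER_SM = 128  # CUDA cores per SM on Blackwell GeForce GPUs
SM_COUNT = 170      # implied SM count for the RTX 5090 (21,760 / 128)

# NVIDIA's headline figure is simply SM count multiplied by cores per SM.
total_cuda_cores = SM_COUNT * CORES_PER_SM
print(total_cuda_cores)  # -> 21760
```

The same multiplication explains the core counts quoted for any other SKU in the lineup; only the SM count (and, between architectures, the cores-per-SM figure) changes.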
You don’t need a special cable, but you do need the right software stack. CUDA is NVIDIA’s platform; apps use it through drivers, toolkits, and libraries. Many popular applications and frameworks are already built to tap CUDA acceleration: once your NVIDIA drivers and (when needed) the CUDA Toolkit are installed, supported apps just use it.
CUDA runs on CUDA‑enabled NVIDIA GPUs across product lines (GeForce/RTX for gaming and creation, professional RTX, and data‑center GPUs). The programming guide notes the model scales across many GPU generations and SKUs; NVIDIA maintains a list of CUDA‑enabled GPUs and their compute capabilities.
Is a CUDA core the same as a “shader core”?
In everyday GPU talk, yes: on NVIDIA GPUs, “CUDA cores” refers to the programmable FP32/INT32 ALUs used for shading and general compute inside each SM.
Why are CUDA-core numbers so different across generations?
Because architectures evolve. Ampere changed FP32 datapaths (more work per clock), and Ada overhauled caches so performance doesn’t scale linearly with core count.
What’s a warp again?
A group of 32 threads that execute in lock‑step on the SM. Apps launch thousands of threads; the GPU schedules them as warps to keep hardware busy.
Do CUDA cores help with AI?
Yes, but the big accelerators for modern AI are Tensor Cores. CUDA cores still handle lots of surrounding work in those pipelines.