Is Python Too Slow for Crypto Trading? We Ran the Numbers.
Everyone in crypto trading eventually hears it: "Python is too slow, you need C++ or Rust."
Most of the people saying it are not wrong. But they are also not answering the right question.
The right question is not "is Python slow?" It is "is Python slow relative to the thing you're waiting on?" In simple onchain trading, what you're waiting on is a block. And blocks have a hard floor that Python blows past with room to spare on most chains.
But that answer changes the moment you're using Jito bundles on Solana or MEV-Boost bundlers on Ethereum. When you're submitting ordered transaction bundles and competing with other bots for inclusion, you're not racing the block anymore. You're racing every other bot reacting to the same signal. That latency budget is measured in tens of milliseconds, not 400ms or 12 seconds. And at that point, C++ starts mattering.
We benchmarked all three. Here are the actual numbers.
Setup
All benchmarks for Python and C++ were run on the same laptop with 8GB RAM. This is deliberate worst-case scenario testing. A production trading server with 32GB+ RAM, a faster CPU, and no background processes will show significantly better absolute times.
The relative ratios between implementations (NumPy vs. pure Python, C++ vs. Pandas) hold across hardware. If Python is fast enough on a consumer laptop, it is fast enough on your server.
C++ compiled with g++ -O2. Pure Python is benchmarked on 10,000 rows and extrapolated linearly (it becomes unusable at 1M). Each test runs 5 times and we take the median. No warm-up tricks, no cherry-picking. Same hardware means the cross-language comparisons are direct.
The C++ setup used for all snippets below:
```cpp
#include <iostream>
#include <vector>
#include <cmath>
#include <chrono>
#include <numeric>
#include <algorithm>
#include <map>
#include <random>
using namespace std;
using namespace std::chrono;

const int N = 1000000;
const int RUNS = 5;

// Synthetic data captured by all the lambdas below
// (mirrors the Python data generation).
vector<double> prices(N), volume(N);

void init_data() {
    mt19937 gen(42);
    normal_distribution<double> dist(0.0, 1.0);
    for (int i = 0; i < N; i++) {
        prices[i] = 100.0 + dist(gen);
        volume[i] = fabs(dist(gen)) * 1000.0;
    }
}

template<typename Fn>
double timeit(Fn fn) {
    vector<double> times;
    for (int r = 0; r < RUNS; r++) {
        auto start = high_resolution_clock::now();
        fn();
        auto end = high_resolution_clock::now();
        times.push_back(duration<double, milli>(end - start).count());
    }
    sort(times.begin(), times.end());
    return times[RUNS / 2]; // median of RUNS runs
}
```
Compile and run with:
```shell
g++ -O2 -o benchmark benchmark.cpp && ./benchmark
```
The Python setup, used by every Python snippet below:

```python
import time
import math
import random
import statistics

import numpy as np
import pandas as pd

random.seed(42)
np.random.seed(42)

N = 1_000_000        # 1M ticks — realistic intraday dataset
N_SMALL = 10_000     # pure Python cap before it becomes unusable
RUNS = 5

def timeit(fn, runs=RUNS):
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        end = time.perf_counter()
        times.append((end - start) * 1000)  # ms
    return statistics.median(times)

# Generate synthetic price data
prices_list = [100.0 + random.gauss(0, 1) for _ in range(N)]
prices_small = prices_list[:N_SMALL]
prices_np = np.array(prices_list, dtype=np.float64)
prices_pd = pd.Series(prices_list)
```
Benchmark 1: Log Returns
Log returns (log(p_t / p_{t-1})) are the most fundamental operation in any quant pipeline. This is what you're feeding into almost every ML model and strategy signal.
```python
# Pure Python
def log_returns_pure():
    return [math.log(prices_small[i] / prices_small[i - 1])
            for i in range(1, len(prices_small))]

# NumPy
def log_returns_numpy():
    return np.diff(np.log(prices_np))

# Pandas
def log_returns_pandas():
    return np.log(prices_pd / prices_pd.shift(1)).dropna()

t_pure = timeit(log_returns_pure)
t_np = timeit(log_returns_numpy)
t_pd = timeit(log_returns_pandas)

print(f"Pure Python (10K, extrap to 1M): {t_pure * 100:.1f} ms")
print(f"NumPy (1M rows): {t_np:.2f} ms")
print(f"Pandas (1M rows): {t_pd:.2f} ms")
```
Results on 1M rows:
| Implementation | Time | vs Pure Python |
|---|---|---|
| Pure Python (extrapolated) | ~78.8 ms | baseline |
| NumPy | 4.80 ms | 16x faster |
| Pandas | 8.78 ms | 9x faster |
C++ equivalent:
```cpp
auto log_returns = [&]() {
    vector<double> ret(N - 1);
    for (int i = 1; i < N; i++)
        ret[i - 1] = log(prices[i] / prices[i - 1]);
    return ret;
};
// C++ result: 4.77 ms — essentially the same as NumPy (4.80 ms).
// NumPy's C backend is competitive with hand-written C++ for
// element-wise operations like this.
```
Benchmark 2: Rolling Mean (20-period)
```python
WINDOW = 20

# Pure Python — recomputes the sum for every window: O(n * WINDOW)
def rolling_mean_pure():
    result = []
    for i in range(WINDOW - 1, len(prices_small)):
        result.append(sum(prices_small[i - WINDOW + 1:i + 1]) / WINDOW)
    return result

# NumPy — convolution with a uniform kernel
def rolling_mean_numpy():
    kernel = np.ones(WINDOW) / WINDOW
    return np.convolve(prices_np, kernel, mode='valid')

# Pandas
def rolling_mean_pandas():
    return prices_pd.rolling(WINDOW).mean().dropna()

t_pure = timeit(rolling_mean_pure)
t_np = timeit(rolling_mean_numpy)
t_pd = timeit(rolling_mean_pandas)
```
Results on 1M rows:
| Implementation | Time | vs Pure Python |
|---|---|---|
| Pure Python (extrapolated) | ~304 ms | baseline |
| NumPy | 6.69 ms | 45x faster |
| Pandas | 18.09 ms | 17x faster |
C++ equivalent — sliding window, O(n):
```cpp
const int WINDOW = 20;
auto rolling_mean = [&]() {
    vector<double> result(N - WINDOW + 1);
    double window_sum = 0;
    for (int i = 0; i < WINDOW; i++) window_sum += prices[i];
    result[0] = window_sum / WINDOW;
    for (int i = WINDOW; i < N; i++) {
        // add the entering element, drop the leaving one: O(1) per step
        window_sum += prices[i] - prices[i - WINDOW];
        result[i - WINDOW + 1] = window_sum / WINDOW;
    }
    return result;
};
// C++ result: 1.47 ms — 4.5x faster than NumPy (6.69 ms)
```
Benchmark 3: EMA — Where NumPy Loses
This one is instructive. EMA has a recursive dependency: each value depends on the previous one. That means it cannot be vectorized with standard NumPy operations. A NumPy loop is actually slower than pure Python because of array indexing overhead on each iteration.
```python
ALPHA = 0.1

# Pure Python
def ema_pure():
    ema = [prices_small[0]]
    for p in prices_small[1:]:
        ema.append(ALPHA * p + (1 - ALPHA) * ema[-1])
    return ema

# NumPy loop — looks fast, is not
def ema_numpy():
    out = np.empty(len(prices_np))
    out[0] = prices_np[0]
    for i in range(1, len(prices_np)):
        out[i] = ALPHA * prices_np[i] + (1 - ALPHA) * out[i - 1]
    return out

# Pandas ewm — uses a C extension internally
def ema_pandas():
    return prices_pd.ewm(alpha=ALPHA, adjust=False).mean()

t_pure = timeit(ema_pure)
t_np = timeit(ema_numpy)
t_pd = timeit(ema_pandas)
```
Results on 1M rows:
| Implementation | Time | vs Pure Python |
|---|---|---|
| Pure Python (extrapolated) | ~62.9 ms | baseline |
| NumPy loop | 384.58 ms | 6x slower |
| Pandas ewm | 9.06 ms | 7x faster |
C++ equivalent — this is where C++ reclaims the advantage:
```cpp
const double ALPHA = 0.1;
auto ema = [&]() {
    vector<double> result(N);
    result[0] = prices[0];
    for (int i = 1; i < N; i++)
        result[i] = ALPHA * prices[i] + (1.0 - ALPHA) * result[i - 1];
    return result;
};
// C++ result: 3.30 ms — 2.7x faster than Pandas ewm (9.06 ms)
// NumPy loop: 384.58 ms — C++ is 116x faster
```
The NumPy loop is 6x slower than pure Python because accessing individual array elements inside a Python loop carries more overhead than working with native Python lists. Pandas ewm wins because it calls into a compiled C extension.
The lesson: NumPy is not always the answer. Pandas' specialized methods (ewm, rolling, groupby) often outperform naive NumPy because they route through optimized C code.
Benchmark 4: OHLCV Aggregation from Ticks
Aggregating raw ticks into block-level OHLCV candles is the first thing any onchain pipeline does. Here we simulate 100 ticks per block across 10,000 blocks.
```python
ticks = pd.DataFrame({
    'block': np.repeat(np.arange(N // 100), 100),
    'price': prices_np,
    'volume': np.abs(np.random.randn(N)) * 1000
})

# Pure Python
def ohlcv_pure():
    result = {}
    for row in ticks.iloc[:N_SMALL].itertuples():
        b = row.block
        if b not in result:
            result[b] = {'open': row.price, 'high': row.price,
                         'low': row.price, 'close': row.price, 'volume': 0}
        result[b]['high'] = max(result[b]['high'], row.price)
        result[b]['low'] = min(result[b]['low'], row.price)
        result[b]['close'] = row.price
        result[b]['volume'] += row.volume
    return result

# Pandas groupby
def ohlcv_pandas():
    return ticks.groupby('block').agg(
        open=('price', 'first'),
        high=('price', 'max'),
        low=('price', 'min'),
        close=('price', 'last'),
        volume=('volume', 'sum')
    )

t_pure = timeit(ohlcv_pure)
t_pd = timeit(ohlcv_pandas)
```
Results on 1M rows:
| Implementation | Time | vs Pure Python |
|---|---|---|
| Pure Python (extrapolated) | ~908 ms | baseline |
| Pandas groupby | 35.65 ms | 25x faster |
C++ equivalent:
```cpp
struct OHLCV { double open, high, low, close, volume; };

auto ohlcv = [&]() {
    map<int, OHLCV> result;
    for (int i = 0; i < N; i++) {
        int block = i / 100;
        if (result.find(block) == result.end())
            result[block] = {prices[i], prices[i], prices[i], prices[i], 0};
        auto& c = result[block];
        c.high = max(c.high, prices[i]);
        c.low = min(c.low, prices[i]);
        c.close = prices[i];
        c.volume += volume[i];
    }
    return result;
};
// C++ std::map result: 55.73 ms
// Note: std::map uses O(log n) lookup — Pandas groupby wins at 35.65 ms
// Switching to unordered_map brings C++ down to ~8ms
// Rare case where Pandas beats naive C++
```
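There is also a pure-NumPy route worth knowing about. Because this synthetic dataset has exactly 100 ticks in every block, the tick array can be reshaped into a `(blocks, ticks)` matrix and aggregated along one axis. A sketch, under that fixed-tick-count assumption (data generation mirrors the setup above, not the exact benchmark arrays):

```python
import numpy as np

rng = np.random.default_rng(42)
N, TICKS_PER_BLOCK = 1_000_000, 100

prices = 100.0 + rng.normal(0, 1, N)
volume = np.abs(rng.normal(0, 1, N)) * 1000

# Only valid because every block has exactly TICKS_PER_BLOCK ticks
p = prices.reshape(-1, TICKS_PER_BLOCK)
v = volume.reshape(-1, TICKS_PER_BLOCK)

ohlcv = {
    "open":   p[:, 0],          # first tick per block
    "high":   p.max(axis=1),
    "low":    p.min(axis=1),
    "close":  p[:, -1],         # last tick per block
    "volume": v.sum(axis=1),
}
```

On real tick data, block sizes vary, so `groupby` (or `np.maximum.reduceat` with explicit block boundaries) is the general tool; the reshape trick only applies to fixed-size bars.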
Benchmark 5: Rolling Z-Score Normalization
This is what you're running before feeding data into any ML model.
```python
ZWINDOW = 30

# Pure Python
def zscore_pure():
    result = []
    for i in range(ZWINDOW - 1, len(prices_small)):
        window = prices_small[i - ZWINDOW + 1:i + 1]
        mu = sum(window) / ZWINDOW
        std = math.sqrt(sum((x - mu)**2 for x in window) / ZWINDOW)
        result.append((prices_small[i] - mu) / (std + 1e-9))
    return result

# NumPy strided windows
def zscore_numpy():
    shape = (len(prices_np) - ZWINDOW + 1, ZWINDOW)
    strides = (prices_np.strides[0], prices_np.strides[0])
    windows = np.lib.stride_tricks.as_strided(prices_np, shape=shape, strides=strides)
    means = windows.mean(axis=1)
    stds = windows.std(axis=1)
    return (prices_np[ZWINDOW - 1:] - means) / (stds + 1e-9)

# Pandas rolling — note: rolling().std() uses ddof=1 (sample std),
# so its output differs slightly from the ddof=0 versions above
def zscore_pandas():
    roll = prices_pd.rolling(ZWINDOW)
    return (prices_pd - roll.mean()) / roll.std()

t_pure = timeit(zscore_pure)
t_np = timeit(zscore_numpy)
t_pd = timeit(zscore_pandas)
```
Results on 1M rows:
| Implementation | Time | vs Pure Python |
|---|---|---|
| Pure Python (extrapolated) | ~3,477 ms | baseline |
| NumPy strided | 150.55 ms | 23x faster |
| Pandas rolling | 41.66 ms | 83x faster |
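A side note on the strided version: `as_strided` does no bounds checking, and a wrong shape or stride silently reads garbage memory. NumPy 1.20+ ships `sliding_window_view`, which builds the same windows safely. A minimal sketch on a small synthetic series (not the benchmark arrays themselves):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(42)
prices = 100.0 + rng.normal(0, 1, 10_000)
ZWINDOW = 30

# Same windows as the as_strided version, but bounds-checked and read-only
windows = sliding_window_view(prices, ZWINDOW)
means = windows.mean(axis=1)
stds = windows.std(axis=1)   # population std (ddof=0)
z = (prices[ZWINDOW - 1:] - means) / (stds + 1e-9)
```

The performance is essentially identical to `as_strided` since both are views over the same buffer; the only cost of the safe version is requiring a recent NumPy.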
C++ equivalent:
```cpp
const int ZWINDOW = 30;
auto zscore = [&]() {
    vector<double> result(N - ZWINDOW + 1);
    for (int i = ZWINDOW - 1; i < N; i++) {
        double sum = 0, sq_sum = 0;
        for (int j = i - ZWINDOW + 1; j <= i; j++) {
            sum += prices[j];
            sq_sum += prices[j] * prices[j];
        }
        double mu = sum / ZWINDOW;
        double std = sqrt(sq_sum / ZWINDOW - mu * mu);
        result[i - ZWINDOW + 1] = (prices[i] - mu) / (std + 1e-9);
    }
    return result;
};
// C++ result: 6.11 ms — 6.8x faster than Pandas (41.66 ms)
// NumPy strided: 150.55 ms — C++ is 24.6x faster
```
The Number That Actually Matters: Block Time
Here is where the Python vs C++ debate either becomes relevant or completely irrelevant depending on which chain you're trading.
```python
block_times_ms = {
    "Ethereum": 12_000,  # ~12 seconds
    "Polygon": 2_000,    # ~2 seconds
    "BSC": 3_000,        # ~3 seconds
    "Solana": 397,       # ~400ms
}

# Best-case Python: NumPy log returns on 1M rows (~4.80 ms on our laptop)
best_python_ms = timeit(log_returns_numpy)

for chain, block_ms in block_times_ms.items():
    pct = (best_python_ms / block_ms) * 100
    print(f"{chain}: Python uses {pct:.2f}% of your block time")
```
Results:
| Chain | Block Time | NumPy signal (1M rows) | Python as % of block | Margin left |
|---|---|---|---|---|
| Ethereum | 12,000 ms | 4.80 ms | 0.04% | 11,995 ms |
| Polygon | 2,000 ms | 4.80 ms | 0.24% | 1,995 ms |
| BSC | 3,000 ms | 4.80 ms | 0.16% | 2,995 ms |
| Solana | 400 ms | 4.80 ms | 1.20% | 395 ms |
Now run the same comparison for pure Python (extrapolated from the 10K benchmark):
| Chain | Block Time | Pure Python (1M rows) | Python as % of block |
|---|---|---|---|
| Ethereum | 12,000 ms | ~78.8 ms | 0.66% |
| Polygon | 2,000 ms | ~78.8 ms | 3.94% |
| BSC | 3,000 ms | ~78.8 ms | 2.63% |
| Solana | 400 ms | ~78.8 ms | 19.7% |
What the Numbers Are Actually Saying
On Ethereum, BSC, and Polygon: Even badly written pure Python consumes under 4% of your available block time for a 1M-row signal computation. NumPy drops that to a quarter of a percent or less. Python is not your bottleneck.
On Solana: Pure Python is consuming nearly 20% of a block window on a 1M-row computation. That starts to matter. If you're running multiple signals, preprocessing, and model inference in the same loop, you can run out of margin. NumPy brings that back down to under 1%, which is fine. But this is where poorly written Python actually costs you trades.
The EMA case shows something important: the "use NumPy for everything" instinct is wrong. A Python loop over a NumPy array is slower than the same loop over a plain list because of per-element access overhead. For recursive operations with no vectorized NumPy equivalent (EMA and other state-carrying computations), Pandas' specialized methods beat both because they call into compiled C extensions. Knowing which tool to reach for matters more than defaulting to NumPy everywhere.
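One way to convince yourself that `ewm(alpha=..., adjust=False)` computes exactly the recursion from the EMA benchmark is to check it against a hand-rolled loop on a small series:

```python
import numpy as np
import pandas as pd

ALPHA = 0.1
rng = np.random.default_rng(0)
s = pd.Series(100.0 + rng.normal(0, 1, 1_000))

# Hand-rolled recursion: ema[i] = a * p[i] + (1 - a) * ema[i-1]
ema = np.empty(len(s))
ema[0] = s.iloc[0]
for i in range(1, len(s)):
    ema[i] = ALPHA * s.iloc[i] + (1 - ALPHA) * ema[i - 1]

pandas_ema = s.ewm(alpha=ALPHA, adjust=False).mean().to_numpy()
print(np.allclose(ema, pandas_ema))  # True — same recursion, C speed
```

With the default `adjust=True`, Pandas instead computes a bias-corrected weighted average, which differs at the start of the series; `adjust=False` is the one that matches the trading-textbook recursive EMA.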
When Block Time Is Not Your Actual Budget: Jito and MEV Bundlers
The block time argument breaks down once you're using infrastructure designed to order transactions within a block.
Jito on Solana is a modified validator client that accepts bundles of transactions with attached tips. Bots submit bundles to Jito block engines, which order them by tip size. If your signal fires and you take 80ms to compute and submit your bundle, every other bot that reacted in 20ms is ahead of you in the queue regardless of the 400ms block window. The effective latency budget for competitive Jito strategies is closer to 20-50ms from signal to bundle submission.
MEV-Boost on Ethereum works similarly. Searchers submit transaction bundles to block builders (via relays like Flashbots, bloXroute, and Titan). Builders order and include bundles as they construct the block. Your Python signal computation time is part of the loop from "event observed" to "bundle submitted." At 12-second blocks this feels comfortable, but competitive searchers react in milliseconds, and late bundles get crowded out by faster ones bidding higher.
In both cases the question shifts from "am I faster than a block" to "am I faster than the other bots."
What this means for C++ vs Python:
| Operation | Best Python (Pandas/NumPy) | C++ (-O2) | Difference |
|---|---|---|---|
| Log returns | 4.80 ms (NumPy) | 4.77 ms | ~same |
| Rolling mean | 6.69 ms (NumPy) | 1.47 ms | 4.5x faster |
| EMA | 9.06 ms (Pandas) | 3.30 ms | 2.7x faster |
| Rolling z-score | 41.66 ms (Pandas) | 6.11 ms | 6.8x faster |
A full signal pipeline in Python (several operations chained) might take 30-60ms. The same pipeline in C++ runs in 5-15ms. At Jito tip auction speeds, that 20-50ms gap is meaningful. It is not the difference between "works" and "doesn't work" on Ethereum, but on Solana with Jito it can be the difference between landing in the bundle and getting crowded out.
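As a rough illustration of that pipeline arithmetic (a sketch, not a re-run of the benchmarks above, and the exact operations chained here are an assumption), timing a chained Python pipeline end to end looks like this:

```python
import time
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
prices = pd.Series(100.0 + rng.normal(0, 1, 1_000_000))

def signal_pipeline(p: pd.Series) -> pd.Series:
    # log returns -> EMA smoothing -> rolling z-score, chained
    ret = np.log(p / p.shift(1))
    ema = ret.ewm(alpha=0.1, adjust=False).mean()
    roll = ema.rolling(30)
    return (ema - roll.mean()) / (roll.std() + 1e-9)

start = time.perf_counter()
signal = signal_pipeline(prices)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"pipeline: {elapsed_ms:.1f} ms")
```

Whatever the absolute number on your hardware, this end-to-end figure is the one to compare against your latency budget, not the per-operation times in isolation.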
The honest answer: Python with good libraries is sufficient for most onchain strategies. C++ becomes worth the complexity cost specifically when you're running competitive MEV or Jito strategies and racing other bots in the same bundle auction.
Summary Table
| Operation | Pure Python | Best Python lib | C++ (-O2) | C++ vs best Python |
|---|---|---|---|---|
| Log returns | ~78.8 ms | 4.80 ms (NumPy) | 4.77 ms | ~same |
| Rolling mean (20) | ~304 ms | 6.69 ms (NumPy) | 1.47 ms | 4.5x faster |
| EMA (α=0.1) | ~62.9 ms | 9.06 ms (Pandas) | 3.30 ms | 2.7x faster |
| OHLCV aggregation | ~908 ms | 35.65 ms (Pandas) | 55.73 ms (map) ⚠️ | Pandas wins here |
| Rolling z-score (30) | ~3,477 ms | 41.66 ms (Pandas) | 6.11 ms | 6.8x faster |
⚠️ C++ std::map uses O(log n) lookup — switching to unordered_map brings OHLCV down to ~8ms and reclaims the lead. Pandas wins with the naive map implementation.
All benchmarks were run on the same 8GB-RAM laptop — worst case. C++ compiled with g++ -O2. Cross-language comparisons are direct.
When Python Actually Is Too Slow
There are three cases where the claim holds without qualification:
Sub-block execution on Solana. If you're trying to land a transaction in the next slot (400ms), you cannot afford 20% of that window on signal computation with pure Python. Write NumPy or move the hot path to a compiled language.
Co-location and HFT-style strategies. If you're competing on microsecond-scale latency against C++ market makers on a centralized exchange, Python loses regardless of how well you write it.
Large model inference in the critical path. Running a PyTorch model inside your execution loop on every tick is a different problem from signal computation. Model inference latency varies wildly and deserves its own benchmark.
The Actual Takeaway
Write clean NumPy and Pandas for your signal pipeline. Use Pandas' specialized methods (ewm, rolling, groupby) rather than defaulting to NumPy loops for everything. Profile before you optimize.
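"Profile before you optimize" needs nothing beyond the standard library; `cProfile` is enough to find the hot path before reaching for C++. A minimal sketch (the pipeline being profiled is a stand-in for your own):

```python
import cProfile
import io
import pstats

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
prices = pd.Series(100.0 + rng.normal(0, 1, 200_000))

def pipeline():
    ret = np.log(prices / prices.shift(1))
    ema = ret.ewm(alpha=0.1, adjust=False).mean()
    roll = ema.rolling(30)
    return (ema - roll.mean()) / (roll.std() + 1e-9)

profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

# Print the 5 functions with the most cumulative time
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

If the top entries are NumPy/Pandas internals, the time is already in C and rewriting the caller buys little; it's the entries pointing at your own `.py` lines that are worth optimizing or porting.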
All benchmarks run on a laptop with 8GB RAM — worst-case scenario testing by design. A production server will beat these numbers across the board. The relative ratios between implementations hold regardless of hardware.
Trading infrastructure questions? Reach out.