mouse subtick updates

In this article, we're going to talk about mouse subtick updates in a client-server setup. The main point is this: when you have a monitor that can render at 144Hz or higher, updating the view (i.e. the camera's yaw and pitch) at this high rate allows the game to fully take advantage of the display’s refresh capabilities.

However, in a traditional client-server game setup, state updates from the server typically arrive at 60Hz or lower. If we relied solely on these updates to adjust the camera, then we’d be limited to 60 updates per second — far below what high refresh rate monitors are capable of displaying.

In this article, we’ll describe a simple but powerful solution using client prediction and server reconciliation — concepts commonly used in multiplayer games. Our approach focuses on local responsiveness, specifically how we process mouse movement on the client side.

Mouse Movement Processing

We assume that mouse input on the client arrives as a sequence of 2D screen positions:

(x₁, y₁), (x₂, y₂), …

To compute how the camera should rotate, we calculate mouse deltas between each pair of consecutive positions:

Δx = x₂ - x₁
Δy = y₂ - y₁

These deltas are scaled by a sensitivity factor s to yield changes in yaw and pitch:

Δyaw = s × Δx
Δpitch = s × Δy
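
For concreteness, here is a minimal C++ sketch of that computation. The names (ViewAngles, apply_mouse_position, and the parameters) are placeholders for illustration, not an existing API:

struct ViewAngles {
    float yaw = 0.0f;
    float pitch = 0.0f;
};

// Applies one new mouse sample (x, y) given the previous sample (prev_x, prev_y).
// s is the sensitivity factor described above.
void apply_mouse_position(ViewAngles& view, float& prev_x, float& prev_y,
                          float x, float y, float s) {
    float dx = x - prev_x;
    float dy = y - prev_y;
    view.yaw   += s * dx;
    view.pitch += s * dy;
    prev_x = x;
    prev_y = y;
}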

Single-Player Case

In a single-player scenario, suppose the user moves the mouse from point a = (a₁, a₂) to point b = (b₁, b₂) over 1 second. If the system samples mouse input at 5Hz, we would get a sequence of five sampled points:

a, p₁, p₂, p₃, b

Then, the total yaw rotation over that time would be computed as (writing just each point's x-component, since the pitch case works the same way with y):

Δyaw_total = s × [(p₁ - a) + (p₂ - p₁) + (p₃ - p₂) + (b - p₃)]

Notice this is a telescoping sum, and all intermediate terms cancel out, leaving:

Δyaw_total = s × (b - a)

This means that no matter how frequently we sample the input, the total camera movement is the same over the same time period. In other words, higher sampling just gives us smoother interpolation, not fundamentally different results.
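As a quick sanity check with made-up numbers: take s = 0.1 and x-components a = 0, p₁ = 30, p₂ = 55, p₃ = 85, b = 100. The per-sample yaw deltas are 3, 2.5, 3, and 1.5, which sum to 10, exactly s × (b - a) = 0.1 × 100.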

Subtick Mouse Updates in Client-Server Games

The insight from the single-player case applies directly to a client-server setup. On the client, we update the view immediately using incoming mouse positions. Each mouse movement results in a change in yaw and pitch, which directly affects the camera's orientation — and we apply these updates as frequently as possible.

Meanwhile, the client sends mouse positions to the server at a slower, coarser rate. Suppose we send the Nth mouse position: (x, y). The server processes it, using it to update the server-side view of the character, and then sends a response back to the client containing its authoritative yaw and pitch along with N, the number of the last mouse update it processed.

When the client receives this correction, it performs a reconciliation step. First, it sets its yaw and pitch to the values received from the server. Then, it re-applies all mouse deltas that occurred after the Nth position — i.e., all updates that the server hasn't seen yet.

The goal is for the resulting yaw and pitch on the client to match what was predicted before the correction. This will happen as long as the yaw/pitch delta computed from the Nth update was the same on both the client and the server.

Here's why that's true: the yaw/pitch delta is computed from mouse movement using:

Δyaw = s × (x₂ - x₁)
Δpitch = s × (y₂ - y₁)

Even if the client processes many fine-grained intermediate updates between two points a and b, the total effect is:

s × (b - a)

— exactly the same as what the server computes with its coarser sampling. Because this computation is additive and telescopes to the same result, both the client and server end up with the same yaw/pitch after processing the Nth update.

Therefore, when the client re-applies the deltas from mouse updates after N, the final yaw and pitch match the original prediction. There's no discrepancy between predicted and reconciled state.
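
Concretely, the client-side loop might look something like the sketch below. Everything here (MouseSample, GameUpdate, the Client struct and its fields) is made up for illustration; it just follows the predict-then-reconcile scheme described above:

#include <cstdint>
#include <deque>

struct MouseSample { uint32_t seq; float x, y; };

// What the server sends back: its authoritative view angles plus the
// sequence number of the last mouse sample it processed.
struct GameUpdate { float yaw, pitch; uint32_t last_processed_seq; };

struct Client {
    float yaw = 0, pitch = 0;
    float sensitivity = 0.1f;
    float last_x = 0, last_y = 0;        // newest sample seen (for prediction)
    float acked_x = 0, acked_y = 0;      // last sample the server has processed
    std::deque<MouseSample> pending;     // samples after the acked one

    // Called at the mouse poll rate: predict locally, remember the sample.
    void on_mouse_sample(const MouseSample& m) {
        yaw   += sensitivity * (m.x - last_x);
        pitch += sensitivity * (m.y - last_y);
        last_x = m.x; last_y = m.y;
        pending.push_back(m);            // also queued for sending to the server
    }

    // Called when a server update arrives: clobber, then replay unacked samples.
    void on_game_update(const GameUpdate& gu) {
        // Drop everything the server has already processed, remembering the last one.
        while (!pending.empty() && pending.front().seq <= gu.last_processed_seq) {
            acked_x = pending.front().x;
            acked_y = pending.front().y;
            pending.pop_front();
        }
        // Reconcile: adopt the server's angles, then re-apply unacknowledged samples.
        yaw = gu.yaw; pitch = gu.pitch;
        float px = acked_x, py = acked_y;
        for (const MouseSample& m : pending) {
            yaw   += sensitivity * (m.x - px);
            pitch += sensitivity * (m.y - py);
            px = m.x; py = m.y;
        }
        // By the telescoping argument above, yaw/pitch now equal the predicted values.
    }
};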

Subtick mouse updates are not only possible — they're surprisingly simple to implement. By running local, high-frequency input updates and syncing with the server using coarser snapshots, we achieve both high responsiveness and server-authoritative correctness, with minimal overhead.

understanding it in practice

We will visually explore how this algorithm works in practice to help make the concepts described here more concrete.


   m1  m2  m3  m4  m5  m6  m7  m8  m9  m10 m11 m12 m13 m14 m15 m16 m17 m18 m19
   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .       (client ticks)
---|-----------------------|-----------------------|-----------------------| 
   \    ^                  \    ^                  \    ^       
    \  /                    \  /                    \  /       
     \/                      \/                      \/       
     /\                      /\                      /\      
    /  \                    /  \                    /  \    
   /    \                  /    \                  /    \  
  /      \                /      \                /      \
 /        \              /        \              /        \
/          v            /          v            /          v
|-----------------------|-----------------------|-----------------------| 
GU1:(sy1, sp1)          GU2:(sy2, sp2)          GU3:(sy3, sp3)           (s[y/p]x: server [yaw/pitch] x)

On the top we have the client ticking along with mouse updates of the form mx: (sx, sy), where these are screen coordinates. Recall that whenever a new one is recorded we compute a mouse delta by subtracting it from the previous position, scale that by a sensitivity factor to produce a yaw/pitch delta, and add that to the camera's current yaw/pitch. Let's also assume there is a camera object with a function called update which takes in a mouse screen position and does all of that internally. In this light, what we discovered previously is that given a bunch of recorded screen positions s1, s2, s3, s4, s5 (these are tuples), the state of the camera (its yaw/pitch) will be the same whether we do:


camera.update(s1);
camera.update(s5);

or if we do


camera.update(s1);
camera.update(s2);
camera.update(s3);
camera.update(s4);
camera.update(s5);

We call this the TELESCOPING PROPERTY.
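
As a reference point, here is roughly what such a camera object could look like in C++. This is a sketch under the assumptions above (screen-space deltas scaled by a sensitivity factor), not the engine's actual implementation:

struct ScreenPos { float x, y; };

class Camera {
public:
    // Reconciliation uses this to clobber the local state with the server's values.
    // (A real implementation would likely also reset prev to the last sample the
    //  server processed, so the next replayed delta starts from the right point.)
    void set_yaw_pitch(float y, float p) { yaw = y; pitch = p; }

    // Consume one mouse screen position: delta against the previous position,
    // scaled by sensitivity, added onto yaw/pitch.
    void update(ScreenPos s) {
        yaw   += sensitivity * (s.x - prev.x);
        pitch += sensitivity * (s.y - prev.y);
        prev = s;
    }

    float yaw = 0.0f;
    float pitch = 0.0f;

private:
    ScreenPos prev{0.0f, 0.0f};   // m0 = (0, 0), matching the walkthrough below
    float sensitivity = 0.1f;     // placeholder value
};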

Now let's explore why our algorithm works. On the client, assume that after processing mx: (sx, sy) the camera's yaw/pitch is (yx, px). When we receive GU3 our yaw/pitch was (y14, p14), but that game update contained (sy3, sp3), and we clobber our current yaw/pitch on the client with it. That game update also carried a mouse update number specifying the last processed mouse position update, which in this case would be m7. Therefore we take all the mouse updates simulated on the client since that one, namely m8, m9, m10, m11, m12, m13, m14, and reapply them on the camera using the update function. This gives some resulting yaw/pitch (y, p), and if everything went right we want (y, p) = (y14, p14). In code, we're asking whether the camera's state would be equal whether we did A or B:

A: RECONCILED


camera.set_yaw_pitch(sy3, sp3);
camera.update(m8);
camera.update(m9);
camera.update(m10);
camera.update(m11);
camera.update(m12);
camera.update(m13);
camera.update(m14);

B:


camera.set_yaw_pitch(sy2, sp2); |
camera.update(m2);              |
camera.update(m3);              |
camera.update(m4);              | PREV RECONCILIATION
camera.update(m5);              | 
camera.update(m6);              |
camera.update(m7);              | 
camera.update(m8);              
camera.update(m9);
camera.update(m10);
camera.update(m11);
camera.update(m12);
camera.update(m13);
camera.update(m14);

Since both end with the same sequence of updates for m8-m14, these two states are equal precisely when A' and B' produce the same camera state:

A': RECONCILED


camera.set_yaw_pitch(sy3, sp3);

B':


camera.set_yaw_pitch(sy2, sp2);
camera.update(m2);
camera.update(m3);
camera.update(m4);
camera.update(m5);
camera.update(m6);
camera.update(m7);

This is easier to understand. The pair (sy3, sp3) was produced on the server, and the sequence of mouse updates the server received over the whole interaction up until this point was m1 and then m7. Therefore (sy3, sp3) was the state of the server's camera after the following sequence of events:


Camera camera;      // initializes with a yaw/pitch of (0, 0) and m0 = (0, 0)
camera.update(m1);  // this produces (sy2, sp2)
camera.update(m7);  // this produces (sy3, sp3)

Thus we can re-write again to this form:

A'': RECONCILED


Camera camera;
camera.update(m1);
camera.update(m7);

B'':


Camera camera; 
camera.update(m1);
camera.update(m2);
camera.update(m3);              
camera.update(m4);              
camera.update(m5);              
camera.update(m6);              
camera.update(m7);              

But then by the telescoping property we know that B'' is equivalent to:


Camera camera; 
camera.update(m1);
camera.update(m7);              

Thus A'' and B'' are equivalent, so A' and B' are equivalent, and thus A and B are equivalent (aka produce the same camera state).

Therefore we can deduce that the camera state after the sequence of events A is the same as the one after the sequence of events B, so (y, p) = (y14, p14), meaning that after reconciliation there is in theory no delta and the camera does not twitch at all. We've only proved this for a small example, but what we've really shown is that when the client and server share a yaw/pitch value, their yaw/pitch values will continue to match in the future. In effect we've proven a base case, where both of them start with a yaw/pitch of (0, 0), and an inductive step, that after each reconciliation client and server still match, so by induction this pattern keeps holding over and over.

hitscan and subtick updates

Now that we understand how subtick mouse updates keep the camera smooth, let's talk about how subtick can give players an actual edge in-game.

Note that if we allow subtick updates to give players an edge in-game, then your game is slightly more pay-to-win than if not, but I actually don't think that matters too much. If you're going to invest in hardware such as a monitor to get more in-game frames, then squeezing out as much value out of that investment seems like a good idea. So in my mind, this isn't actually a pay-to-win feature, but rather a bang-for-your-buck feature.

Additionally, in the realm of the toolbox engine, we already can get 300+ fps on old ass hardware, and so you don't even need a great monitor to feel benefits. Actually, even if you had a 60 fps monitor but your game ran at like 144 fps and your mouse's poll rate was at least that, you'd be getting more accurate shots. Your brain would just have to interpolate between the frames a little more, but the game would actually have a higher precision when firing.

Ok, so with that out of the way, let's get into how this would actually work. First, let's go back to our previous diagram and see how firing fits into all of this. On your average server, if you have 50ms of ping and we don't do any sort of client work, shots are going to feel delayed. It's already been seen that humans can detect input lag as small as 13ms (personally I can detect around 9ms), so 50ms of lag is going to be noticeable. In order to correct for this, we need to do some aspect of hitscan on the client to make it feel like we're firing straight away.


   m1  m2  m3  m4  m5  m6  m7  m8  m9  m10 m11 m12 m13 m14 m15 m16 m17 m18 m19
   .   .   .   .   .   .   .   .   .   x   x   x   .   .   .   .   .   .   .       (client ticks, now x represents fire is clicked)
---|-----------------------|-----------------------|-----------------------| 
   \    ^                  \    ^                  \    ^       
    \  /                    \  /                    \  /       
     \/                      \/                      \/       
     /\                      /\                      /\      
    /  \                    /  \                    /  \    
   /    \                  /    \                  /    \  
  /      \                /      \                /      \
 /        \              /        \              /        \
/          v            /          v            /          v
|-----------------------|-----------------------|-----------------------| 
GU1:(sy1, sp1)          GU2:(sy2, sp2)          GU3:(sy3, sp3)           (s[y/p]x: server [yaw/pitch] x)

NOTE: I actually decided I don't need to work on this component of this yet, because I want to test and see if this is something people would actually complain about. Additionally, I have played 3k+ hours of TF2 and it's fine, and it doesn't have this feature: https://www.youtube.com/watch?v=EKmdTwQReFQ and https://www.youtube.com/watch?v=RmuoWDzVZ9U

If I do come back to this then this is the general idea:

If a fire event is detected locally, you process it immediately and play a hitsound, then you record how far along you were in terms of mouse position updates when you fired, and that gets sent with the fire event so the server can use it to do a more accurate shot.
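
As a very rough sketch (again, not implemented, per the note above; the field names are made up), the fire message could carry something like:

#include <cstdint>

// Hypothetical fire event sent alongside the regular mouse updates.
struct FireEvent {
    uint32_t last_mouse_seq;   // the most recent mouse update applied before firing
    float    yaw;              // client's predicted view angles at the moment of the click,
    float    pitch;            //   so the server can ray-cast with subtick precision
};

The server would then ray-cast using these angles instead of whatever its own coarser, tick-boundary view of the player happens to be.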

