Lesson 02: WebSocket Protocol Basics

Protocol Handshake Process

The WebSocket protocol handshake is the initial step in establishing a connection. It is based on HTTP and aims to upgrade a standard HTTP connection to a WebSocket connection. Below are the key steps in the handshake process:

Client Initiates Connection Request:

The client sends an HTTP/1.1 GET request to the server's WebSocket URL (ws:// for plain connections, wss:// for TLS-encrypted ones). The request carries special headers that mark it as a WebSocket upgrade request. Key headers include, but are not limited to:

  • Upgrade: Must be set to websocket, indicating the client’s intent to upgrade the connection.
  • Connection: Must include the token Upgrade, reinforcing the upgrade request (some browsers send keep-alive, Upgrade).
  • Sec-WebSocket-Key: A randomly generated 16-byte value, Base64-encoded. The server combines this key with a fixed GUID string, computes a SHA-1 hash, and returns the Base64-encoded result for verification.
  • Sec-WebSocket-Version: Specifies the WebSocket protocol version so that client and server agree; it must be 13 for RFC 6455.
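To see what a browser sends on your behalf, the sketch below assembles the raw request text in Node.js. The helper name buildUpgradeRequest is hypothetical. It also adds the Host header, which HTTP/1.1 requires; real browsers send further headers such as Origin, omitted here.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: builds the raw HTTP/1.1 upgrade request a client would send.
function buildUpgradeRequest(host, path) {
  // Sec-WebSocket-Key: 16 random bytes, Base64-encoded (always 24 characters)
  const key = crypto.randomBytes(16).toString('base64');
  const request = [
    `GET ${path} HTTP/1.1`,
    `Host: ${host}`,
    'Upgrade: websocket',
    'Connection: Upgrade',
    `Sec-WebSocket-Key: ${key}`,
    'Sec-WebSocket-Version: 13',
    '', // two empty entries make the text end with "\r\n\r\n",
    '', // the blank line that terminates the header block
  ].join('\r\n');
  return { key, request };
}

const { key, request } = buildUpgradeRequest('example.com', '/socketserver');
console.log(request);
```

The client keeps key around, because it needs it later to check the server's Sec-WebSocket-Accept value.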

Server Response:

Upon receiving the request, if the server agrees to upgrade, it responds with an HTTP response bearing a status code of 101 Switching Protocols, indicating acceptance of the protocol switch. The response includes specific headers:

  • Upgrade: Must also be set to websocket.
  • Connection: Set to Upgrade, confirming the connection upgrade.
  • Sec-WebSocket-Accept: The server computes this value by concatenating the client's Sec-WebSocket-Key with the fixed GUID string 258EAFA5-E914-47DA-95CA-C5AB0DC85B11, computing the SHA-1 hash of the result, and Base64-encoding the 20-byte digest. This proves to the client that the server actually understood the WebSocket handshake.
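The accept computation is small enough to write out directly. The sketch below uses Node's built-in crypto module; the helper names computeAccept, buildHandshakeResponse, and isValidAccept are illustrative, not part of any library. The sample key and its expected accept value are the ones published in RFC 6455.

```javascript
const crypto = require('node:crypto');

// Fixed GUID from RFC 6455; every server appends this same string
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Server side: Sec-WebSocket-Accept = Base64(SHA-1(key + GUID))
function computeAccept(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WS_GUID) // concatenate the key string and the GUID
    .digest('base64');                  // Base64 of the raw 20-byte digest
}

// Server side: the 101 response, terminated by an empty line
function buildHandshakeResponse(secWebSocketKey) {
  return [
    'HTTP/1.1 101 Switching Protocols',
    'Upgrade: websocket',
    'Connection: Upgrade',
    `Sec-WebSocket-Accept: ${computeAccept(secWebSocketKey)}`,
    '',
    '',
  ].join('\r\n');
}

// Client side: accept the handshake only if the server returned the expected value
function isValidAccept(sentKey, receivedAccept) {
  return computeAccept(sentKey) === receivedAccept;
}

// Sample key from RFC 6455
console.log(computeAccept('dGhlIHNhbXBsZSBub25jZQ==')); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Note that the key is hashed as the Base64 string the client sent, not as the decoded 16 bytes.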

Handshake Completion:

Once the client receives the 101 response and confirms that Sec-WebSocket-Accept matches the value it expects, the handshake is complete. The same underlying TCP connection is now used for the WebSocket protocol, and both parties can begin sending and receiving WebSocket data frames for full-duplex, real-time communication.

Client-Side (JavaScript):

On the client side, the WebSocket API is typically used to establish a connection. Below is a simple example of initializing a WebSocket connection:

const socket = new WebSocket("ws://example.com/socketserver");

socket.onopen = function(event) {
  console.log("Connection open!");
  socket.send("Hello Server!");
};

socket.onmessage = function(event) {
  console.log("Message from server:", event.data);
};

socket.onerror = function(event) {
  // The error event carries no details; the close event that follows has the code
  console.error("WebSocket error observed:", event);
};

socket.onclose = function(event) {
  if (event.wasClean) {
    console.log(`[close] Connection closed cleanly, code=${event.code} reason=${event.reason}`);
  } else {
    // e.g., server process killed or network down
    // event.code is usually 1006 in this case
    console.log('[close] Connection died');
  }
};

In this code, new WebSocket(...) initiates the connection request; the browser performs the handshake automatically, so you never craft the HTTP headers yourself. The onopen, onmessage, onerror, and onclose handlers run when the connection opens, when a message arrives, when an error occurs, and when the connection closes, respectively. For security reasons the error event carries no details about the cause, so the close event's code and reason are usually more informative.

Server-Side (Node.js + ws library):

On the server side, various libraries can handle WebSocket connections. Here’s an example using the ws library in Node.js:

const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', (ws) => {
  console.log('Client connected');

  ws.on('message', (message) => {
    // In ws v8+, message is a Buffer; the template literals decode it as UTF-8 text
    console.log(`Received message => ${message}`);
    ws.send(`Hello! You sent -> ${message}`);
  });

  ws.on('close', () => {
    console.log('Client disconnected');
  });

  ws.on('error', (err) => {
    console.error('WebSocket error observed:', err);
  });
});

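This code creates a WebSocket server listening on port 8080. The ws library performs the server side of the handshake, including computing Sec-WebSocket-Accept. When a client connects, the connection event is triggered, allowing the server to receive and send messages. In this example, the server simply echoes each received message back to the client with a "Hello! You sent -> " prefix.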

Data Frame Format Parsing

WebSocket transmits data in structured units called data frames. Each frame consists of a header of 2 to 14 bytes followed by optional payload data. Below is an overview of the data frame structure:

Data Frame Structure:

  1. FIN: 1 bit, indicates whether this is the final frame of a message. For single-frame messages, this is set to 1.
  2. RSV1, RSV2, RSV3: 1 bit each, reserved for extensions. They must be 0 unless a negotiated extension defines their meaning; for example, the permessage-deflate extension sets RSV1 on compressed messages.
  3. Opcode: 4 bits, defines the frame type. Common values include:
     • 0x0: Continuation frame, for subsequent parts of a multi-frame message.
     • 0x1: Text frame (UTF-8).
     • 0x2: Binary frame.
     • 0x8: Connection close.
     • 0x9: Ping.
     • 0xA: Pong.
  4. Mask: 1 bit. It must be 1 on every client-to-server frame, which then carries a masking key, and 0 on server-to-client frames.
  5. Payload length: 7 bits, 7+16 bits, or 7+64 bits. Values 0 to 125 are the length itself; 126 means the next 2 bytes hold the length; 127 means the next 8 bytes hold it.
  6. Masking-key: 4 bytes, present only when the Mask bit is 1; used to unmask the payload.
  7. Payload data: The actual data, text or binary (plus any extension data).
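Going the other way, from fields to bytes, makes the length encoding and masking rules concrete. The sketch below encodes one unfragmented frame (FIN = 1, RSV bits 0); the name encodeFrame is hypothetical. It masks the payload only when a masking key is supplied, matching the rule that clients mask and servers do not. Real clients must pick an unpredictable key per frame (for example with crypto.randomBytes(4)); the fixed key in the usage lines is the one from the RFC 6455 examples, used only for reproducible output.

```javascript
// Hypothetical helper: encode a single, unfragmented frame (FIN = 1, RSV1-3 = 0).
// Pass a 4-byte maskingKey for client-to-server frames; omit it for server-to-client.
function encodeFrame(opcode, payload, maskingKey) {
  const length = payload.length;
  const masked = maskingKey !== undefined;
  const maskBit = masked ? 0x80 : 0;

  // Length field: 7 bits, or 126 + 16-bit length, or 127 + 64-bit length (big-endian)
  let header;
  if (length < 126) {
    header = [0x80 | opcode, maskBit | length];
  } else if (length < 65536) {
    header = [0x80 | opcode, maskBit | 126, length >> 8, length & 0xff];
  } else {
    header = [0x80 | opcode, maskBit | 127];
    // Division by powers of two stays exact for lengths below 2^53
    for (let shift = 7; shift >= 0; shift--) {
      header.push(Math.floor(length / 2 ** (8 * shift)) % 256);
    }
  }
  if (masked) header.push(...maskingKey);

  const frame = new Uint8Array(header.length + length);
  frame.set(header, 0);
  for (let i = 0; i < length; i++) {
    // Masking is a byte-wise XOR with maskingKey[i % 4]
    frame[header.length + i] = masked ? payload[i] ^ maskingKey[i % 4] : payload[i];
  }
  return frame;
}

const hello = new TextEncoder().encode('Hello');
console.log(Buffer.from(encodeFrame(0x1, hello)).toString('hex')); // 810548656c6c6f
console.log(Buffer.from(encodeFrame(0x1, hello, [0x37, 0xfa, 0x21, 0x3d])).toString('hex')); // 818537fa213d7f9f4d5158
```

Because XOR is its own inverse, the same loop unmasks a received payload.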

Parsing Data Frames on a Simple WebSocket Server:

To parse received data frames, follow these steps:

  1. Read the First Byte: It holds FIN (bit 7), RSV1-3 (bits 6-4), and the opcode (bits 3-0). Extract these fields with bitwise AND and shifts.
  2. Check the Mask Bit: The highest bit of the second byte indicates masking. If it is 1, a 4-byte masking key appears after the payload length fields.
  3. Determine the Payload Length: Read the low 7 bits of the second byte. If the value is below 126, it is the length; if 126, the next two bytes hold a 16-bit length; if 127, the next eight bytes hold a 64-bit length (rarely needed in practice). Extended lengths are big-endian.
  4. Read the Masking Key: If the frame is masked, read the next four bytes as the masking key.
  5. Decode the Payload Data: Read as many bytes as the payload length says. If the frame is masked, XOR byte i of the payload with byte i % 4 of the masking key.
  6. Process FIN and Opcode: Use FIN to tell whether more fragments (continuation frames) will follow, and the opcode to decide how to handle the data (e.g., decode text frames as UTF-8 strings, pass binary frames through as bytes, answer pings with pongs).
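On a real server, bytes arrive as a TCP stream, so one read may hold half a frame or several frames. Steps 2 to 4 are enough to tell how long a frame is before touching its payload. The sketch below, using the hypothetical name completeFrameLength, returns the total size of the frame at the start of a buffer once all of it has arrived, or null while more bytes are still needed.

```javascript
// Hypothetical helper: total size in bytes of the frame starting at buf[0],
// or null if buf does not yet contain the whole frame.
function completeFrameLength(buf) {
  if (buf.length < 2) return null;          // need byte 0 and byte 1
  const masked = (buf[1] & 0x80) !== 0;
  let payloadLength = buf[1] & 0x7f;
  let offset = 2;                           // bytes consumed by the header so far

  if (payloadLength === 126) {
    if (buf.length < 4) return null;        // 16-bit extended length not here yet
    payloadLength = (buf[2] << 8) | buf[3];
    offset = 4;
  } else if (payloadLength === 127) {
    if (buf.length < 10) return null;       // 64-bit extended length not here yet
    payloadLength = 0;
    for (let i = 2; i < 10; i++) payloadLength = payloadLength * 256 + buf[i];
    offset = 10;
  }

  // Header + optional 4-byte masking key + payload
  const total = offset + (masked ? 4 : 0) + payloadLength;
  return buf.length >= total ? total : null;
}

// A chunk holding two back-to-back unmasked text frames: "Hi" and "!"
const chunk = new Uint8Array([0x81, 0x02, 0x48, 0x69, 0x81, 0x01, 0x21]);
const first = completeFrameLength(chunk);                  // 4
const second = completeFrameLength(chunk.subarray(first)); // 3
console.log(first, second);
```

A server would append each TCP read to a buffer, call this repeatedly, and cut off complete frames for parsing.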

When you use the browser's WebSocket API (or a library such as ws), you don't need to parse data frames yourself; these details are handled for you. For conceptual understanding, though, the function below parses a single raw frame held in a byte array. You rarely handle raw frame bytes in practice, but doing so makes the mechanics concrete.

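function parseWebSocketFrame(frameData) {
    // frameData: a Uint8Array (or Node.js Buffer) holding one complete frame
    let byteIndex = 0;

    // Byte 0: FIN (bit 7), RSV1-3 (bits 6-4), opcode (bits 3-0)
    const fin = (frameData[byteIndex] & 0b10000000) !== 0;
    const opcode = frameData[byteIndex++] & 0b00001111;

    // Byte 1: MASK (bit 7) and the 7-bit payload length
    const mask = (frameData[byteIndex] & 0b10000000) !== 0;
    let payloadLength = frameData[byteIndex++] & 0b01111111;

    // 126: a 16-bit length follows; 127: a 64-bit length follows (both big-endian)
    if (payloadLength === 126) {
        payloadLength = (frameData[byteIndex++] << 8) | frameData[byteIndex++];
    } else if (payloadLength === 127) {
        payloadLength = 0;
        for (let i = 0; i < 8; i++) {
            payloadLength = payloadLength * 256 + frameData[byteIndex++];
        }
        // JavaScript numbers are exact only up to 2^53 - 1
        if (!Number.isSafeInteger(payloadLength)) {
            throw new Error("Payload length is invalid or too large.");
        }
    }

    // Masking key: 4 bytes after the length fields, present only if MASK is set
    let maskingKey;
    if (mask) {
        maskingKey = new Uint8Array([frameData[byteIndex++], frameData[byteIndex++], frameData[byteIndex++], frameData[byteIndex++]]);
    }

    // Extract the payload, refusing truncated input
    const payloadStart = byteIndex;
    const payloadEnd = payloadStart + payloadLength;
    if (frameData.length < payloadEnd) {
        throw new Error("Incomplete frame: not enough bytes for the declared payload.");
    }
    // Copy so that unmasking never modifies the caller's buffer
    const payloadData = new Uint8Array(frameData.slice(payloadStart, payloadEnd));

    // Unmask: XOR byte i with maskingKey[i % 4]
    if (mask) {
        for (let i = 0; i < payloadLength; i++) {
            payloadData[i] ^= maskingKey[i % 4];
        }
    }

    // Interpret the payload according to the opcode
    switch (opcode) {
        case 0x0:  // Continuation frame: append to the message being reassembled
            return { fin, type: 'continuation', data: payloadData };
        case 0x1:  // Text frame; if fin is false, a UTF-8 character may be split across frames
            return { fin, type: 'text', data: new TextDecoder().decode(payloadData) };
        case 0x2:  // Binary frame
            return { fin, type: 'binary', data: payloadData };
        case 0x8: {  // Close frame: optional 2-byte status code followed by a UTF-8 reason
            const code = payloadLength >= 2 ? (payloadData[0] << 8) | payloadData[1] : undefined;
            return { fin, type: 'close', code, reason: new TextDecoder().decode(payloadData.slice(2)) };
        }
        case 0x9:  // Ping: the receiver should answer with a Pong carrying the same payload
            return { fin, type: 'ping', data: payloadData };
        case 0xA:  // Pong
            return { fin, type: 'pong', data: payloadData };
        default:
            throw new Error(`Unsupported opcode: 0x${opcode.toString(16)}.`);
    }
}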

// Example usage with the masked "Hello" frame from RFC 6455, Section 5.7
// (in a browser you never see raw frames; the WebSocket API decodes them for you)
const frameData = new Uint8Array([0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58]);
try {
    const parsedFrame = parseWebSocketFrame(frameData);
    console.log(parsedFrame); // a text frame whose data is "Hello"
} catch (e) {
    console.error(e);
}

Connection Establishment and Closure Process

The connection establishment and closure processes are critical stages in the WebSocket protocol lifecycle, ensuring stable and reliable bidirectional communication between client and server.

Connection Establishment Process (Handshaking)

Client Initiates Request:

The client sends an HTTP Upgrade request to the server, with headers indicating a WebSocket connection request. Typical headers include:

  • Upgrade: websocket
  • Connection: Upgrade
  • Sec-WebSocket-Key: A Base64-encoded random 16-byte value used for handshake verification.
  • Sec-WebSocket-Version: The WebSocket protocol version; 13 for RFC 6455.

Server Response:

If the server accepts the connection, it responds with an HTTP status code of 101 Switching Protocols, indicating a successful protocol upgrade. The response headers include:

  • Upgrade: websocket
  • Connection: Upgrade
  • Sec-WebSocket-Accept: A value computed from the client’s Sec-WebSocket-Key for verification.

Handshake Completion:

Upon receiving the correct response, the handshake is complete, and the WebSocket connection is established, allowing both parties to exchange data frames.

Data Transmission

After a successful handshake, the WebSocket connection enters the data transmission phase, where both sides can exchange data via frames. Each frame includes a header describing its type and length, along with optional payload data.
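A message does not have to fit in one frame: a sender can split it into a first frame with FIN = 0, any number of continuation frames (opcode 0x0), and a last continuation frame with FIN = 1. RFC 6455 also allows control frames such as ping to arrive between those fragments. The sketch below reassembles messages from frames that have already been parsed; the class name MessageAssembler and the { fin, opcode, payload } input shape are assumptions for illustration.

```javascript
// Hypothetical reassembler: push parsed frames ({ fin, opcode, payload }) in arrival
// order. It returns a complete message when the final fragment arrives, otherwise null.
class MessageAssembler {
  constructor() {
    this.opcode = null;  // opcode of the message in progress (0x1 text or 0x2 binary)
    this.fragments = []; // payload pieces received so far
  }

  push({ fin, opcode, payload }) {
    if (opcode >= 0x8) {
      // Control frames (close, ping, pong) are never fragmented and may arrive
      // between the fragments of a data message, so hand them back right away.
      return { opcode, data: payload };
    }
    if (opcode === 0x0) {
      if (this.opcode === null) throw new Error('Continuation frame without a message in progress.');
    } else if (opcode === 0x1 || opcode === 0x2) {
      if (this.opcode !== null) throw new Error('New data frame before the previous message finished.');
      this.opcode = opcode;
    } else {
      throw new Error(`Reserved opcode: 0x${opcode.toString(16)}.`);
    }

    this.fragments.push(payload);
    if (!fin) return null;

    // Final fragment: concatenate all pieces into one buffer
    const total = this.fragments.reduce((sum, piece) => sum + piece.length, 0);
    const data = new Uint8Array(total);
    let offset = 0;
    for (const piece of this.fragments) {
      data.set(piece, offset);
      offset += piece.length;
    }

    const message = {
      opcode: this.opcode,
      // Decode text only after reassembly, because a UTF-8 character may span fragments
      data: this.opcode === 0x1 ? new TextDecoder().decode(data) : data,
    };
    this.opcode = null;
    this.fragments = [];
    return message;
  }
}

const enc = new TextEncoder();
const assembler = new MessageAssembler();
console.log(assembler.push({ fin: false, opcode: 0x1, payload: enc.encode('Hel') }));    // null
console.log(assembler.push({ fin: true, opcode: 0x9, payload: new Uint8Array(0) }));     // the ping, passed through
console.log(assembler.push({ fin: true, opcode: 0x0, payload: enc.encode('lo') }).data); // Hello
```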

Connection Closure Process

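  1. Closure Initiation:

Either party can initiate closure by sending a close frame (opcode 0x8), which typically carries a status code (close code) and an optional UTF-8 reason (close reason). After sending a close frame, an endpoint must not send any more data frames.

  2. Closure Response:

The receiving party replies with its own close frame to acknowledge the request, usually echoing the status code it received. It may finish sending a message that is already in progress before it responds.

  3. TCP Connection Teardown:

Because WebSocket runs over TCP, the underlying connection is then closed with TCP's normal FIN/ACK exchange (the "four-way handshake"), which ensures that all data is delivered and resources are released. RFC 6455 recommends that the server close the TCP connection first, so that the server, rather than the client, holds the TIME_WAIT state.

  4. Connection Closed:

Once both close frames have been exchanged and the TCP connection is shut down, the connection is fully closed. Both client and server then fire their close events (e.g., onclose in the browser, 'close' in ws), where applications can perform cleanup tasks.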
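The close frame's payload has a simple layout: a 2-byte big-endian status code, then an optional UTF-8 reason. Because control frame payloads are limited to 125 bytes, the reason can use at most 123. In a browser, socket.close(code, reason) builds this payload for you, and it accepts only 1000 or a code from 3000 to 4999. The sketch below shows the encoding by hand; encodeClosePayload and decodeClosePayload are hypothetical names.

```javascript
// Hypothetical helpers for the close frame payload (opcode 0x8):
// a 2-byte big-endian status code followed by an optional UTF-8 reason.
function encodeClosePayload(code, reason = '') {
  const reasonBytes = new TextEncoder().encode(reason);
  // Control frame payloads are limited to 125 bytes, leaving 123 for the reason
  if (reasonBytes.length > 123) throw new RangeError('Close reason exceeds 123 bytes.');
  const payload = new Uint8Array(2 + reasonBytes.length);
  payload[0] = code >> 8;   // high byte of the status code
  payload[1] = code & 0xff; // low byte of the status code
  payload.set(reasonBytes, 2);
  return payload;
}

function decodeClosePayload(payload) {
  // An empty payload means "no status code", which is reported locally as 1005
  if (payload.length === 0) return { code: 1005, reason: '' };
  if (payload.length === 1) throw new Error('Malformed close payload.');
  return {
    code: (payload[0] << 8) | payload[1],
    reason: new TextDecoder().decode(payload.subarray(2)),
  };
}

console.log(Buffer.from(encodeClosePayload(1000, 'bye')).toString('hex')); // 03e8627965
console.log(decodeClosePayload(encodeClosePayload(4001, 'custom')));       // { code: 4001, reason: 'custom' }
```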

Close Status Codes (Close Codes)

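  • 1000: Normal closure; the purpose for which the connection was established has been fulfilled.
  • 1001: Going away, e.g., the server is shutting down or the browser is navigating away from the page.
  • 1002: Protocol error; the endpoint received data that violates the protocol.
  • 1003: Unsupported data; the endpoint received a data type it cannot accept (e.g., binary data on a text-only endpoint).
  • 1005: No status received; reserved for reporting that a close frame arrived without a status code. It must not be sent in a close frame.
  • 1006: Abnormal closure; reserved for reporting that the connection closed without any close frame (e.g., network failure). It must not be sent in a close frame.
  • 1007: Invalid frame payload data, e.g., a text message that is not valid UTF-8.
  • 1008: Policy violation; a message violated the endpoint's policy (a generic code when no more specific one applies).
  • 1009: Message too big to process.
  • 1010: Mandatory extension; the client expected the server to negotiate an extension that it did not provide.
  • 1011: Internal error; the server hit an unexpected condition that prevented it from fulfilling the request.
  • 1015: TLS handshake failure; reserved for reporting a failed TLS handshake (e.g., the server certificate could not be verified). It must not be sent in a close frame.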
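Since 1005, 1006, and 1015 exist only for local reporting (for example in onclose's event.code), an implementation needs to know which codes may actually travel in a close frame. The sketch below is one way to check, using the hypothetical name isSendableCloseCode; it also admits 1012 to 1014, which were registered with IANA after RFC 6455. An implementation could run the same check on received close frames and treat a failure as a protocol error.

```javascript
// Hypothetical validator: may this status code appear in a close frame we send?
function isSendableCloseCode(code) {
  if (!Number.isInteger(code)) return false;
  // 3000-3999: registered for libraries and frameworks; 4000-4999: private use
  if (code >= 3000 && code <= 4999) return true;
  // 1000-1003 and 1007-1011 from RFC 6455, plus 1012-1014 registered later with IANA;
  // 1004 is reserved, and 1005/1006 (like 1015) are for local reporting only
  return code >= 1000 && code <= 1014 && ![1004, 1005, 1006].includes(code);
}

console.log(isSendableCloseCode(1000)); // true
console.log(isSendableCloseCode(1006)); // false
```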

Understanding the WebSocket connection establishment and closure processes is vital for developing efficient, orderly, and secure real-time communication applications.
