Version: 1.1.0

Handle video stream

The video socket from the Scrcpy server carries the encoded video frames. Normally, it starts with metadata, followed by multiple configuration and data (video frame) packets, but the exact format depends on the Scrcpy server version and the specified option values.

Format

This table shows how versions and options affect the video stream format.

  • ✅ means the field is always present
  • ⛔ means the field is not present
  • An option name means the field is present if the option is true

| Value | v1.15 ~ v1.21 | v1.22 | v1.23 ~ v1.25 | v2.0 ~ v3.1 |
| --- | --- | --- | --- | --- |
| Metadata | ✅ | sendDeviceMeta | sendDeviceMeta | sendDeviceMeta \|\| sendCodecMeta |
| Device name | ✅ | sendDeviceMeta | sendDeviceMeta | sendDeviceMeta |
| Initial video size | ✅ | sendDeviceMeta | sendDeviceMeta | sendCodecMeta |
| Video codec | ⛔ | ⛔ | ⛔ | sendCodecMeta |
| Configuration | sendFrameMeta | sendFrameMeta | sendFrameMeta | sendFrameMeta |
| Data Packet Header | sendFrameMeta | sendFrameMeta | sendFrameMeta | sendFrameMeta |
| PTS | sendFrameMeta | sendFrameMeta | sendFrameMeta | sendFrameMeta |
| Keyframe Mark | ⛔ | ⛔ | sendFrameMeta | sendFrameMeta |
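
The rules in the table can also be written as a small lookup function. The following is only a sketch: describeVideoStream, MetaOptions and VideoStreamFields are hypothetical names that are not part of Tango, and the code assumes that an unset option behaves as true and that an option has no effect before the version that introduced it.

```typescript
// Hypothetical helper types, not part of @yume-chan/scrcpy.
interface MetaOptions {
  sendDeviceMeta?: boolean;
  sendFrameMeta?: boolean;
  sendCodecMeta?: boolean;
}

interface VideoStreamFields {
  deviceName: boolean;
  initialSize: boolean;
  codec: boolean;
  configuration: boolean;
  packetHeader: boolean;
  pts: boolean;
  keyframe: boolean;
  raw: boolean; // true when none of the fields above are present
}

// Compares a "major.minor" version string against a minimum version.
function atLeast(version: string, major: number, minor: number): boolean {
  const [maj = 0, min = 0] = version.split(".").map(Number);
  return maj > major || (maj === major && min >= minor);
}

function describeVideoStream(
  version: string,
  options: MetaOptions,
): VideoStreamFields {
  // Assumption: unset options default to true.
  const frameMeta = options.sendFrameMeta !== false;
  // Before v1.22 the device metadata is always sent.
  const deviceMeta = atLeast(version, 1, 22)
    ? options.sendDeviceMeta !== false
    : true;
  // sendCodecMeta only exists since v2.0.
  const codecMeta = atLeast(version, 2, 0)
    ? options.sendCodecMeta !== false
    : false;

  return {
    deviceName: deviceMeta,
    // Since v2.0 the initial size is controlled by sendCodecMeta.
    initialSize: atLeast(version, 2, 0) ? codecMeta : deviceMeta,
    codec: codecMeta,
    configuration: frameMeta,
    packetHeader: frameMeta,
    pts: frameMeta,
    // The keyframe mark is only available since v1.23.
    keyframe: atLeast(version, 1, 23) && frameMeta,
    raw: !deviceMeta && !codecMeta && !frameMeta,
  };
}
```

For example, describeVideoStream("1.22", {}) reports pts as present and keyframe as absent, matching the v1.22 column.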

Raw mode

As the table above shows, since v1.22, if the sendDeviceMeta and sendFrameMeta options are false, and the sendCodecMeta option is also false (since v2.0), none of the listed fields are present.

This is called the raw mode. In this mode, the video socket only contains codec-specific, encoded video data.

Without the extra information, it's much harder to process the video stream. Because of that, Tango has only limited support for parsing the video stream in raw mode. Its methods can process the stream without errors, but some fields will be undefined, and the built-in video decoders can't decode a raw mode video stream.

Video stream metadata

If the server version and options requirements are met, the server will first send some metadata about the device and the video stream:

interface ScrcpyVideoStreamMetadata {
  deviceName?: string | undefined;
  width?: number | undefined;
  height?: number | undefined;
  codec: ScrcpyVideoCodecId;
}

  • deviceName: The device's model name.
  • width/height: Size of the first video frame.
  • codec: The codec of the video stream.

Size changes

The metadata is sent only once. When the device screen size changes (for example, when the device orientation changes, or a foldable device unfolds), the server restarts the video encoder, but it doesn't send new metadata with the new size.

To track the video resolution, parsing the video stream is required. The built-in video decoders have a sizeChanged event, and AdbScrcpyClient has screenWidth and screenHeight properties.
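
Because an encoder restart produces a new configuration packet (see Video packets below), a client that doesn't use the built-in decoders can at least detect when the size may have changed by watching for configuration packets after the first one. The sketch below only detects the restart; the new size itself still has to be read from the codec configuration the packet carries (for H.264, the SPS). EncoderRestartTracker and PacketLike are hypothetical names, not part of Tango.

```typescript
// Minimal local stand-in for the packet types described under "Video packets".
type PacketLike =
  | { type: "configuration"; data: Uint8Array }
  | { type: "data"; data: Uint8Array };

// Hypothetical helper: reports when a configuration packet signals that
// the encoder was restarted (any configuration packet after the first one).
class EncoderRestartTracker {
  private configurationCount = 0;

  observe(packet: PacketLike): boolean {
    if (packet.type !== "configuration") {
      return false;
    }
    this.configurationCount += 1;
    return this.configurationCount > 1;
  }
}
```

A client could call observe for every packet and re-read the video size whenever it returns true.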

With @yume-chan/scrcpy

If you already have a ReadableStream<Uint8Array> that reads from the video socket, the parseVideoStreamMetadata method from the corresponding ScrcpyOptionsX_YY class can be used to parse the metadata. This method returns the metadata and a new stream that contains the rest of the data.

import { ScrcpyOptions2_1 } from "@yume-chan/scrcpy";
import type { ScrcpyMediaStreamPacket } from "@yume-chan/scrcpy";

const options = new ScrcpyOptions2_1({
  // use the same version and options as when starting the server
});

// Obtain this stream yourself; `declare` keeps the snippet valid TypeScript
declare const videoSocket: ReadableStream<Uint8Array>;

// Parse video socket metadata
const { metadata: videoMetadata, stream: videoStream } =
  await options.parseVideoStreamMetadata(videoSocket);

codec

If metadata is not present, or doesn't contain codec information, the codec field will:

  • always be ScrcpyVideoCodecId.H264, because it's the only supported codec (until v2.0)
  • be the same as the videoCodec option (since v2.0)
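
The fallback order can be summarized in a few lines. This is a sketch of the behavior described above, not the library's implementation: resolveCodec and the string-based VideoCodec type are hypothetical stand-ins for ScrcpyVideoCodecId, and the code assumes that an unset videoCodec option means H.264.

```typescript
// Hypothetical stand-in for the library's ScrcpyVideoCodecId.
type VideoCodec = "h264" | "h265" | "av1";

function resolveCodec(
  metadataCodec: VideoCodec | undefined,
  videoCodecOption: VideoCodec | undefined,
): VideoCodec {
  // Codec information from the stream metadata wins when it is present.
  if (metadataCodec !== undefined) {
    return metadataCodec;
  }
  // Since v2.0: fall back to the videoCodec option.
  if (videoCodecOption !== undefined) {
    return videoCodecOption;
  }
  // Before v2.0 (or option unset, by assumption): H.264.
  return "h264";
}
```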

Raw mode

If the whole metadata is not present, all fields except codec will be undefined. The stream field returned will be the same object as the parameter.

With @yume-chan/adb-scrcpy

See AdbScrcpyClient.prototype.videoStream property below. It uses parseVideoStreamMetadata internally to parse the metadata, and it also parses the video stream into packets.

The codec and raw mode behaviors described above also apply.

Video packets

If the server version and options requirements are met, the server will encapsulate each encoded video frame with extra information. Tango parses them into two types of packets:

interface ScrcpyMediaStreamConfigurationPacket {
  type: "configuration";
  data: Uint8Array;
}

interface ScrcpyMediaStreamDataPacket {
  type: "data";
  keyframe?: boolean;
  pts?: bigint;
  data: Uint8Array;
}

type ScrcpyMediaStreamPacket =
  | ScrcpyMediaStreamConfigurationPacket
  | ScrcpyMediaStreamDataPacket;

Configuration packet

If present, the configuration packet is always the first packet. When the encoder is restarted, a new configuration packet is generated and sent.

The data field contains the codec-specific configuration information:

  • H.264: Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) in Annex B format.
  • H.265: Video Parameter Set (VPS), Sequence Parameter Set (SPS), and Picture Parameter Set (PPS) in Annex B format.
  • AV1: The first 3 bytes of AV1CodecConfigurationRecord (https://aomediacodec.github.io/av1-isobmff/#av1codecconfigurationbox). The remaining configuration OBUs are in the next data packet.

The client should handle this packet and update the decoder accordingly.
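
For the H.264 case, a client that handles the configuration itself usually needs to separate the SPS from the PPS. In Annex B format, NAL units are separated by 00 00 01 or 00 00 00 01 start codes, and the low 5 bits of the first byte of each unit give its type (7 for SPS, 8 for PPS). The sketch below shows one way to split the data; splitAnnexB and describeH264Units are hypothetical helpers, not part of Tango.

```typescript
// Splits Annex B data into NAL units, without their start codes.
function splitAnnexB(data: Uint8Array): Uint8Array[] {
  const units: Uint8Array[] = [];
  let start = -1; // offset of the current unit's first byte, -1 if none yet
  let i = 0;
  while (i + 2 < data.length) {
    // A 4-byte start code (00 00 00 01) ends with the 3-byte one (00 00 01).
    if (data[i] === 0 && data[i + 1] === 0 && data[i + 2] === 1) {
      if (start >= 0) {
        let end = i;
        // Zero bytes right before a start code are padding, not payload.
        while (end > start && data[end - 1] === 0) {
          end -= 1;
        }
        units.push(data.subarray(start, end));
      }
      start = i + 3;
      i += 3;
    } else {
      i += 1;
    }
  }
  if (start >= 0 && start < data.length) {
    units.push(data.subarray(start));
  }
  return units;
}

// Labels each H.264 NAL unit by its nal_unit_type.
function describeH264Units(data: Uint8Array): string[] {
  return splitAnnexB(data).map((unit) => {
    const type = unit.length > 0 ? unit[0] & 0x1f : -1;
    if (type === 7) return "SPS";
    if (type === 8) return "PPS";
    return "other";
  });
}
```

The returned subarrays share memory with the input, so copy them if the packet data is reused.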

Data packet

Each data packet contains exactly one encoded frame and, if the version and option requirements are met, some extra information:

  • keyframe: true if the current packet is a keyframe. Many decoders can decode the video stream without knowing if each frame is a keyframe or not, but some decoders require this information.
  • pts: Presentation timestamp in microseconds. When rendering the video in real time, you generally want to present the decoded frames as they arrive to minimize latency, but this information can be used to remove processing time deviations when recording.
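
To make the origin of these fields more concrete, the sketch below parses a packet header by hand. The layout is an assumption taken from the Scrcpy developer documentation for recent server versions, not something this page specifies: a 12-byte big-endian header where bit 63 of the first 8 bytes marks a configuration packet, bit 62 marks a keyframe, the low 62 bits hold the PTS, and the last 4 bytes hold the payload size. parseFrameHeader and FrameHeader are hypothetical names; in practice createMediaStreamTransformer does this for you.

```typescript
// Hypothetical result type for this sketch.
interface FrameHeader {
  configuration: boolean;
  keyframe: boolean;
  pts: number | undefined; // undefined for configuration packets
  size: number; // payload size in bytes
}

const HEADER_SIZE = 12;

function parseFrameHeader(bytes: Uint8Array): FrameHeader {
  if (bytes.length < HEADER_SIZE) {
    throw new Error(
      `Expected ${HEADER_SIZE} header bytes, got ${bytes.length}`,
    );
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, HEADER_SIZE);
  // Read the 64-bit field as two 32-bit halves (big-endian).
  const high = view.getUint32(0, false);
  const low = view.getUint32(4, false);
  const size = view.getUint32(8, false);

  const configuration = (high >>> 31) === 1; // bit 63
  const keyframe = ((high >>> 30) & 1) === 1; // bit 62
  // The sketch uses number instead of bigint; values above 2^53 lose precision.
  const pts = configuration
    ? undefined
    : (high & 0x3fffffff) * 0x100000000 + low;

  return { configuration, keyframe, pts, size };
}
```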

With @yume-chan/scrcpy

The createMediaStreamTransformer method creates a TransformStream that parses the video stream into packets.

parseVideoStreamMetadata and createMediaStreamTransformer are separate methods, because createMediaStreamTransformer can also be used to parse the audio stream.

const videoPacketStream: ReadableStream<ScrcpyMediaStreamPacket> =
  videoStream.pipeThrough(options.createMediaStreamTransformer());

videoPacketStream
  .pipeTo(
    new WritableStream({
      write(packet: ScrcpyMediaStreamPacket) {
        switch (packet.type) {
          case "configuration":
            // Handle configuration packet
            console.log(packet.data);
            break;
          case "data":
            // Handle data packet
            console.log(packet.keyframe, packet.pts, packet.data);
            break;
        }
      },
    }),
  )
  .catch((e) => {
    console.error(e);
  });

As with options.clipboard, don't await the pipeTo call. The returned Promise resolves only when videoSocket ends, and waiting for it without handling the other streams will block videoSocket, causing a deadlock.

Raw mode

If the data packet header is not present, the keyframe and pts fields will be undefined.

The data field contains whatever data was returned by one read call. Because there are no packet boundaries, it might contain a partial frame or multiple frames.

With @yume-chan/adb-scrcpy

When the video option is not false, AdbScrcpyClient.videoStream is a Promise that resolves to an AdbScrcpyVideoStream.

interface AdbScrcpyVideoStream {
  stream: ReadableStream<ScrcpyMediaStreamPacket>;
  metadata: ScrcpyVideoStreamMetadata;
}

It uses parseVideoStreamMetadata and createMediaStreamTransformer internally, so the return value is a combination of those two methods.

import type { ScrcpyMediaStreamPacket } from "@yume-chan/scrcpy";
import type { AdbScrcpyClient } from "@yume-chan/adb-scrcpy";

declare const client: AdbScrcpyClient;

if (client.videoStream) {
  const { metadata: videoMetadata, stream: videoPacketStream } =
    await client.videoStream;

  videoPacketStream
    .pipeTo(
      new WritableStream({
        write(packet: ScrcpyMediaStreamPacket) {
          switch (packet.type) {
            case "configuration":
              // Handle configuration packet
              console.log(packet.data);
              break;
            case "data":
              // Handle data packet
              console.log(packet.keyframe, packet.pts, packet.data);
              break;
          }
        },
      }),
    )
    .catch((e) => {
      console.error(e);
    });
}

Decode and render video

Tango provides packages to decode and render video packets in Web browsers.

To use these decoders, the sendFrameMeta option must be true (the default value).

Decoding and playing video outside the browser is out of scope for this library. It will depend on the runtime environment, UI framework, and media library you use.