
Save to file

While it's always possible to decode and re-encode the video stream into any format and configuration you want, it's simpler and more efficient to use the video stream directly. Just note that each method has its upsides and downsides:

  • Codec and configuration: Re-encoding the video allows you to use any codec and configuration. Because PCs are usually more powerful than mobile devices, they may achieve the same quality at a lower bitrate (thus smaller files). When muxing the video stream, the codec and configuration are always the same as captured.
  • Quality: Most video encoding processes are lossy, so the quality of the re-encoded video is lower than the original. When muxing the video stream, the quality is unchanged.
  • Processing power and memory: Re-encoding the video requires more CPU/GPU power and memory. Muxing a video requires almost no CPU power or memory.
  • Code complexity: Because the video resolution can change with device orientation, re-encoding the video is more complex. Muxing the video stream doesn't require any additional code to handle that.

The video stream contains only encoded video frames in various codecs (H.264/AVC, H.265/HEVC, and AV1). A few video players (e.g. VLC) can play those raw streams directly, but most others require a container format (e.g. MP4): the video stream plus some metadata.
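If a raw stream is enough for your use case, saving it is trivial. Here is a minimal sketch (not part of Tango's API) that concatenates every packet payload; for H.264/H.265 the result is an Annex B file that players like VLC can open directly:

import type { ScrcpyMediaStreamPacket } from "@yume-chan/scrcpy";

declare const videoPacketStream: ReadableStream<ScrcpyMediaStreamPacket>;

// Both configuration and data packets are already raw bitstream fragments,
// so they can be written out unchanged, in order.
const chunks: Uint8Array[] = [];
for await (const packet of videoPacketStream) {
  chunks.push(packet.data);
}

// Save the result however your app prefers, e.g. via an `<a download>` link.
const blob = new Blob(chunks, { type: "application/octet-stream" });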

To convert the video stream into a video file (muxing), you need a muxer library, like mp4-muxer and webm-muxer.

note

The APIs of mp4-muxer and webm-muxer are designed around the WebCodecs API. However, the WebCodecs API is not required when you already have an encoded video stream, as we do here.
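For example, setting up mp4-muxer for an H.264 stream might look like the following sketch. The option names are taken from mp4-muxer's documentation (verify them against the version you use), and the width/height values are placeholders:

import { ArrayBufferTarget, Muxer } from "mp4-muxer";

// Width and height must match the captured video; take them from the initial
// video metadata, or parse them from the SPS.
const muxer = new Muxer({
  target: new ArrayBufferTarget(),
  video: {
    codec: "avc", // "hevc" or "av1" for the other codecs
    width: 1920,
    height: 1080,
  },
  fastStart: "in-memory",
});

// ...add video samples here (see the codec-specific sections below)...

muxer.finalize();
const file = muxer.target.buffer; // the finished MP4 file as an ArrayBuffer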

No matter which container and muxer library you use, some codec-specific conversions are required.

H.264

H.264/AVC is a joint effort between ITU-T and ISO, thus it has two names. The terms "H.264" and "AVC" are usually used interchangeably; however, they are actually two different specifications. The packet format and the algorithm to decode them are identical, but how those packets are stored in files differs:

  • The ITU-T H.264 standard (https://www.itu.int/rec/T-REC-H.264) defines a bit stream format for storing H.264 packets. This format doesn't have an official name, but since it's defined in Annex B of the specification, it's commonly referred to as "Annex B format", and is used to store raw H.264 streams.
  • The ISO/IEC MPEG-4 AVC standard (https://www.iso.org/obp/ui/en/#iso:std:iso-iec:14496:-10:ed-10:v1:en) uses various C structure-like formats to store those packets. This format is commonly referred to as "AVC" or "AVCC", and is used by various containers.

The Android MediaCodec API (which Scrcpy uses) produces video streams in Annex B format. To save the stream into a container format, it needs to be converted to the AVC format.
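The practical difference is how NAL units are delimited: Annex B separates them with start codes, while AVC prefixes each one with its length. A simplified illustration (the actual conversion code is shown later on this page):

// Annex B: [00 00 00 01] [NAL unit] [00 00 00 01] [NAL unit] ...
// AVC:     [4-byte big-endian length] [NAL unit] [4-byte length] [NAL unit] ...

// Simplified check for an Annex B start code at the beginning of a buffer
function startsWithAnnexBStartCode(data: Uint8Array): boolean {
  return (
    (data[0] === 0 && data[1] === 0 && data[2] === 1) ||
    (data[0] === 0 && data[1] === 0 && data[2] === 0 && data[3] === 1)
  );
}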

Configuration

As mentioned in configuration packet, the H.264 configuration packet contains the Sequence Parameter Set (SPS) and the Picture Parameter Set (PPS), which contain information like codec profile, resolution, cropping, framerate, etc.

The configuration packets need to be stored in two places:

  • In the video metadata, as an AVCDecoderConfigurationRecord structure
  • Prepended to the next frame data, in the video stream
info

Configuration packets can occur multiple times in the stream when the encoder is restarted. ONLY the first one needs to be converted to an AVCDecoderConfigurationRecord and stored in the metadata, but ALL of them must be prepended to the frame data that follows them.

Subsequent configuration packets can differ from the first one, even having a different resolution or codec profile. Most players can handle this correctly.

If frame metadata is enabled, the SPS and PPS will be in the configuration packets. Tango has a method to extract them from the configuration packet. If they can't be found in the specified buffer, an error will be thrown.

import { h264SearchConfiguration } from "@yume-chan/scrcpy";

for await (const packet of videoPacketStream) {
  if (packet.type === "configuration") {
    const { sequenceParameterSet, pictureParameterSet } =
      h264SearchConfiguration(packet.data);
    console.log(sequenceParameterSet, pictureParameterSet);
  }
}

Then you can use this function to convert them into an AvcDecoderConfigurationRecord:

// https://ffmpeg.org/doxygen/0.11/avc_8c-source.html#l00106
function h264ConfigurationToAvcDecoderConfigurationRecord(
  sequenceParameterSet: Uint8Array,
  pictureParameterSet: Uint8Array,
) {
  const buffer = new Uint8Array(
    11 + sequenceParameterSet.byteLength + pictureParameterSet.byteLength,
  );
  // configurationVersion
  buffer[0] = 1;
  // AVCProfileIndication, profile_compatibility and AVCLevelIndication,
  // copied from the SPS
  buffer[1] = sequenceParameterSet[1]!;
  buffer[2] = sequenceParameterSet[2]!;
  buffer[3] = sequenceParameterSet[3]!;
  // lengthSizeMinusOne = 3 (4-byte NAL unit lengths)
  buffer[4] = 0xff;
  // numOfSequenceParameterSets = 1
  buffer[5] = 0xe1;
  // sequenceParameterSetLength (16-bit big-endian), followed by the SPS
  buffer[6] = sequenceParameterSet.byteLength >> 8;
  buffer[7] = sequenceParameterSet.byteLength & 0xff;
  buffer.set(sequenceParameterSet, 8);
  // numOfPictureParameterSets = 1
  buffer[8 + sequenceParameterSet.byteLength] = 1;
  // pictureParameterSetLength (16-bit big-endian), followed by the PPS
  buffer[9 + sequenceParameterSet.byteLength] =
    pictureParameterSet.byteLength >> 8;
  buffer[10 + sequenceParameterSet.byteLength] =
    pictureParameterSet.byteLength & 0xff;
  buffer.set(pictureParameterSet, 11 + sequenceParameterSet.byteLength);
  return buffer;
}

How to use the above AvcDecoderConfigurationRecord data depends on the container format and muxer library. Usually there is a metadata, configuration, or description field for the video stream.
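For example, with mp4-muxer (and other WebCodecs-style muxers), the record is usually attached as the description of a WebCodecs decoder configuration. A sketch, using the SPS and PPS extracted above; double-check the exact field names against your muxer's documentation:

// `sequenceParameterSet` / `pictureParameterSet` are the ones extracted above
const description = h264ConfigurationToAvcDecoderConfigurationRecord(
  sequenceParameterSet,
  pictureParameterSet,
);

// The codec string is "avc1." followed by profile_idc, the constraint flags
// byte and level_idc in hex, taken from the same SPS bytes used above.
const codec = `avc1.${[
  sequenceParameterSet[1]!,
  sequenceParameterSet[2]!,
  sequenceParameterSet[3]!,
]
  .map((value) => value.toString(16).padStart(2, "0"))
  .join("")}`;

const metadata: EncodedVideoChunkMetadata = {
  decoderConfig: { codec, description },
};
// With mp4-muxer, `metadata` is passed together with the first video sample
// (see the "Frames" section below).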

Frames

When frame metadata is enabled, each data packet contains exactly one encoded frame.

For the first data packet after a configuration packet, the configuration data needs to be prepended to the frame data.

info

This must be the original Annex B format data, not the AvcDecoderConfigurationRecord.

let configuration: Uint8Array | undefined;

for await (const packet of videoPacketStream) {
  if (packet.type === "configuration") {
    configuration = packet.data;
    // Also convert it to `AVCDecoderConfigurationRecord` and store in metadata
    continue;
  }

  if (packet.type === "data") {
    let buffer: Uint8Array;

    if (configuration) {
      // Prepend the cached configuration to the first frame after it
      buffer = new Uint8Array(
        configuration.byteLength + packet.data.byteLength,
      );
      buffer.set(configuration);
      buffer.set(packet.data, configuration.byteLength);
      configuration = undefined;
    } else {
      buffer = packet.data;
    }

    console.log(buffer);
  }
}

Then the data (maybe with the configuration prepended) needs to be converted to an AVCSample structure.

import { annexBSplitNalu } from "@yume-chan/scrcpy";

function nalStreamToAvcSample(buffer: Uint8Array) {
  const nalUnits: Uint8Array[] = [];
  let totalLength = 0;

  // Split the Annex B stream into individual NAL units (start codes removed)
  for (const unit of annexBSplitNalu(buffer)) {
    nalUnits.push(unit);
    totalLength += unit.byteLength + 4;
  }

  // Re-assemble them in AVC format: each NAL unit prefixed with its
  // 4-byte big-endian length
  const sample = new Uint8Array(totalLength);
  let offset = 0;
  for (const nalu of nalUnits) {
    sample[offset] = nalu.byteLength >> 24;
    sample[offset + 1] = nalu.byteLength >> 16;
    sample[offset + 2] = nalu.byteLength >> 8;
    sample[offset + 3] = nalu.byteLength & 0xff;
    sample.set(nalu, offset + 4);
    offset += 4 + nalu.byteLength;
  }
  return sample;
}

Again, how to save the AVCSample into the video stream depends on the container format and muxer library.
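As a sketch of how this might look with mp4-muxer: `muxer` is from the earlier setup sketch, `metadata` from the Configuration section, and `buffer`/`packet` from the loop above. Verify addVideoChunkRaw's exact signature against the version you use; the 60 fps duration is only an assumption.

let firstSample = true;

// ...inside the `if (packet.type === "data")` branch of the loop above:
muxer.addVideoChunkRaw(
  nalStreamToAvcSample(buffer),
  packet.keyframe ? "key" : "delta",
  // Scrcpy PTS comes from MediaCodec's presentationTimeUs, i.e. microseconds,
  // which is also what mp4-muxer expects
  Number(packet.pts ?? 0n),
  // Per-sample duration in microseconds; ~60 fps assumed here for simplicity
  16_667,
  // Attach the decoder configuration to the first sample only
  firstSample ? metadata : undefined,
);
firstSample = false;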

H.265

H.265/HEVC is very similar to H.264. It's also a joint effort between ITU-T and ISO, and it also has two formats: Annex B (identical to H.264's) and HEVC (similar to the AVC format, but with different fields).

Configuration

The H.265 configuration packet contains the Video Parameter Set (VPS), Sequence Parameter Set (SPS), and Picture Parameter Set (PPS).

Same as with H.264, it also needs to be converted and stored in the video metadata, and prepended to the next frame data in the video stream. However, the functions for doing so are different because the data format is different.

First extract them from a packet:

import { h265SearchConfiguration } from "@yume-chan/scrcpy";

for await (const packet of videoPacketStream) {
  if (packet.type === "configuration") {
    const { videoParameterSet, sequenceParameterSet, pictureParameterSet } =
      h265SearchConfiguration(packet.data);
    console.log(videoParameterSet, sequenceParameterSet, pictureParameterSet);
  }
}

Then convert to HEVCDecoderConfigurationRecord:

import type { H265NaluRaw } from "@yume-chan/scrcpy";
import { h265ParseSequenceParameterSet, h265ParseVideoParameterSet } from "@yume-chan/scrcpy";

function h265ConfigurationToHevcDecoderConfigurationRecord(
  videoParameterSet: H265NaluRaw,
  sequenceParameterSet: H265NaluRaw,
  pictureParameterSet: H265NaluRaw,
) {
  const {
    profileTierLevel: {
      generalProfileTier: {
        profile_space: general_profile_space,
        tier_flag: general_tier_flag,
        profile_idc: general_profile_idc,
        profileCompatibilitySet: generalProfileCompatibilitySet,
        constraintSet: generalConstraintSet,
      },
      general_level_idc,
    },
    vps_max_layers_minus1,
    vps_temporal_id_nesting_flag,
  } = h265ParseVideoParameterSet(videoParameterSet.rbsp);

  const {
    chroma_format_idc,
    bit_depth_luma_minus8,
    bit_depth_chroma_minus8,
    vuiParameters: { min_spatial_segmentation_idc = 0 } = {},
  } = h265ParseSequenceParameterSet(sequenceParameterSet.rbsp);

  const buffer = new Uint8Array(
    23 +
      5 * 3 +
      videoParameterSet.data.length +
      sequenceParameterSet.data.length +
      pictureParameterSet.data.length,
  );

  /* unsigned int(8) configurationVersion = 1; */
  buffer[0] = 1;

  /*
   * unsigned int(2) general_profile_space;
   * unsigned int(1) general_tier_flag;
   * unsigned int(5) general_profile_idc;
   */
  buffer[1] =
    (general_profile_space << 6) |
    (Number(general_tier_flag) << 5) |
    general_profile_idc;

  /* unsigned int(32) general_profile_compatibility_flags; */
  buffer[2] = generalProfileCompatibilitySet[0]!;
  buffer[3] = generalProfileCompatibilitySet[1]!;
  buffer[4] = generalProfileCompatibilitySet[2]!;
  buffer[5] = generalProfileCompatibilitySet[3]!;

  /* unsigned int(48) general_constraint_indicator_flags; */
  buffer[6] = generalConstraintSet[0]!;
  buffer[7] = generalConstraintSet[1]!;
  buffer[8] = generalConstraintSet[2]!;
  buffer[9] = generalConstraintSet[3]!;
  buffer[10] = generalConstraintSet[4]!;
  buffer[11] = generalConstraintSet[5]!;

  /* unsigned int(8) general_level_idc; */
  buffer[12] = general_level_idc;

  /*
   * bit(4) reserved = '1111'b;
   * unsigned int(12) min_spatial_segmentation_idc;
   */
  buffer[13] = 0xf0 | (min_spatial_segmentation_idc >> 8);
  buffer[14] = min_spatial_segmentation_idc;

  /*
   * bit(6) reserved = '111111'b;
   * unsigned int(2) parallelismType;
   */
  buffer[15] = 0xfc;

  /*
   * bit(6) reserved = '111111'b;
   * unsigned int(2) chromaFormat;
   */
  buffer[16] = 0xfc | chroma_format_idc;

  /*
   * bit(5) reserved = '11111'b;
   * unsigned int(3) bitDepthLumaMinus8;
   */
  buffer[17] = 0xf8 | bit_depth_luma_minus8;

  /*
   * bit(5) reserved = '11111'b;
   * unsigned int(3) bitDepthChromaMinus8;
   */
  buffer[18] = 0xf8 | bit_depth_chroma_minus8;

  /* bit(16) avgFrameRate; */
  buffer[19] = 0;
  buffer[20] = 0;

  /*
   * bit(2) constantFrameRate;
   * bit(3) numTemporalLayers;
   * bit(1) temporalIdNested;
   * unsigned int(2) lengthSizeMinusOne;
   */
  buffer[21] =
    ((vps_max_layers_minus1 + 1) << 3) |
    (Number(vps_temporal_id_nesting_flag) << 2) |
    3;

  /* unsigned int(8) numOfArrays; */
  buffer[22] = 3;

  let i = 23;

  for (const nalu of [
    videoParameterSet,
    sequenceParameterSet,
    pictureParameterSet,
  ]) {
    /*
     * bit(1) array_completeness;
     * unsigned int(1) reserved = 0;
     * unsigned int(6) NAL_unit_type;
     */
    buffer[i] = nalu.nal_unit_type;
    i += 1;

    /* unsigned int(16) numNalus; */
    buffer[i] = 0;
    i += 1;
    buffer[i] = 1;
    i += 1;

    /* unsigned int(16) nalUnitLength; */
    buffer[i] = nalu.data.length >> 8;
    i += 1;
    buffer[i] = nalu.data.length;
    i += 1;

    buffer.set(nalu.data, i);
    i += nalu.data.length;
  }

  return buffer;
}

Frames

Handling H.265 frame data is the same as for H.264.

AV1

AV1 is much simpler to handle.

Its configuration is in the first data packet, and usually it doesn't need to be stored in the metadata, so the configuration packets can be ignored.

The data packets only need to be saved as-is into the video stream.
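A minimal sketch, with a hypothetical writeAv1Sample standing in for whatever "add sample" call your muxer library provides:

import type { ScrcpyMediaStreamPacket } from "@yume-chan/scrcpy";

declare const videoPacketStream: ReadableStream<ScrcpyMediaStreamPacket>;
// Hypothetical helper standing in for your muxer library's add-sample call
declare function writeAv1Sample(data: Uint8Array, keyframe: boolean): void;

for await (const packet of videoPacketStream) {
  // Configuration packets can be ignored for AV1
  if (packet.type === "data") {
    // AV1 data packets need no Annex B/AVC-style conversion
    writeAv1Sample(packet.data, packet.keyframe ?? false);
  }
}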

Start recording anytime

The above examples assume you are saving the video stream from the beginning. Because a video file must start with a configuration and a keyframe, it's a little different if you want to start and stop recording in the middle.

Before v3.0

Before v3.0, it's not possible to manually request configuration and keyframe packets.

The best possible approach is caching the latest configuration and the frames starting from the latest keyframe, and using them when recording starts.

import type {
  ScrcpyMediaStreamPacket,
  ScrcpyMediaStreamConfigurationPacket,
  ScrcpyMediaStreamDataPacket,
} from "@yume-chan/scrcpy";

declare const videoPacketStream: ReadableStream<ScrcpyMediaStreamPacket>;

let configuration: ScrcpyMediaStreamConfigurationPacket | undefined;
const frames: ScrcpyMediaStreamDataPacket[] = [];

let recording = false;

for await (const packet of videoPacketStream) {
  switch (packet.type) {
    case "configuration":
      configuration = packet;
      frames.length = 0; // Clear the array

      if (recording) {
        // Convert and save the configuration
      }
      break;
    case "data":
      if (packet.keyframe) {
        // A new keyframe starts a new group; older frames are no longer needed
        frames.length = 0;
      }
      frames.push(packet);

      if (recording) {
        // Convert and save the frame
      }
      break;
  }
}

function startRecord() {
  if (recording) {
    return;
  }

  recording = true;

  if (configuration) {
    // Convert and save the configuration

    for (const frame of frames) {
      // Convert and save the frame
    }
  }
}

Obviously, this will record several extra frames from before the requested start point. To make it worse, because Android produces keyframes based on fixed frame intervals, but new frames are only generated when the screen content changes, those extra frames may span several seconds, or even exceed the maximum keyframe time interval of some container formats.

v3.0 and later

v3.0 added the reset video control message. It requests the server to restart capturing and encoding immediately, which produces new configuration and keyframe packets, so the stream can be recorded from any point. See that page for examples.