# DocEngine The synchronization algorithm for yjs docs. ``` ┌─────────┐ ┌───────────┐ ┌────────┐ │ Storage ◄──┤ DocEngine ├──► Server │ └─────────┘ └───────────┘ └────────┘ ``` # Core Components ## DocStorage ```ts export interface DocStorage { eventBus: DocEventBus; doc: ByteKV; syncMetadata: ByteKV; serverClock: ByteKV; } ``` Represents the local storage used, Specific implementations are replaceable, such as `IndexedDBDocStorage` on the `browser` and `SqliteDocStorage` on the `desktop`. ### DocEventBus Each `DocStorage` contains a `DocEventBus`, which is used to communicate with other engines that share the same storage. With `DocEventBus` we can sync updates between engines without connecting to the server. For example, on the `browser`, we have multiple tabs, all tabs share the same `IndexedDBDocStorage`, so we use `BroadcastChannel` to implement `DocEventBus`, which allows us to broadcast events to all tabs. On the `desktop` app, if we have multiple Windows sharing the same `SqliteDocStorage`, we must build a mechanism to broadcast events between all Windows (currently not implemented). ## DocServer ```ts export interface DocServer { pullDoc( docId: string, stateVector: Uint8Array ): Promise<{ data: Uint8Array; serverClock: number; stateVector?: Uint8Array; } | null>; pushDoc(docId: string, data: Uint8Array): Promise<{ serverClock: number }>; subscribeAllDocs(cb: (updates: { docId: string; data: Uint8Array; serverClock: number }) => void): Promise<() => void>; loadServerClock(after: number): Promise>; waitForConnectingServer(signal: AbortSignal): Promise; disconnectServer(): void; onInterrupted(cb: (reason: string) => void): void; } ``` Represents the server we want to synchronize, there is a simulated implementation in `tests/sync.spec.ts`, and the real implementation is in `packages/backend/server`. ### ServerClock `ServerClock` is a clock generated after each updates is stored in the Server. It is used to determine the order in which updates are stored in the Server. The `DocEngine` decides whether to pull updates from the server based on the `ServerClock`. The `ServerClock` written later must be **greater** than all previously. So on the client side, we can use `loadServerClock(the largest ServerClock previously received)` to obtain all changed `ServerClock`. ## DocEngine The `DocEngine` is where all the synchronization logic actually happens. Due to the complexity of the implementation, we divide it into 2 parts. ## DocEngine - LocalPart Synchronizing **the `YDoc` instance** and **storage**. The typical workflow is: 1. load data from storage, apply to `YDoc` instance. 2. track `YDoc` changes 3. write the changes back to storage. ### SeqNum There is a `SeqNum` on each Doc data in `Storage`. Every time `LocalPart` writes data, `SeqNum` will be +1. There is also a `PushedSeqNum`, which is used for RemotePart later. ## DocEngine - RemotePart Synchronizing `Storage` and `Server`. The typical workflow is: 1. Connect with the server, Load `ServerClocks` for all docs, Start subscribing to server-side updates. 2. Check whether each doc requires `push` and `pull` 3. Execute all push and pull 4. Listen for updates from `LocalPart` and push the updates to the server 5. Listen for server-side updates and write them to storage. ### PushedSeqNum Each Doc will record a `PushedSeqNum`, used to determine whether the doc has unpush updates. After each `push` is completed, `PushedSeqNum` + 1 If `PushedSeqNum` and `SeqNum` are still different after we complete the push (usually means the previous `push` failed) Then do a full pull and push and set `pushedSeqNum` = `SeqNum` ### PulledServerClock Each Doc also record `PulledServerClock`, Used to compare with ServerClock to determine whether to `pull` doc. When the `pull` is completed, set `PulledServerClock` = `ServerClock` returned by the server. ### Retry The `RemotePart` may fail at any time, and `RemotePart`'s built-in retry mechanism will restart the process in 5 seconds after failure.