Gecko:Overview

This document attempts to give an overview of the different parts of Gecko: what they do, why they do it, where their code lives within the repository, and (when available) links to more specific documentation covering the details of that code. It is not yet complete. Maintainers of these areas of code should correct errors, add information, and add links to more detailed documentation (since this document is intended to remain an overview, not complete documentation).

Browsers, Frames, and Document Navigation

Docshell and Session History

The user of a Web browser can change the page shown in that browser in many ways: by clicking a link, loading a new URL, using the forward and back buttons, or other ways. This can happen inside a space we'll call a browsing context; this space can be a browser window, a tab, or a frame or iframe within a document. The toplevel data structures within Gecko represent this browsing context; they contain other data structures representing the individual pages displayed inside of it (most importantly, the current one). In terms of implementation, these two types of navigation, the top level of a browser and the frames within it, largely use the same data structures.

In Gecko, the docshell is the toplevel object responsible for managing a single browsing context. It, and the associated session history code, manage the navigation between pages inside of a docshell. (Note the difference between session history, which is a sequence of pages in a single browser session, used for recording information for back and forward navigation, and global history, which is the history of pages visited and associated times, regardless of browser session, used for things like link coloring and address autocompletion.)

There are relatively few objects in Gecko that are associated with a docshell rather than being associated with a particular one of the pages inside of it. Most such objects are attached to the docshell. An important object associated with the docshell is the outer window object in the DOM code (where both the outer and inner window objects are implemented by nsGlobalWindow, though HTML5 describes the outer window as a WindowProxy and the inner window as a Window). See DOM for more information on this.

The most toplevel object for managing the contents of a particular page being displayed within a docshell is a document viewer (see layout). Other important objects associated with this presentation are the document (see DOM) and the pres(entation) shell and pres(entation) context (see layout).

Docshells are organized into a tree. If a docshell has a non-null parent, then it corresponds to a subframe in whatever page is currently loaded in the parent docshell. Only the root docshell of a docshell tree manages the session history (this does not match the conceptual model in the HTML5 spec and may be subject to change).

  • code: mozilla/docshell/
  • bugzilla: Core::Document Navigation
  • documentation: DocShell:Home Page

Embedding

To be written (and maybe rewritten if we get an IPC embedding API).

Multi-process and IPC

To be written.

Networking

The network library Gecko uses is called Necko. Necko APIs are largely organized around three concepts: URI objects, protocol handlers, and channels.

Protocol handlers

A protocol handler is an XPCOM service associated with a particular URI scheme or network protocol. Necko includes protocol handlers for HTTP, FTP, the data: URI scheme, and various others. Extensions can implement protocol handlers of their own.

A protocol handler implements the nsIProtocolHandler API, which serves three primary purposes:

  1. Providing metadata about the protocol (its security characteristics, whether it requires actual network access, what the corresponding URI scheme is, what TCP port the protocol uses by default).
  2. Creating URI objects for the protocol's scheme.
  3. Creating channel objects for the protocol's URI objects

Typically, the built-in I/O service (nsIIOService) is responsible for finding the right protocol handler for URI object creation and channel creation, while a variety of consumers query protocol metadata. Querying protocol metadata is the recommended approach for any code that needs different behavior for different URI schemes: in particular, unlike whitelists or blacklists, it correctly handles the addition of new protocols.

A service can register itself as a protocol handler by registering for the contract ID "@mozilla.org/network/protocol;1?name=SSSS", where SSSS is the URI scheme for the protocol (e.g. "http", "ftp", and so forth).
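
For example, a minimal C++ sketch (assuming the usual Gecko headers; the helper name and the particular flag tested are just illustrations) of querying protocol metadata through the I/O service instead of hard-coding a list of schemes:

  #include "nsCOMPtr.h"
  #include "nsIIOService.h"
  #include "nsIProtocolHandler.h"
  #include "nsNetUtil.h"            // do_GetIOService

  // Hypothetical helper: ask Necko about a scheme rather than keeping a
  // whitelist of scheme names.
  static bool SchemeReturnsData(const char* aScheme)
  {
    nsresult rv;
    nsCOMPtr<nsIIOService> ios = do_GetIOService(&rv);
    if (NS_FAILED(rv)) {
      return false;
    }
    uint32_t flags = 0;
    if (NS_FAILED(ios->GetProtocolFlags(aScheme, &flags))) {
      return false;
    }
    // URI_DOES_NOT_RETURN_DATA is one of the metadata flags the protocol
    // handler reports; testing it handles newly added protocols correctly.
    return !(flags & nsIProtocolHandler::URI_DOES_NOT_RETURN_DATA);
  }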

URI objects

URI objects, which implement the nsIURI API, are a way of representing URIs and IRIs. Their main advantage over strings is that they do basic syntax checking and canonicalization on the URI string that they're provided with. They also provide various accessors to extract particular parts of the URI and provide URI equality comparisons. URIs that correspond to hierarchical schemes implement the additional nsIURL interface, which exposes even more accessors for breaking out parts of the URI.

URI objects are typically created by calling the newURI method on the I/O service, or in C++ by calling the NS_NewURI utility function. This makes sure to create the URI object using the right protocol handler, which ensures that the right kind of object is created. Direct creation of URIs via createInstance is reserved for protocol handler implementations.
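
A minimal sketch of URI creation in C++ (the helper name is hypothetical; assumes the standard Necko headers):

  #include "nsCOMPtr.h"
  #include "nsIURI.h"
  #include "nsNetUtil.h"   // NS_NewURI
  #include "nsString.h"

  // Hypothetical helper: build an nsIURI; NS_NewURI routes the request
  // through the right protocol handler for the scheme.
  static already_AddRefed<nsIURI> MakeExampleURI()
  {
    nsCOMPtr<nsIURI> uri;
    nsresult rv = NS_NewURI(getter_AddRefs(uri),
                            NS_LITERAL_CSTRING("http://example.org/index.html"));
    if (NS_SUCCEEDED(rv)) {
      nsAutoCString host;
      uri->GetHost(host);   // accessors break out individual parts of the URI
    }
    return uri.forget();    // null on failure
  }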

Channels

Channels are the Necko representation of a single request/response interaction with a server. A channel is created by calling the newChannel method on the I/O service, or in C++ by calling the NS_NewChannel utility function. The channel can then be configured as needed, and finally its asyncOpen method can be called. This method takes an nsIStreamListener as an argument.

If asyncOpen has returned successfully, the channel guarantees that it will asynchronously call the onStartRequest and onStopRequest methods on its stream listener. This will happen even if there are network errors that prevent Necko from actually performing the requests. Such errors will be reported in the channel's status and in the status argument to onStopRequest.

If the channel ends up being able to provide data, it will make one or more onDataAvailable calls on its listener after calling onStartRequest and before calling onStopRequest. For each call, the listener is responsible for either returning an error or reading the entire data stream passed in to the call.

If an error is returned from either onStartRequest or onDataAvailable, the channel must act as if it has been canceled with the corresponding error code.

A channel has two URI objects associated with it. The originalURI of the channel is the URI that was originally passed to newChannel to create the channel that then had asyncOpen called on it. The URI is the URI from which the channel is reading data. These can be different in various cases involving protocol handlers that forward network access to other protocol handlers, as well as in situations in which a redirect occurs (e.g. following an HTTP 3xx response). In redirect situations, a new channel object will be created, but the originalURI will be propagated from the old channel to the new channel.

Note that the nsIRequest that's passed to onStartRequest must match the one passed to onDataAvailable and onStopRequest, but need not be the original channel that asyncOpen was called on. In particular, when an HTTP redirect happens the request argument to the callbacks will be the post-redirect channel.
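
A hedged sketch of the whole flow in C++, with a hypothetical listener class; the data is read and discarded, and error handling is minimal:

  #include "nsCOMPtr.h"
  #include "nsIChannel.h"
  #include "nsIInputStream.h"
  #include "nsIStreamListener.h"
  #include "nsNetUtil.h"   // NS_NewChannel

  // Hypothetical listener: receives the guaranteed OnStartRequest/OnStopRequest
  // pair plus any OnDataAvailable calls in between.
  class ExampleListener : public nsIStreamListener
  {
  public:
    NS_DECL_ISUPPORTS
    NS_DECL_NSIREQUESTOBSERVER
    NS_DECL_NSISTREAMLISTENER
  };
  NS_IMPL_ISUPPORTS2(ExampleListener, nsIStreamListener, nsIRequestObserver)

  NS_IMETHODIMP
  ExampleListener::OnStartRequest(nsIRequest* aRequest, nsISupports* aContext)
  {
    return NS_OK;
  }

  NS_IMETHODIMP
  ExampleListener::OnDataAvailable(nsIRequest* aRequest, nsISupports* aContext,
                                   nsIInputStream* aStream,
                                   uint64_t aOffset, uint32_t aCount)
  {
    // The listener must consume the entire aCount bytes or return an error.
    char buf[4096];
    while (aCount > 0) {
      uint32_t toRead = aCount > sizeof(buf) ? uint32_t(sizeof(buf)) : aCount;
      uint32_t read = 0;
      nsresult rv = aStream->Read(buf, toRead, &read);
      if (NS_FAILED(rv)) {
        return rv;   // the channel will act as if canceled with this error
      }
      if (read == 0) {
        break;
      }
      aCount -= read;
    }
    return NS_OK;
  }

  NS_IMETHODIMP
  ExampleListener::OnStopRequest(nsIRequest* aRequest, nsISupports* aContext,
                                 nsresult aStatus)
  {
    // aStatus carries the channel's final status, including network errors.
    return NS_OK;
  }

  // Hypothetical helper: create a channel for a URI, then open it asynchronously.
  static nsresult LoadURI(nsIURI* aURI)
  {
    nsCOMPtr<nsIChannel> channel;
    nsresult rv = NS_NewChannel(getter_AddRefs(channel), aURI);
    if (NS_FAILED(rv)) {
      return rv;
    }
    nsCOMPtr<nsIStreamListener> listener = new ExampleListener();
    return channel->AsyncOpen(listener, nullptr);
  }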

TODO: crypto?

Document rendering pipeline

Some of the major components of Gecko can be described as steps on the path from an HTML document coming in from the network to the graphics commands needed to render that document. An HTML document is a serialization of a tree structure. (FIXME: add diagram) The HTML parser and content sink create an in-memory representation of this tree, which we call the DOM tree or content tree. Many JavaScript APIs operate on the content tree. Then, in layout, we create a second tree, the frame tree (or rendering tree) that is a similar shape to the content tree, but where each node in the tree represents a rectangle (except in SVG where they represent other shapes). We then compute the positions of the nodes in the frame tree (called frames) and paint them using our cross-platform graphics APIs (which, underneath, map to platform-specific graphics APIs).

Parser

The parser's job is to transform a character stream into a tree structure, with the help of the content sink classes.

HTML is parsed using a parser implementing the parsing algorithm in the HTML specification (starting with HTML5). Much of this parser is translated from Java, and changes are made to the Java version. This parser is in parser/html/.

The codebase still has the previous generation HTML parser, which is still used for a small number of things, though we hope to be able to remove it entirely soon. This parser is in parser/htmlparser/.

XML is parsed using the expat library (parser/expat/) and code that wraps it (parser/xml/). This is a non-validating parser; however, it loads certain DTDs to support XUL localization.

DOM / Content

The content tree or DOM tree is the central data structure for Web pages. It is a tree structure, initially created from the tree structure expressed in the HTML or XML markup. The nodes in the tree implement major parts of the DOM (Document Object Model) specifications. The nodes themselves are part of a class hierarchy rooted at nsINode; different derived classes are used for things such as text nodes, the document itself, HTML elements, SVG elements, etc., with further subclasses of many of these types (e.g., for specific HTML elements). Many of the APIs available to script running in Web pages are associated with these nodes. The tree structure persists while the Web page is displayed, since it stores much of the state associated with the Web page. The code for these nodes lives in the content/ directory.

The DOM APIs are not threadsafe. DOM nodes can be accessed only from the main thread (also known as the UI thread (user interface thread)) of the application.

There are also many other APIs available to Web pages that are not APIs on the nodes in the DOM tree. Many of these other APIs also live in the same directories, though some live in content/ and some in dom/. These include APIs such as the DOM event model.

The dom/ directory also includes some of the code needed to expose Web APIs to JavaScript (in other words, the glue code between JavaScript and these APIs). See Scripting below for details of this code.

TODO: Internal APIs vs. DOM APIs.

TODO: Mutation observers / document observers.

TODO: Reference counting and cycle collection.

TODO: specification links

Style System

In order to display the content, Gecko needs to compute the styles relevant to each DOM node. It does this based on the model described in the CSS specifications: this model applies to style specified in CSS (e.g. by a 'style' element, an 'xml-stylesheet' processing instruction or a 'style' attribute), style specified by presentation attributes, and the default style specified by our own user agent style sheets. There are two major sets of data structures within the style system:

  • first, data structures that represent sources of style data, such as CSS style sheets or data from stylistic HTML attributes
  • second, data structures that represent computed style for a given DOM node.

These sets of data structures are mostly distinct (for example, they store values in different ways).

The loading of CSS style sheets from the network is managed by the CSS loader; they are then tokenized by the CSS scanner and parsed by the CSS parser. Those that are attached to the document also expose APIs to script that are known as the CSS Object Model, or CSSOM.

The style sheets that apply to a document are managed by a class called the style set. The style set interacts with the different types of style sheets (representing CSS style sheets, presentational attributes, and 'style' attributes) through two interfaces: nsIStyleSheet for basic management of style sheets and nsIStyleRuleProcessor for getting the style data out of them. Usually the same object implements both interfaces, except in the most important case, CSS style sheets, where there is a single rule processor for all of the CSS style sheets in each origin (user/UA/author) of the CSS cascade.

The computed style data for an element/frame are exposed to the rest of Gecko through a class called nsStyleContext. Rather than having a member variable for each CSS property, it breaks up the properties into groups of related properties called style structs. These style structs obey the rule that all of the properties in a single struct either inherit by default (what the CSS specifications call "Inherited: yes" in the definition of properties; we call these inherited structs) or all are not inherited by default (we call these reset structs). Separating the properties in this way improves the ability to share the structs between similar style contexts and reduces the amount of memory needed to store the style data. The nsStyleContext API exposes a method for getting each struct, so you'll see code like sc->GetStyleText()->mTextAlign for getting the value of the text-align CSS property. (Frames (see the Layout section below) also have the same GetStyle* methods, which just forward the call to the frame's style context.)
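
For example, a hedged sketch of reading a computed value from a frame (the helper is hypothetical; the GetStyle* naming follows the convention described above):

  #include "nsIFrame.h"
  #include "nsStyleConsts.h"
  #include "nsStyleStruct.h"

  // Hypothetical helper: fetch the text style struct and test one property.
  static bool IsCenterAligned(nsIFrame* aFrame)
  {
    // Forwards to the frame's style context and returns the nsStyleText struct.
    const nsStyleText* text = aFrame->GetStyleText();
    return text->mTextAlign == NS_STYLE_TEXT_ALIGN_CENTER;
  }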

The style contexts form a tree structure, in a shape somewhat like the content tree (except that we coalesce identical sibling style contexts rather than keeping two of them around; if the parents have been coalesced then this can apply recursively and coalesce cousins, etc.; we do not coalesce parent/child style contexts). The parent of a style context has the style data that the style context inherits from when CSS inheritance occurs. This means that the parent of the style context for a DOM element is generally the style context for that DOM element's parent, since that's how CSS says inheritance works.

The process of turning the style sheets into computed style data goes through three main steps, the first two of which closely relate to the nsIStyleRule interface, which represents an immutable source of style data, conceptually representing (and for CSS style rules, directly storing) a set of property:value pairs. (It is similar to the idea of a CSS style rule, except that it is immutable; this immutability allows for significant optimization. When a CSS style rule is changed through script, we create a new style rule.)

The first step of going from style sheets to computed style data is finding the ordered sequence of style rules that apply to an element. The order represents which rules override which other rules: if two rules have a value for the same property, the higher ranking one wins. (Note that there's another difference from CSS style rules: declarations with !important are represented using a separate style rule.) This is done by calling one of the nsIStyleRuleProcessor::RulesMatching methods. The ordered sequence is stored in a trie called the rule tree: the path from the root of the rule tree to any (leaf or non-leaf) node in the rule tree represents a sequence of rules, with the highest ranking farthest from the root. Each rule node (except for the root) has a pointer to a rule, but since a rule may appear in many sequences, there are sometimes many rule nodes pointing to the same rule. Once we have this list we create a style context (or find an appropriate existing sibling) with the correct parent pointer (for inheritance) and rule node pointer (for the list of rules), and a few other pieces of information (like the pseudo-element).

The second step of going from style sheets to computed style data is getting the winning property:value pairs from the rules. (This only provides property:value pairs for some of the properties; the remaining properties will fall back to inheritance or to their initial values depending on whether the property is inherited by default.) We do this step (and the third) for each style struct, the first time it is needed. This is done in nsRuleNode::WalkRuleTree, where we ask each style rule to fill in its property:value pairs by calling its MapRuleInfoInto function. When called, the rule fills in only those pairs that haven't been filled in already, since we're calling from the highest priority rule to the lowest (since in many cases this allows us to stop before going through the whole list, or to do partial computation that just adds to data cached higher in the rule tree).

The third step of going from style sheets to computed style data (which various caching optimizations allow us to skip in many cases) is actually doing the computation; this generally means we transform the style data into the data type described in the "Computed Value" line in the property's definition in the CSS specifications. This transformation happens in functions called nsRuleNode::Compute*Data, where the * in the middle represents the name of the style struct. This is where the transformation from the style sheet value storage format to the computed value storage format happens.

Once we have the computed style data, we then store it: if a style struct in the computed style data doesn't depend on inherited values or on data from other style structs, then we can cache it in the rule tree (and then reuse it, without recomputing it, for any style contexts pointing to that rule node). Otherwise, we store it on the style context (in which case it may be shared with the style context's descendant style contexts). This is where keeping inherited and non-inherited properties separate is useful: in the common case of relatively few properties being specified, we can generally cache the non-inherited structs in the rule tree, and we can generally share the inherited structs up and down the style context tree.

The ownership models in style sheet structures are a mix of reference counted structures (for things accessible from script) and directly owned structures. Style contexts are reference counted, and own their parents (from which they inherit), and rule nodes are garbage collected with a simple mark and sweep collector (which often never needs to run).

Layout

Much of the layout code deals with operations on the frame tree (or rendering tree). In the frame tree, each node represents a rectangle (or, for SVG, other shapes). The frame tree has a shape similar to the content tree, since many content nodes have one corresponding frame, though it differs in a few ways, since some content nodes have more than one frame or don't have any frames at all. When elements are display:none in CSS or undisplayed for certain other reasons, they won't have any frames. When elements are broken across lines or pages, they have multiple frames; elements may also have multiple frames when multiple frames nested inside each other are needed to display a single element (for example, a table, a table cell, or many types of form controls).

Each node in the frame tree is an instance of a class derived from nsIFrame. As with the content tree, there is a substantial type hierarchy, but the type hierarchy is very different: it includes types like text frames, blocks and inlines, the various parts of tables, and the various types of HTML form controls.

Frames are allocated within an arena owned by the pres shell. Each frame is owned by its parent; frames are not reference counted, and code must not hold on to pointers to frames. To mitigate potential security bugs caused by dangling pointers to destroyed frames, we use frame poisoning, which has two parts. When a frame is destroyed other than at the end of life of the presentation, we fill its memory with a pattern consisting of a repeated pointer to inaccessible memory, and then put the memory on a per-frame-class freelist. This means that if code accesses the memory through a dangling pointer, it will either crash quickly by dereferencing the poison pattern or it will find a valid frame.

Like the content tree, frames must be accessed only from the UI thread.

The frame tree should not store any important data. While it does usually persist while a page is being displayed, frames are often destroyed and recreated in response to certain style changes, and in the future we may do the same to reduce memory use for pages that are currently inactive. There were a number of cases where this rule was violated in the past and we stored important data in the frame tree; however, most (though not quite all) such cases are now fixed.

The rectangle represented by the frame is what CSS calls the element's border box. This is the outside edge of the border (or the inside edge of the margin). The margin lives outside the border, and the padding lives inside it. In addition to nsIFrame::GetRect, we also have the APIs nsIFrame::GetPaddingRect to get the padding box (the outside edge of the padding, or inside edge of the border) and nsIFrame::GetContentRect to get the content box (the outside edge of the content, or inside edge of the padding). These APIs may produce out-of-date results when reflow is needed (or has not yet occurred).

In addition to tracking a rectangle, frames also track two overflow areas: visual overflow and scrollable overflow. These overflow areas represent the union of the area needed by the frame and by all its descendants. The visual overflow is used for painting-related optimizations: it is a rectangle covering all of the area that might be painted when the frame and all of its descendants paint. The scrollable overflow represents the area that the user should be able to scroll to in order to see the frame and all of its descendants. In some cases differences between the frame's rect and its overflow happen because of descendants that stick out of the frame; in other cases they occur because of some characteristic of the frame itself. The two overflow areas are similar, but there are differences: for example, margins are part of scrollable overflow but not visual overflow, whereas text-shadows are part of visual overflow but not scrollable overflow.
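
A hedged sketch of the box and overflow accessors (the helper name is hypothetical; overflow rects are in the frame's own coordinate space, while GetRect is relative to the parent):

  #include "nsIFrame.h"

  // Hypothetical helper: does this frame (or a descendant) paint outside its
  // border box?
  static bool PaintsOutsideItsRect(nsIFrame* aFrame)
  {
    nsRect borderBox  = aFrame->GetRect();         // the frame's rect (border box)
    nsRect paddingBox = aFrame->GetPaddingRect();  // inset by the border
    nsRect contentBox = aFrame->GetContentRect();  // inset by border + padding
    (void)borderBox; (void)paddingBox; (void)contentBox;

    // Compare the visual overflow against the frame's own size.
    nsRect selfRect(nsPoint(0, 0), aFrame->GetSize());
    return !selfRect.Contains(aFrame->GetVisualOverflowRect());
  }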

When frames are broken across lines, columns, or pages, we create multiple frames representing the multiple rectangles of the element. The first one is the primary frame, and the rest are its continuations (which are more likely to be destroyed and recreated during reflow). These frames are linked together as continuations: they have a doubly-linked list that can be used to traverse the continuations using nsIFrame::GetPrevContinuation and nsIFrame::GetNextContinuation. (Currently continuations always have the same style data, though we may at some point want to break that invariant.)

Continuations are sometimes siblings of each other, and sometimes not. For example, if a paragraph contains a span which contains a link, and the link is split across lines, then the continuations of the span are siblings (since they are both children of the paragraph), but the continuations of the link are not siblings (since each continuation of the link is descended from a different continuation of the span). Traversing the entire frame tree does not require considering continuations, since all of the continuations are descendants of the element containing the break.

We also use continuations for cases (most importantly, bidi reordering, where left-to-right text and right-to-left text need to be separated into different continuations since they may not form a contiguous rectangle) where the continuations should not be rewrapped during reflow: we call these continuations fixed rather than fluid. nsIFrame::GetNextInFlow and nsIFrame::GetPrevInFlow traverse only the fluid continuations and do not cross fixed continuation boundaries.
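
A hedged sketch of the two traversals (helper names are hypothetical):

  #include "nsIFrame.h"

  // Walks every continuation, fluid or fixed.
  static uint32_t CountContinuations(nsIFrame* aFirst)
  {
    uint32_t count = 0;
    for (nsIFrame* f = aFirst; f; f = f->GetNextContinuation()) {
      ++count;
    }
    return count;
  }

  // Walks only fluid continuations; stops at fixed (e.g. bidi) boundaries.
  static uint32_t CountFluidContinuations(nsIFrame* aFirst)
  {
    uint32_t count = 0;
    for (nsIFrame* f = aFirst; f; f = f->GetNextInFlow()) {
      ++count;
    }
    return count;
  }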

TODO: nsBox craziness from https://bugzilla.mozilla.org/show_bug.cgi?id=524925#c64

TODO: describe block-within-inline splits

TODO: link to documentation of block and inline layout

TODO: link to documentation of scrollframes

TODO: link to documentation of XUL frame classes

Code (note that most files in base and generic have useful one line descriptions at the top that show up in MXR):

  • layout/base/ contains objects that coordinate everything and a bunch of other miscellaneous things
  • layout/generic/ contains the basic frame classes as well as support code for their reflow methods (nsHTMLReflowState, nsHTMLReflowMetrics)
  • layout/forms/ contains frame classes for HTML form controls
  • layout/tables/ contains frame classes for CSS/HTML tables
  • layout/mathml/ contains frame classes for MathML
  • layout/svg/ contains frame classes for SVG
  • layout/xul/ contains frame classes for the XUL box model and for various XUL widgets

Bugzilla:

  • All of the components whose names begin with "Layout" in the "Core" product

Frame Construction

Frame construction is the process of creating frames. This is done when styles change in ways that require frames to be created or recreated or when nodes are inserted into the document. The content tree and the frame tree don't have quite the same shape, and the frame construction process does some of the work of creating the right shape for the frame tree. It handles the aspects of creating the right shape that don't depend on layout information. So for example, frame construction handles the work needed to implement table anonymous objects but does not handle frames that need to be created when an element is broken across lines or pages.

The basic unit of frame construction is a run of contiguous children of a single parent element. When asked to construct frames for such a run of children, the frame constructor first determines, based on the siblings and parent of the nodes involved, where in the frame tree the new frames should be inserted. Then the frame constructor walks through the list of content nodes involved and for each one creates a temporary data structure called a frame construction item. The frame construction item encapsulates various information needed to create the frames for the content node: its style data, some metadata about how one would create a frame for this node based on its namespace, tag name, and styles, and some data about what sort of frame will be created. This list of frame construction items is then analyzed to see whether constructing frames based on it and inserting them at the chosen insertion point will produce a valid frame tree. If it will not, the frame constructor either fixes up the list of frame construction items so that the resulting frame tree would be valid or throws away the list of frame construction items and requests the destruction and re-creation of the frame for the parent element so that it has a chance to create a list of frame construction items that it can fix up.

Once the frame constructor has a list of frame construction items and an insertion point that would lead to a valid frame tree, it goes ahead and creates frames based on those items. Creation of a non-leaf frame recursively attempts to create frames for the children of that frame's element, so in effect frames are created in a depth-first traversal of the content tree.

The vast majority of the code in the frame constructor, therefore, falls into one of these categories:

  • Code to determine the correct insertion point in the frame tree for new frames.
  • Code to create, for a given content node, frame construction items. This involves some searches through static data tables for metadata about the frame to be created.
  • Code to analyze the list of frame construction items.
  • Code to fix up the list of frame construction items.
  • Code to create frames from frame construction items.

Code: layout/base/nsCSSFrameConstructor.h and layout/base/nsCSSFrameConstructor.cpp

Reflow

Reflow is the process of computing the positions and sizes of frames. (After all, frames represent rectangles, and at some point we need to figure out exactly *what* rectangle.) Reflow is done recursively, with each frame's Reflow method calling the Reflow methods on that frame's descendants.

In many cases, the correct results are defined by CSS specifications (particularly CSS 2.1). In some cases, the details are not defined by CSS, though in some (but not all) of those cases we are constrained by Web compatibility. When the details are defined by CSS, however, the code to compute the layout is generally structured somewhat differently from the way it is described in the CSS specifications, since the CSS specifications are generally written in terms of constraints, whereas our layout code consists of algorithms optimized for incremental recomputation.

The reflow generally starts from the root of the frame tree, though some other types of frame can act as reflow roots and start a reflow from them. Reflow roots must obey the invariant that a change inside one of their descendants never changes their rect or overflow areas (though currently scrollbars are reflow roots but don't quite obey this invariant).

In many cases, we want to reflow a part of the frame tree, and we want this reflow to be efficient. For example, when content is added or removed from the document tree or when styles change, we want the amount of work we need to redo to be proportional to the amount of content. We also want to efficiently handle a series of changes to the same content.

To do this, we maintain two bits on frames: NS_FRAME_IS_DIRTY indicates that a frame and all of its descendants require reflow. NS_FRAME_HAS_DIRTY_CHILDREN indicates that a frame has a descendant that is dirty or has had a descendant removed (i.e., that it has a child that has NS_FRAME_IS_DIRTY or NS_FRAME_HAS_DIRTY_CHILDREN or it had a child removed). These bits allow coalescing of multiple updates; this coalescing is done in nsPresShell, which tracks the set of reflow roots that require reflow. The bits are set during calls to nsPresShell::FrameNeedsReflow and are cleared during reflow.

The layout algorithms used by many of the frame classes are those specified in CSS, which are based on the traditional document formatting model, where widths are input and heights are output.

In some cases, however, widths need to be determined based on the content. This depends on two intrinsic widths: the minimum intrinsic width (see nsIFrame::GetMinWidth) and the preferred intrinsic width (see nsIFrame::GetPrefWidth). The concept of what these widths represent is best explained by describing what they are on a paragraph containing only text: in such a paragraph the minimum intrinsic width is the width of the longest word, and the preferred intrinsic width is the width of the entire paragraph laid out on one line.

Intrinsic widths are invalidated separately from the dirty bits described above. When a caller informs the pres shell that a frame needs reflow (nsIPresShell::FrameNeedsReflow), it passes one of three options:

  • eResize indicates that no intrinsic widths are dirty
  • eTreeChange indicates that intrinsic widths on it and its ancestors are dirty (which happens, for example, if new children are added to it)
  • eStyleChange indicates that intrinsic widths on it, its ancestors, and its descendants are dirty (for example, if the font-size changes)
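
A hedged sketch of requesting a reflow with one of the options listed above (the helper is hypothetical):

  #include "nsIFrame.h"
  #include "nsIPresShell.h"

  // Hypothetical helper: a style-dependent change dirties intrinsic widths on
  // the frame, its ancestors, and its descendants, and marks the subtree dirty.
  static void NoteStyleDependentChange(nsIPresShell* aShell, nsIFrame* aFrame)
  {
    aShell->FrameNeedsReflow(aFrame,
                             nsIPresShell::eStyleChange,
                             NS_FRAME_IS_DIRTY);
  }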

Reflow is the area where the XUL frame classes (those that inherit from nsBoxFrame or nsLeafBoxFrame) are most different from the rest. Instead of using nsIFrame::Reflow, they do their layout computations using intrinsic size methods called GetMinSize, GetPrefSize, and GetMaxSize (which report intrinsic sizes in two dimensions) and a final layout method called Layout. In many cases these methods defer some of the computation to a separate object called a layout manager.

When an individual frame's Reflow method is called, most of the input is provided on an object called nsHTMLReflowState and the output is filled in to an object called nsHTMLReflowMetrics. After reflow, the caller (usually the parent) is responsible for setting the frame's size based on the metrics reported. (This can make some computations during reflow difficult, since the new size is found in either the reflow state or the metrics, but the frame's size is still the old size. However, it's useful for invalidating the correct areas that need to be repainted.)
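
A hedged sketch of the shape of such a method (the frame class is hypothetical and the computation is trivial; a real frame reflows its children here):

  NS_IMETHODIMP
  nsExampleFrame::Reflow(nsPresContext*           aPresContext,
                         nsHTMLReflowMetrics&     aDesiredSize,
                         const nsHTMLReflowState& aReflowState,
                         nsReflowStatus&          aStatus)
  {
    // Inputs (available and computed sizes) arrive on the reflow state.
    aDesiredSize.width  = aReflowState.ComputedWidth();
    aDesiredSize.height = 0;          // a real frame derives this from its children
    aStatus = NS_FRAME_COMPLETE;      // not continued onto another column or page
    // The caller (usually the parent) sets this frame's rect from aDesiredSize.
    return NS_OK;
  }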

One major difference worth noting is that in XUL layout, the size of the child is set prior to its parent calling its Layout method. (Once invalidation uses display lists and is no longer tangled up in Reflow, it may be worth switching non-XUL layout to work this way as well.)

Painting

TODO: display lists (and event handling)

TODO: layers

Pagination

The concepts behind pagination (also known as fragmentation) are a bit complicated, so for now we've split them off into a separate document: Gecko:Continuation_Model. This code is used for printing, print-preview, and multicolumn frames.

Dynamic change handling along the rendering pipeline

The ability to make changes to the DOM from script is a major feature of the Web platform. Web authors rely on the concept (though there are a few exceptions, such as animations) that changing the DOM from script leads to the same rendering that would have resulted from starting from that DOM tree. They also rely on the performance characteristics of these changes: small changes to the DOM that have small effects should have proportionally small processing time. This means that Gecko needs to efficiently propagate changes from the content tree to style, the frame tree, the geometry of the frame tree, and the screen.

For many types of changes, however, there is substantial overhead to processing a change, no matter how small. For example, reflow must propagate from the top of the frame tree down to the frames that are dirty, no matter how small the change. One very common way around this is to batch up changes. We batch up changes in lots of ways, for example:

  • The content sink adds multiple nodes to the DOM tree before notifying listeners that they've been added. This allows notifying once about an ancestor rather than for each of its descendants, or notifying about a group of descendants all at once, which speeds up the processing of those notifications.
  • We batch up nodes that require style reresolution (recomputation of selector matching and processing the resulting style changes). This batching is tree based, so it not only merges multiple notifications on the same element, but also merges a notification on an ancestor with a notification on its descendant (since some of these notifications imply that style reresolution is required on all descendants).
  • We wait to reconstruct frames that require reconstruction (after destroying frames eagerly). This, like the tree-based style reresolution batching, avoids duplication both for same-element notifications and ancestor-descendant notifications, even though it doesn't actually do any tree-based caching.
  • We postpone doing reflows until needed. As for style reresolution, this maintains tree-based dirty bits (see the description of NS_FRAME_IS_DIRTY and NS_FRAME_HAS_DIRTY_CHILDREN under Reflow).
  • We allow the OS to queue up multiple invalidates before repainting (though we will likely switch to controlling that ourselves). This leads to a single repaint of some set of pixels where there might otherwise have been multiple (though it may also lead to more pixels being repainted if multiple rectangles are merged to a single one).

Having changes buffered up means, however, that various pieces of information (layout, style, etc.) may not be up-to-date. Some things require up-to-date information: for example, we don't want to expose the details of our buffering to Web page script since the programming model of Web page script assumes that DOM changes take effect "immediately", i.e., that the script shouldn't be able to detect any buffering. Many Web pages depend on this.

We therefore have ways to flush these different sorts of buffers. There are methods called FlushPendingNotifications on nsIDocument and nsIPresShell, which take an argument indicating what to flush:

  • Flush_Content: create all the content nodes from data buffered in the parser
  • Flush_ContentAndNotify: the above, plus notify document observers about the creation of all nodes created so far
  • Flush_Style: the above, plus make sure style data are up-to-date
  • Flush_Frames: the above, plus make sure all frame construction has happened (currently the same as Flush_Style)
  • Flush_InterruptibleLayout: the above, plus perform layout (Reflow), but allow interrupting layout if it takes too long
  • Flush_Layout: the above, plus ensure layout (Reflow) runs to completion
  • Flush_Display (should never be used): the above, plus ensure repainting happens
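
For example, a hedged sketch of code that needs current geometry (the helper name is hypothetical):

  #include "nsIDocument.h"

  // Flush style reresolution, frame construction, and reflow to completion
  // before reading layout-dependent information.
  static void EnsureLayoutIsCurrent(nsIDocument* aDocument)
  {
    aDocument->FlushPendingNotifications(Flush_Layout);
  }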

The major way that notifications of changes propagate from the content code to layout and other areas of code is through the nsIDocumentObserver and nsIMutationObserver interfaces. Classes can implement this interface to listen to notifications of changes for an entire document or for a subtree of the content tree.

WRITE ME: ... layout document observer implementations

TODO: how style system optimizes away rerunning selector matching

TODO: style changes and nsChangeHint

Refresh driver

Graphics

2D Graphics API

  • main use is at the end of the document pipeline, so could be part of it
  • also used more directly from canvas
Gfx layers class diagram (before refactoring)

This diagram shows most of the inheritance and the references between the classes involved in the gfx layers code (as of April 2012). A '*' in a class name means that the class exists in four different variants (Image, Thebes, Canvas, and Color). Classes whose names end in "OGL" also have "D3D10" and "D3D9" counterparts.

ShadowableXXX classes are representations of the layers in content thread/process, whereas ShadowXXX classes are their equivalent in the compositor thread/process. Shadow and Shadowable objects are kept in sync using IPDL protocols.

TODO: much more goes here

Blog posts with information on Layers that should be integrated here:

New layers architecture, with OGL backend

The layers system is being refactored to introduce better abstractions and avoid code duplication between backends. This is a work in progress that lives in the graphics branch and has not landed in mozilla-central yet.

Among the important changes:

  • Texture transfers are now handled by Compositables and Textures, which have their own IPDL protocols.
  • A Texture (see TextureClient and TextureHost) is a thin abstraction over texture memory. It abstracts out the specifics of the platform.
  • A Compositable manages one or more Textures and the extra information to composite them correctly, such as texture coordinates. Compositable classes are mostly platform independent.
  • A compositable can be attached to a layer so that the layer can use it on the compositor side.
  • While layers are only modified in layers transactions originating from the main thread, a compositable transaction can happen from the main thread (as a subset of a layers transaction) or from any other thread using ImageBridge. This is how we do async video, and it can be used when we implement WebGL/canvas2d workers.
  • Async texture updates (ImageBridge) and in-layers-transaction texture updates use the same code.
  • Since compositables and textures abstract out platform specifics, ShadowLayers are backend-independent (no more backend-specific layer classes for OpenGL or D3D10).

Scripting

  • JavaScript Engine
  • XPConnect
  • quickstubs
  • security (caps, wrappers)
  • DOM glue, classinfo, etc.

Images

Plugins

Platform-specific layers

  • widget
  • native theme
  • files, networking, other low-level things
  • Accessibility APIs

Editor

Base layers

NSPR

NSPR is a library providing cross-platform APIs for various platform-specific functions. We try to use it as little as possible, although there are a number of areas (particularly some network-related APIs and threading/locking primitives) where we use it quite a bit.

XPCOM

XPCOM is a cross-platform modularity library, modeled on Microsoft COM. It is an object system in which all objects inherit from the nsISupports interface.
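
A hedged sketch of the basic consumer pattern (the helper is hypothetical): objects are held in nsCOMPtr, and other interfaces on the same object are discovered at runtime via QueryInterface, usually through the do_QueryInterface helper:

  #include "nsCOMPtr.h"
  #include "nsIURL.h"
  #include "nsString.h"

  // Hypothetical helper: check at runtime whether an object also implements
  // nsIURL, and use it if so.
  static void UseAsURL(nsISupports* aObject)
  {
    nsCOMPtr<nsIURL> url = do_QueryInterface(aObject);
    if (url) {
      nsAutoCString fileName;
      url->GetFileName(fileName);   // only hierarchical URIs implement nsIURL
    }
  }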

components and services, contract IDs and CIDs

prior overuse of XPCOM; littering with XPCOM does not produce modularity

Base headers (part of xpcom/base/) and data structures. See also mfbt.

Threading

xptcall, proxies

reference counting, cycle collection

String

XPCOM has string classes for representing sequences of characters. We have two parallel sets of classes, one for strings with 1-byte units (char, which may be signed or unsigned), and one for strings with 2-byte units (PRUnichar, always unsigned). The classes are named such that the class for 2-byte characters ends with String and the corresponding class for 1-byte characters ends with CString. 2-byte strings are almost always used to encode UTF-16. 1-byte strings are usually used to encode either ASCII or UTF-8, but are sometimes also used to hold data in some other encoding or just byte sequences.

The string classes distinguish, as part of the type hierarchy, between strings that must have a null-terminator at the end of their buffer (ns[C]String) and strings that are not required to have a null-terminator (ns[C]Substring). ns[C]Substring is the base of the string classes (since it imposes fewer requirements) and ns[C]String is a class derived from it. Functions taking strings as parameters should generally take one of these four types.
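
A hedged sketch of what that means for function signatures (helper names are hypothetical; in most internal code the ns[C]Substring level is spelled nsA[C]String):

  #include "nsString.h"

  // Accepts any 2-byte string, null-terminated or not.
  static bool ContainsDigit(const nsAString& aValue)
  {
    for (uint32_t i = 0; i < aValue.Length(); ++i) {
      PRUnichar ch = aValue.CharAt(i);
      if (ch >= '0' && ch <= '9') {
        return true;
      }
    }
    return false;
  }

  // Requires a null-terminated buffer, e.g. to hand .get() to a C API.
  static void PassToCApi(const nsCString& aValue)
  {
    const char* ptr = aValue.get();   // guaranteed null-terminated
    (void)ptr;
  }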

In order to avoid unnecessary copying of string data (which can have significant performance cost), the string classes support different ownership models. All string classes support the following three ownership models dynamically:

  • reference counted, copy-on-write, buffers (the default)
  • adopted buffers (a buffer that the string class owns, but is not reference counted, because it came from somewhere else)
  • dependent buffers, that is, an underlying buffer that the string class does not own, but that the caller that constructed the string guarantees will outlive the string instance

In addition, there is a special string class, nsAuto[C]String, that additionally contains an internal 64-unit buffer (intended primarily for use on the stack), leading to a fourth ownership model:

  • storage within an auto string's stack buffer

Auto strings will prefer reference counting an existing reference-counted buffer over their stack buffer, but will otherwise use their stack buffer for anything that will fit in it.

There are a number of additional string classes, particularly nsDependent[C]String, nsDependent[C]Substring, and the NS_LITERAL_[C]STRING macros (which construct an nsLiteral[C]String); these exist primarily as constructors for the other types. They are really just convenient notation for constructing an ns[C]S[ubs]tring with a non-default ownership mode; they should not be thought of as different types. (The Substring, StringHead, and StringTail functions are also constructors for dependent [sub]strings.) Non-default ownership modes can also be set up using the Rebind and Adopt methods, although the Rebind methods actually live on the derived types, which is probably a mistake (moving them up would require some care to avoid making an API that easily allows assigning a non-null-terminated buffer to a string whose static type indicates that it is null-terminated).
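
A hedged sketch of these construction idioms together (the function and its argument are hypothetical):

  #include "nsString.h"

  static void StringIdioms(const char* aCallerOwnedBuffer)
  {
    // Auto string: has a 64-unit internal buffer, intended for the stack.
    nsAutoString name;
    name.Assign(NS_LITERAL_STRING("margin-left"));

    // Named literal: wraps a compile-time constant without copying it.
    NS_NAMED_LITERAL_CSTRING(kHttp, "http");

    // Dependent string: points at a caller-owned, null-terminated buffer,
    // which must outlive this string object.
    nsDependentCString dependent(aCallerOwnedBuffer);

    // Dependent substring: a view of part of another string, with no copy.
    const nsAString& prefix = Substring(name, 0, 6);   // "margin"

    // Default ownership: ordinary heap strings share a reference-counted
    // buffer copy-on-write when assigned to each other.
    nsCString first;
    first.Assign("shared until one side is mutated");
    nsCString second(first);

    (void)kHttp; (void)dependent; (void)prefix; (void)second;
  }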

TODO: buffer growth, concatenation optimizations

TODO: encoding conversion, what's validated and what isn't

TODO: "string API", nsAString (historical)

  • Code: xpcom/string/
  • Bugzilla: Core::String