How to Make a Game Universe with Thousands of Fully Detailed 3D Planets
Procedurally generated environment in Unreal Engine

There has been some discussion about the Starfield game, and many gamers and developers are disappointed because there are so few planets, and they are not even complete 3D spherical planets. Some gamers point out that there are games with hundreds of planets. That is true, but usually those planets are procedurally generated, simplistic, and ultimately boring, because they recycle elements from a common asset library.

Having seen dozens of videos about Star Citizen, I can only conclude that all its planets seem barren and monotonous, and ultimately uninteresting. Planets in Star Citizen may have a structure here and there, perhaps some forest, but otherwise they seem to be just rocks. No Man's Sky is more colourful and has more elements on many planets, but at least to me those planets lack the depth that a living, complex ecosystem should have, and they are quite repetitive. They may suit a resource-grinding game, where quantity seems to be more important than quality or depth.

None of those procedural universes has the same level of detail and variety as our real world. Why? Some gamers and developers think that it is hard to make smooth transitions from space to 3D spherical planets, with seamless landing from space. That is not actually hard, since it only requires visual tricks and streaming cartographic data from disk while increasing the level of visual detail gradually. The real problem is the amount of data needed to handle such a universe, data which must be stored somewhere and processed by the CPU, RAM, buses, and GPU.

 

The Case of a Hypothetical Game Universe

Creating a believable game universe is an extremely complex task, involving multiple layers of elements that are usually interconnected in many ways, just like in our real world. Of course, there is a lot of variation in both scope and detail. There are games which run in a micro-universe, with a limited number of locations, each of which has limited physical dimensions. Then there are so-called open world games, with very large, or apparently unlimited, environments.

Fully detailed planets need a lot of data, even at a very simple and rough level of detail. The amount of data is so large that it needs a complex database schema with careful indexing to store and handle it. Then you need complex data structures to use that data in the game. Visual assets and narrative properties can be anything, but they will always need storage space too. It does not matter whether you create 3D environments manually or procedurally; they will always need a lot of storage space for all the necessary data, and they will use RAM, CPU, and GPU to load those 3D environments into playable form.

Another limiting factor is time. Generating content always takes time, whether it is prepared manually or generated procedurally. Manual creation is obviously time-consuming, but procedural generation is based on algorithms, and each generative step consumes CPU cycles. The more complex the generation and the more options there are, the longer it takes to generate things procedurally. Although generating one procedural element may take only a microsecond, when we have millions of procedural elements we are talking about seconds, minutes, or even hours of total execution time.

So, the question is not whether it is possible to generate an open world universe with thousands of detailed, explorable 3D planets on a PC or console as a single-player game - the real question is whether it is possible to store the necessary game data and process it in a feasible amount of time. I am looking at this only from a technical point of view, so I leave out questions like “do we really need such a game universe”, “does it make the game any better”, or “is it an economically reasonable project”, because those are matters of opinion. I am interested in facts that I can calculate.

To understand this problem, we must go down to the basic level of software and system development and analyse the whole workflow of generating game worlds, planets, and universes, focusing on the structure and volume of the data needed to outline their elements, data storage strategies and their physical limitations, what options we have, what choices we can make, and what their consequences are.

I must warn that many details presented here are very rough estimates, because we do not have a real case at hand. I had to extrapolate from existing games, simulated data, and general experience in game development. Every game is unique - the volume of installed data and the performance of the game depend on the game engine, programming language, disk type, disk format, CPU, motherboard, GPU, RAM, and so on. Disk storage consumption is probably the most accurate measure of these.

 

Hardware

In this case the focus is on games that are installed on a PC or game console and played offline, as self-contained instances. If the player can invite a (limited) number of other gamers to play in the same instance, it does not change the situation much, since all game data must be contained within the instance acting as the server. It is obvious that all game data should fit on the hard drives, and the game mechanics should be such that performance remains reasonable.

When you develop a game, it is wise to use a mid-level PC as a benchmark machine to test its performance, and generally keep the minimum specifications as low as possible. This means we can expect that an average player has only a couple of terabytes of storage space or less, a medium-level CPU, and 8-16 gigabytes of RAM. We have already concluded that the limiting factors are storage space and execution time, so the level of GPU does not affect procedural generation, unless we develop our procedural generation algorithms to use the GPU as a fast parallel calculation unit. However, parallel execution is harder to manage, and it is doubtful whether it would even be necessary.

So, all our game world data cannot take more than a couple of terabytes - not during installation, not during initial generation, not during game play – and preferably it should take less than that. Similarly, the generation algorithms cannot be too complex, and we cannot generate too large areas in one go, or generation will slow down game performance or cause breaks in the game where the player must wait while the game generates the environment, because the CPU cannot handle all the calculations and main memory is constantly full and swapping.

One crucial factor in data storage is the data density on the disks. This depends on the file system and how the disk was formatted, the block size, what kind of fragmentation may appear, and so on, in conjunction with the content of the files and data. It affects how much space each file and database record will take - or, to be more precise, how efficiently the data will fill the disk. It also affects access speed, although more important there is the type of disk (SSD or HDD) and its internal access speed. Data density gives us estimates of the minimum amount of storage space needed, once we can estimate how many bytes a typical game world data record takes on average, and how many data records and files the game universe will need during generation and during game play.

If data sets, databases, and files grow extremely large, one critical problem emerges: the probability of data corruption rises, due to possible physical faults in the disk, occasional problems with file systems, and so on. This is again pure mathematics and probability - any disk block has a certain probability of getting corrupted. The more blocks in use and the more files there are, the higher the probability that at least one of them gets corrupted. No gamer would enjoy losing hundreds of hours of game progress because the sheer amount of data makes it prone to failure.

One remedy for these problems is data storage redundancy, e.g. RAID disk systems and constant backups. Contrary to popular belief, RAID is not a substitute for backup; preferably you should use both. However, our hypothetical medium-level gaming rig does not have RAID, because it would cost more. Even a backup system costs more, because you would need large disks to back up your data, and preferably it should be automated. Making RAID and a backup system a minimum requirement would not increase sales.

However, if we have a dedicated cloud server host designed to handle a large number of gamers in a persistent open world setting (so-called MMO games), the situation changes dramatically. MMO game servers are not limited by the confines of one PC or game console – they usually have a complex infrastructure using multiple servers, switches, and network connections. The game server, databases, authentication, etc. are all handled on multiple, redundant servers using RAID, and the host can enlarge disk space dynamically while data backups are made automatically.

Obviously cloud servers do not have the same limitations as one PC box when it comes to disk space, but they still must consider data structure and access speed optimization carefully. Cloud databases can be astoundingly large, but even they run on real physical hardware as virtual machines, and they are bound by access speed, network speed, CPU speed, and so on. The requirements for algorithm and data structure optimization remain similar, because at least some components and data have to be loaded on the client’s PC, which has to be able to run that data and the game mechanics.

MMO games are not the focus of the original question, but their infrastructure gives us one option if we drop the “offline” condition and deploy a single-player game with online, central data repositories hosted by the publisher. The game world could be extremely large, because there are no disk space limitations of the kind found on a PC or a console. The player’s game instance loads only those data sets which are relevant to the current game situation, while the large bulk of data is always available on the game servers. Since this is not a multiplayer game, we do not have to worry about synchronization between servers and clients, because the actual game runs only on the gamer’s PC. The cloud server is only a data repository.

However, there would still be a careful balance between two extremes: does the cloud server do most of the procedural generation work, or is it done on the gamer’s PC? There are pros and cons in both cases. In the first case we do not have to store much data on the PC, but we must transfer lots of data from the cloud server to the PC according to game events and the player’s movement. In the second case, the cloud server would just send seed data, parameters, some asset data, etc., and then the client generates the final data for the game environment.

Considering how much disk space these open world games have grown to need (even without hundreds of planets), I would not be surprised if more games in the future adopted this option and offered large-scale game data storage as a service. Many games already have a cloud save option, but this would be a bit different. Obviously, the next step is to stream the game completely from cloud game servers. However, these are options that go beyond the original question, so I will not discuss them further.

 

Structure of the Universe and Data-Oriented Design

I will start with the assumption that the game universe must be believable, with some internal logic and rules. It can be fantasy, science fiction, or their combination, but the universe should have its own rules. This means that all created elements must be consistent in relation to each other, and it most definitely cannot be “just random data”. Every square meter of the environment must have elements which realistically belong there. Every living being should be in its real habitat, rivers should flow according to the laws of nature, and weather patterns should be realistic.

Changes from one area to another should be something that you can reasonably expect to happen in natural environments. If there are civilizations on a planet, then their habitats and transportation routes should follow some logic. Inevitably this leads to numerous interconnected environment elements, like rivers flowing from higher areas (e.g. mountains) towards lower areas, eventually running into other rivers, lakes, or the sea coast. Roads usually exist between cities, towns, and locations with some resources.

Although universe building involves scientific theories from ecology, geology, physics, chemistry, hydrodynamics, biology, astronomy, evolution, sociology, and much more, we don’t have to be omniscient to create a game universe. We don’t have to simulate every tiny detail according to those theories, but we should have enough understanding of the principles that “keep the real world running”, and that understanding should guide the world creation process, so that the resulting game universe and its details give the impression of a logical and consistent structure behind it all.

However, the number of elements, details, and data is enormous. I have programmed different kinds of universe and world generators for different purposes, and I can assure you from those experiences that it is a very complex and multi-layered process. Before you can even start to program any kind of algorithm, you must collect a lot of relevant information about how the world and the universe work. Then you must condense that information into structured data describing the different elements and their details, plus rules which steer the creation and combine elements in a coherent and consistent way. That condensed information must be stored in a database or structured files, so that it can be used efficiently in the creation process. After that you can develop the actual algorithms and programs and start testing whether your procedural generation works.

This is called data-oriented design or data-oriented programming, which models programs as transforms and focuses on optimal transformations of data. Transforms are abstractions of code that focus solely on mapping inputs to outputs. In this case, the inputs are our seed data and element data, including manually pre-designed elements (like narrative locations, game characters, world histories, etc.), and those are transformed (according to previously defined rules, with random variation) into outputs: ready-to-play 3D game environments.

It has been claimed that traditional object-oriented programming (OOP) results in poor data locality, even more so if runtime polymorphism (dynamic dispatch) is used, which is especially problematic on some processors. Although OOP appears to “organize code around data”, it actually organizes source code around the interaction of data types and their relationships, rather than physically grouping individual fields and arrays in a format that specific procedures can access efficiently. Moreover, it often hides layout details under abstraction layers, whereas data-oriented design wants to consider them first and foremost.

Data-oriented design and programming matter in this case because, as we will see later, the amount of data in a game universe with thousands of detailed planets is literally astronomical. At the same time, the generative algorithms are complex. The combination of these two factors means that even if a single element is something the CPU and RAM can handle without any impact on performance, thousands or millions of elements can make a real-time game unplayable due to very low performance. So we must use methods that exploit the CPU cache, instruction pipelines, out-of-order execution, and other advanced CPU architecture features efficiently. We can increase clock speeds, but that can cause other problems. As CPUs have become faster, alongside a large increase in main memory capacity, massive data consumption increases the likelihood of cache misses on the shared bus, otherwise known as the Von Neumann bottleneck.
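To make the data-oriented point concrete, here is a minimal C++ sketch (the types and fields are invented for illustration, not taken from any real engine) contrasting an object-oriented array-of-structs layout with a data-oriented struct-of-arrays layout, where a hot loop touches only the fields it actually needs:

```cpp
#include <cstdint>
#include <vector>

// Array-of-structs: each element drags along fields the loop never touches,
// so every cache line loaded contains mostly unused data.
struct CellAoS {
    float    height;
    float    moisture;
    float    temperature;
    uint32_t vegetationType;
    uint64_t seed;          // cold data for this particular loop
    char     debugName[32]; // cold data
};

// Struct-of-arrays: fields live in separate, tightly packed arrays, so a
// loop reading only height and moisture streams through memory with far
// fewer cache misses.
struct CellsSoA {
    std::vector<float>    height;
    std::vector<float>    moisture;
    std::vector<float>    temperature;
    std::vector<uint32_t> vegetationType;
};

// A transform in the data-oriented sense: inputs (height, moisture)
// mapped to an output (erosion factor), nothing else.
void computeErosion(const CellsSoA& cells, std::vector<float>& erosionOut)
{
    erosionOut.resize(cells.height.size());
    for (size_t i = 0; i < cells.height.size(); ++i) {
        erosionOut[i] = cells.height[i] * 0.01f + cells.moisture[i] * 0.05f;
    }
}
```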

Although we cannot change existing CPU architectures, we can choose the programming language. An important factor here is how the language handles memory allocation and how you can optimize it. Many game engines are written in C++, but there are also modern high-performance programming languages like Zig and Rust. Whatever you use, it becomes really critical to avoid memory leaks (allocated memory is not properly released after use), null pointer errors (e.g. when a memory allocation fails), dangling pointer errors (some other part of the program releases memory that is still in use), and other related problems. When the volume of data increases and the number of calculations rises, even one or two such problems can clog the system fast.

 

Workflow Options

It is very clear at this point that the task of developing a game universe with thousands of planets is enormous, and too big to be designed completely manually. There is not enough workforce to design all elements and assets individually, and even managing such work would be overwhelming. Even if it were possible, there would be so much individual content data that fitting it on a gamer’s PC could be unmanageable. We have already seen open world games with several hundred gigabytes of installation data due to 4K textures, and those games are smaller in scope. So it is reasonable to expect something on the terabyte scale or beyond, and this should be considered a very optimistic minimum.

Beyond manual design, the options are fully procedural generation, or a mixture of procedural generation and manual design. From experience we know that fully procedural generation has limitations when it comes to producing interesting content from a narrative perspective. A fully procedural world could be very accurate in its details, but the overall feeling is probably shallow. Someone would probably suggest using AI-produced content, but I am very sceptical. My experiments with, for example, ChatGPT-produced content show that such content can be inconsistent, illogical, and unusable.

The biggest problem from the game developers’ perspective is the lack of control over the final product. As a game developer you want to offer a good adventure and interesting content with a certain level of quality. With fully procedural generation you cannot guarantee that. Even smaller procedurally generated games like NetHack have shown that totally unplayable results are possible: occasionally all routes lead to a dead end, quests can be impossible to solve, and so on. But most importantly, procedurally or AI-generated content is often uninteresting and shallow compared to human-designed, complex, multi-layered, and surprising storylines and events. Besides, AI does not really invent anything new; it only recycles, re-organizes, and connects elements from the material it has analysed and learned.

So, our only sensible option for the task at hand is a mixture of procedural and manual generation. That way we provide a rigid, planned skeleton for the universe, its elements, locations, characters, and stories, augmented with a large bulk of procedurally generated content. Then our only concern is to design those procedural generation algorithms to fit in seamlessly with that bespoke content.

Since we are using a mixture of procedural generation and manual design, we should also think about the modularity of content. Instead of making single-use elements, we can save storage space by making modular elements which we can use to construct larger elements, and thus provide variation in the game universe.

Another important question is whether we generate all the content at the start or generate it on demand during the game. If we generate everything at the start, it will make for a smoother gaming experience. However, it will obviously demand more disk space and take a lot of time. On the other hand, most of that work could be a waste of time and disk space if the player never visits those locations.

The better alternative is to save disk space and create only those locations and environments where the player is actually going. It is possible to create a streaming system which procedurally creates and loads nearby environments gradually, in suitably small chunks, without a noticeable impact on performance. These are like co-routines, where a large and time-consuming task is executed in segments – only part of the whole process is executed each frame. That will need careful planning and programming, and a suitable database system to support such a process. As a side note, the co-routine is more than a 60-year-old invention; co-routines were first used in assembly programming around 1958.
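As a rough illustration of such time-sliced, co-routine-like streaming, here is a hedged C++ sketch (the task structure and the per-frame time budget are my own assumptions, not any engine’s API), where a long generation task is advanced by a small work budget every frame:

```cpp
#include <chrono>
#include <deque>
#include <functional>

// One long generation task split into many small steps.
struct GenerationTask {
    std::deque<std::function<void()>> steps; // e.g. heightmap, rivers, vegetation...
    bool done() const { return steps.empty(); }
};

// Run queued generation steps each frame until a time budget is spent,
// then yield back to the game loop so the frame rate stays stable.
void tickGeneration(GenerationTask& task, std::chrono::microseconds budget)
{
    const auto start = std::chrono::steady_clock::now();
    while (!task.done() &&
           std::chrono::steady_clock::now() - start < budget) {
        task.steps.front()();   // execute one small segment of the work
        task.steps.pop_front();
    }
    // Whatever is left continues on the next frame.
}
```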

 

Procedural Hybrid Generation

Although procedural generation is random, the randomness is usually used in a deterministic way (e.g. with a Mersenne Twister), so that we get reproducible results. It makes testing easier and gives us some control over the process itself. The process is random (pseudo-random, to be precise), but with the same seed number we always get the same results. With suitable variation in seed numbers, we can use the same algorithm and parameters to get different results of the same type. This way we can create large environments as small segments, each having a different seed number and other parameters. It is like compressing large environments into seed numbers and parameters, which are stored in a database.
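A minimal sketch of this “compress the world into seeds” idea, under the assumption of a simple hash-style seed mixer (the constants are common mixing constants, not a requirement): the world seed and a segment’s grid coordinates deterministically produce that segment’s random stream, so the same segment always regenerates identically.

```cpp
#include <cstdint>
#include <random>

// Mix a global world seed with segment grid coordinates into one 64-bit seed.
// SplitMix64-style finalizer; the exact constants are a common choice.
uint64_t segmentSeed(uint64_t worldSeed, int64_t x, int64_t y)
{
    uint64_t h = worldSeed ^ (static_cast<uint64_t>(x) * 0x9E3779B97F4A7C15ULL)
                           ^ (static_cast<uint64_t>(y) * 0xC2B2AE3D27D4EB4FULL);
    h ^= h >> 30; h *= 0xBF58476D1CE4E5B9ULL;
    h ^= h >> 27; h *= 0x94D049BB133111EBULL;
    h ^= h >> 31;
    return h;
}

// The same (worldSeed, x, y) always yields the same pseudo-random stream,
// so the segment can be regenerated instead of stored.
float sampleSegmentValue(uint64_t worldSeed, int64_t x, int64_t y)
{
    std::mt19937_64 rng(segmentSeed(worldSeed, x, y));
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    return dist(rng); // e.g. base elevation or biome weight for this segment
}
```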

How much seed data we need in the beginning depends on how we choose the scale of the environment map units, how much individual control we need over those units, and how much data we plan to generate in each phase of generation. It would be absurd to think that we can create an enormous universe in one go, and it would just be a waste of execution time. We must divide the process into several phases on several levels of detail, and then create appropriate content based on what we need as the game progresses.

Procedural generation is usually very complex and time-consuming, so it would be impractical to take the seed number, generate the results, and apply any necessary deltas every single time we need an element. Usually, we generate those procedural elements only once, and then store them in a database or on disk as files to be reused during gameplay. This could be optimized even further by having, for example, a large (slow-access) database containing the whole universe, planets, etc., and a fast-access database which contains only data concerning the player’s immediate surroundings.

Since it is obvious that there will be a very large volume of data, it is necessary to make clear definitions about the persistence of different data elements. There is data which is important for the consistency of the game world and the narrative of the game, and thus it should be persistent and stored for later use. These are elements that should not disappear without a logical explanation. An example could be a treasure chest that the player carefully buries in the ground. On the other hand, there are lots of things which are not important for the consistency of the game world; they can be considered merely decorative elements, which can be created whenever necessary. Such elements need not be stored and can be generated again in the future. An example of this could be debris and litter on city streets. If they are just random debris without any narrative or logical function, there is no need to store them as data.
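One way to make that persistence decision explicit is to tag every generated element with a persistence class; a hedged sketch, with categories of my own invention:

```cpp
#include <cstdint>

// How important an element is for world consistency decides whether it is
// written to the save database or simply regenerated on demand.
enum class Persistence : uint8_t {
    Narrative,   // quest items, buried treasure chests: always stored
    Simulation,  // resources, NPC state: stored, possibly as deltas
    Decorative   // random debris, litter: never stored, regenerated from seed
};

struct WorldElement {
    uint64_t    id = 0;
    Persistence persistence = Persistence::Decorative;
    bool        modifiedByPlayer = false;
};

// Only elements whose loss the player could notice are worth database space.
bool shouldStore(const WorldElement& e)
{
    return e.persistence != Persistence::Decorative || e.modifiedByPlayer;
}
```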

One thing is clear, though. We must define the elements and locations of the procedurally generated universe as seed numbers and parameters, and that data must be stored on disk and be available all the time. Usually, a procedural world is created in map segments: you divide the game world into areas of suitable size, define each area’s seed number and other data parameters, like environment type, and record them in a database.
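As an illustration of what such a per-segment record might contain, here is a hypothetical C++ structure (the field names are my own, not from any particular engine or schema):

```cpp
#include <cstdint>
#include <vector>

// One row of the map segment seed database: everything needed to regenerate
// the segment, plus links to any hand-made content placed inside it.
struct MapSegmentRecord {
    int64_t  gridX = 0;            // segment coordinates on the planet grid
    int64_t  gridY = 0;
    uint64_t seed  = 0;            // deterministic seed for this segment
    uint8_t  environmentType = 0;  // e.g. desert, forest, tundra...
    float    baseElevation = 0.0f;
    float    moisture = 0.0f;
    std::vector<uint64_t> manualContentIds; // ids of designer-made locations, if any
};
```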

Some of these map segments may contain manually created content, whose definitions are also recorded in their own databases, linked to the other content. All this data can be linked to more accurate height map data, shoreline data, road data, lake and river data, and so on. During the game this data is combined and transformed into a 3D representation of the game world, with procedurally generated details. However, one relevant question remains – the suitable size of one map segment.

With small map segments we get much more control over details, placement, and the general look of the environment. The problem is that the number of map segments grows with the square of the refinement factor, because we divide the map area in two dimensions (or with its cube, if we divide a map volume in three dimensions). If we use 100m x 100m map segments instead of 1km x 1km map segments, we will have 100 times more map segments. If we use 10m x 10m map segments, we will have 10 000 times more. If our game uses 3D volumes to map environments, then each such step to smaller segments increases the number of map segments 1 000 times, 1 000 000 times, and so on.

As said above, each map segment is described with seed data and parameters, and this must be stored as data on disk. So, even before the procedural generation process we may have a huge seed data set if our map segment size is relatively small. The same applies to generated details if they have to be recorded in the database.

 

Layered Generation

The previous example showed that it is preferable to use quite large units if we want to keep the original database of seed data manageable in size. However, a large unit size leads to the problem that the active map unit could be too large for memory, or at least it could impact game performance severely. We can solve this trade-off dilemma with layered procedural generation.

By layered generation I mean that we use a larger first-layer basic unit, like 10km x 10km or even larger, and then during the game we use the seed parameters of that layer to generate smaller, second-layer units, like 100m x 100m. You could use even more layers if you want. You can visualise this as zooming in on a map, with ever-increasing levels of detail, dividing each area and sub-area into smaller and smaller pieces.

It is obvious that this way you will have less seed data in the beginning, but during the game you gain the advantages of smaller basic units. Each unit in the upper layer determines and guides the procedural generation of the lower-layer units, and so forth.
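A hedged sketch of how that guidance can work in code, assuming the same kind of seed mixing as in the earlier sketch: each large parent cell deterministically spawns the seeds of its smaller children, and only when they are actually needed.

```cpp
#include <cstdint>
#include <vector>

// Minimal seed mixer (same idea as before): parent seed plus child
// coordinates deterministically produce the child's seed.
uint64_t mixSeed(uint64_t base, int64_t x, int64_t y)
{
    uint64_t h = base ^ (static_cast<uint64_t>(x) * 0x9E3779B97F4A7C15ULL)
                      ^ (static_cast<uint64_t>(y) * 0xC2B2AE3D27D4EB4FULL);
    h ^= h >> 33; h *= 0xFF51AFD7ED558CCDULL; h ^= h >> 33;
    return h;
}

struct ChildSegment {
    int64_t  x, y;   // coordinates in the finer, second-layer grid
    uint64_t seed;   // derived deterministically from the parent
};

// A 10km x 10km parent cell expands on demand into a 100 x 100 grid of
// 100m x 100m children. The parent's seed acts as the "world seed" of its
// own sub-grid, so the same parent always produces the same children.
std::vector<ChildSegment> expandParent(uint64_t parentSeed,
                                       int64_t parentX, int64_t parentY,
                                       int childrenPerSide = 100)
{
    std::vector<ChildSegment> children;
    children.reserve(static_cast<size_t>(childrenPerSide) * childrenPerSide);
    for (int cy = 0; cy < childrenPerSide; ++cy)
        for (int cx = 0; cx < childrenPerSide; ++cx)
            children.push_back({parentX * childrenPerSide + cx,
                                parentY * childrenPerSide + cy,
                                mixSeed(parentSeed, cx, cy)});
    return children;
}
```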

If you are not careful, this can lead to inconsistent environment cells at the boundaries. In that case you must program your procedural generation algorithms to adjust the cell parameters so that environmental changes stay smooth and realistic.
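One common remedy is to blend cell parameters across a boundary band; a minimal sketch, assuming a simple smoothstep blend (the choice of blend function is my own):

```cpp
#include <algorithm>

// Blend a parameter (e.g. moisture or elevation) between two adjacent cells.
// t is the normalized position across the boundary band: 0 = deep inside
// cell A, 1 = deep inside cell B. A smoothstep avoids a visible seam.
float blendAcrossBoundary(float valueA, float valueB, float t)
{
    t = std::clamp(t, 0.0f, 1.0f);
    const float s = t * t * (3.0f - 2.0f * t); // smoothstep
    return valueA * (1.0f - s) + valueB * s;
}
```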

This can lead to considerable savings in database size, but it will obviously make your database more complex, and it will make all the data structures and procedural generation algorithms much more complex. There is much more indexing, and other data functions must handle those multi-layer structures properly.

However, you must remember that the size of the database will grow during the game, and if someone speedruns through even 1% of the game world, there will be one huge database clogging their computer.

 

Level of Dynamics and World Instancing

At this point we should think about how dynamic the game universe may be. Is the game universe totally static, where nothing changes, or is it totally fluid, where practically everything, every element and detail, can change? Usually, open world games sit between these two extremes. This will affect how much the game world data may grow during game play.

Another question is the number of different game instances or game saves that may be stored. There are two cases – different saves based on the same original game world instance, and completely different game world instances. In the first case the game world is originally the same, but the game progress is a bit different, and it may have been played with different characters. This means that we just need to know how some elements have changed during game play. The size of this game save, or delta data, is relatively small compared to the whole game world data set.
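A hedged sketch of that delta idea, with invented field names: the save records only what differs from the deterministically regenerable baseline world.

```cpp
#include <cstdint>
#include <unordered_map>

// The baseline world is reproducible from seeds, so a save only needs the
// differences the player has caused.
struct ElementDelta {
    bool    removed = false;                // e.g. a tree the player cut down
    float   posX = 0, posY = 0, posZ = 0;   // new position, if moved
    int32_t stateFlags = 0;                 // opened, looted, destroyed...
};

struct GameSave {
    uint64_t worldSeed = 0;                              // identifies the base world
    std::unordered_map<uint64_t, ElementDelta> deltas;   // element id -> change
};
```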

The second case is totally different, since it means that the whole game world data set must be created for each game instance. These are completely independent versions of the game world, perhaps sharing only some common assets, while the maps and locations of elements are different. This is like the original Dwarf Fortress, where you can create a completely new game world if you want. If you want to store several versions of the game universe on the same PC, then the requirement for storage space is doubled, tripled, and so on, depending on how many versions are allowed.

Although from the user’s point of view both cases are just game saves, as a developer you must understand their differences and their implications for data storage consumption.

Similarly, you must consider how to store the general dynamic elements of the game universe. While designing the game universe, you should consider how it will be simulated to give the impression that it is a living universe, where things happen even if the player does nothing. This means things like events and seasons in nature, changes in weather, animal mating cycles and migration patterns, the life cycle of communities and civilizations, trade, crops, the renewability and availability of resources, and many other things. This will depend on the style and focus of the game, of course.

All those changing simulated elements must be defined with the same kind of structured data and rules as previously discussed. How much this will increase the volume of data depends on the number of changing elements and how you plan to record their changed status during game play. If you overwrite old data with new, this increase is minimal compared to the total volume of data.

 

Earth as an Example

Let’s assume that we have an Earth-sized planet that we want to detail using procedural generation as described above. This means that we must detail an area of about 510 000 000 km². We will generate everything in the environment procedurally. It is obvious that we must divide this area into smaller basic environment units or cells, using some kind of grid system where we give location coordinates to each cell. Furthermore, we use generative algorithms to determine what features there are in each cell, like vegetation, landform, structures, animals, water, resources, etc. – all those elements that we need in the game. It is fair to assume that we will have several dozen parameter values for each cell.

All that would be our minimum data set, which we must have available when we generate further details of our game world. On top of that we will have some manually designed locations, landmarks, characters, etc. The generation process creates each unit (and nearby units) when necessary, building landscape, vegetation, and other 3D features according to the parameters, and we have a 3D environment where we can roam freely.

At this point the most important question is the optimal size of the basic environment unit. It would be tempting to use small sizes like 1m x 1m or 10m x 10m, but it does not take much thought to realise that this is unnecessary. Although many small details can change within 1 m or 10 m, the general, overall look does not change that much. If we use that small a unit size, we will have lots of units with practically identical data, and that repetitive data will just be a waste of storage space.

I would use something like 100m x 100m or 1km x 1km. In terms of human visual perception and environmental factors, that is large enough to contain lots of interesting features, but not too large. Since we are generating everything within that area procedurally, there will be sufficient variation within it, and probably no two 10m x 10m areas will look the same.

Of course, there is no law preventing us from using even larger unit sizes, like 10km x 10km or even bigger. The only problem is that it will reduce environmental variation, because our procedural generation is usually based on the idea that these basic units are of a certain environment type. Also, the procedural generation of such a unit will take more time and need more memory and storage space, because there is 100 times more actual game-scale area.

There is a trade-off between basic unit size and the actual game scale. We should choose a basic generation unit size which gives a suitable generalization of an area, containing a suitable number of features and elements, without overloading memory, and which guarantees good performance. This is part of game optimization. When you have chosen the basic unit scale, I advise you to stick to it.

 

Database and Engine Testing

I created several test databases for this purpose to find out how much disk space they would use. That gives me a rough estimate of data density. I used SQLite, which is very appropriate for this purpose, and is something that could be used as data storage for an offline PC game. SQLite is fast, and it is used in millions of instances running on critical hardware, so it is more than suitable for a real-time game. One advantage of SQLite is that it does not use a server – it is file-based – but it implements all relevant SQL features. That gives us quite a clear view of how the database and data density grow in relation to the number of data records.

According to my tests we can expect a data density of 0.02 kB/record for a very simple parameter set, and about 0.25 kB/record for more complex data sets. Creation times for these databases with several million (simulated) records ranged from 1-2 hours to nearly 8 hours, using a very fast SSD as storage. Similarly, simulated processing of those records took about four times those figures.
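For reference, a minimal sketch of the kind of density test I mean, using the standard SQLite C API (the schema and record count are placeholders of my own; link against libsqlite3):

```cpp
#include <cstdio>
#include <sqlite3.h>

// Insert a few million simulated segment records and check the resulting
// file size to estimate bytes per record on this particular disk and format.
int main()
{
    sqlite3* db = nullptr;
    if (sqlite3_open("density_test.db", &db) != SQLITE_OK) return 1;

    sqlite3_exec(db,
        "CREATE TABLE segment("
        "  x INTEGER, y INTEGER, seed INTEGER,"
        "  env INTEGER, elevation REAL, moisture REAL);",
        nullptr, nullptr, nullptr);

    sqlite3_exec(db, "BEGIN;", nullptr, nullptr, nullptr); // one big transaction is much faster

    sqlite3_stmt* stmt = nullptr;
    sqlite3_prepare_v2(db,
        "INSERT INTO segment VALUES(?, ?, ?, ?, ?, ?);", -1, &stmt, nullptr);

    const long long count = 5'000'000; // simulated records
    for (long long i = 0; i < count; ++i) {
        sqlite3_bind_int64 (stmt, 1, i % 10000);
        sqlite3_bind_int64 (stmt, 2, i / 10000);
        sqlite3_bind_int64 (stmt, 3, i * 2654435761LL);
        sqlite3_bind_int   (stmt, 4, static_cast<int>(i % 16));
        sqlite3_bind_double(stmt, 5, (i % 9000) * 0.1);
        sqlite3_bind_double(stmt, 6, (i % 100) * 0.01);
        sqlite3_step(stmt);
        sqlite3_reset(stmt);
    }
    sqlite3_finalize(stmt);
    sqlite3_exec(db, "COMMIT;", nullptr, nullptr, nullptr);
    sqlite3_close(db);

    std::printf("Divide the size of density_test.db by %lld records for kB/record.\n", count);
    return 0;
}
```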

To be more certain, I made some tests using Unreal Engine and my own procedural generation systems, which I have demonstrated in my videos on YouTube and LinkedIn. Generation and load times for a 100m x 100m area were always several seconds or more, depending on the complexity of the process. Generation and loading of a very detailed and complex 100m x 100m area can easily take a couple of minutes of execution time, and it strains RAM usage, even though I have a quite fast processor, 32 gigabytes of RAM, and fast SSD storage.

 

Implications

The game cannot know exactly which game elements it will need, so it must generate as much as possible. Visually, the absolute minimum is to generate all the elements that the player can see, and everything that is connected to those elements in some way. But as environment data, we usually must generate and load much more, due to complex data structures and their hierarchies and dependencies on other structures.

Since the map segment procedural generation execution times are so long, it is obvious that the procedural generation of the environment cannot be done in real time, at the moment it is needed; it must be done as controlled, slow background processes. The game system tries to prepare nearby environments (neighbouring map segments) in the background, so that they are ready and loaded if the player happens to go there. We must generate more elements than will be needed, thus wasting CPU and memory resources on data objects which ultimately are not used at all and will be deleted. All of that affects garbage collection processes, too.
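A rough sketch of that background preparation (the data structures and the prefetch radius are my own assumptions): whenever the player crosses into a new segment, its not-yet-generated neighbours are queued for background generation, and some of them will inevitably be generated in vain.

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <unordered_set>

struct SegmentKey {
    int64_t x, y;
    bool operator==(const SegmentKey& o) const { return x == o.x && y == o.y; }
};
struct SegmentKeyHash {
    size_t operator()(const SegmentKey& k) const {
        return std::hash<int64_t>()(k.x) * 31 ^ std::hash<int64_t>()(k.y);
    }
};

struct SegmentStreamer {
    std::unordered_set<SegmentKey, SegmentKeyHash> loaded;   // generated and in memory
    std::unordered_set<SegmentKey, SegmentKeyHash> pending;  // queued for background generation
    std::deque<SegmentKey> generationQueue;

    // Called whenever the player enters a new segment: queue all neighbours
    // within 'radius' that are not yet loaded or pending. Segments the player
    // never reaches are the 15-30 % of wasted work mentioned below.
    void onPlayerSegmentChanged(SegmentKey center, int radius = 1)
    {
        for (int64_t dy = -radius; dy <= radius; ++dy)
            for (int64_t dx = -radius; dx <= radius; ++dx) {
                SegmentKey k{center.x + dx, center.y + dy};
                if (!loaded.count(k) && !pending.count(k)) {
                    pending.insert(k);
                    generationQueue.push_back(k);
                }
            }
    }
};
```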

In this kind of dynamic, gradual content generation system, we can estimate that at least 15-30 percent of generated and loaded objects are never used, but are quite soon unloaded from memory because the player moved in another direction. A vast amount of RAM, cache, and CPU execution time is wasted, which could otherwise be used to increase the performance and detail level of the game world, thus making the game better. Since the resources in a PC are limited, all that wasted memory and execution time is taken away from something else.

This affects game play too, of course, making fast movement from one location to another impossible, because the game system cannot generate the necessary environment map segments fast enough. This would make fast land vehicles impractical, because their use could affect performance. Flying would be even more problematic, because it increases the visual range to its maximum. The only way to (possibly) overcome these problems is to have a “low detail” version of each map segment’s data or its generation process (much like LODs and similar systems), where the algorithm generates only those elements which are necessary for the overall visual look of the area and generates more details only when necessary. This is quite like the layered generation I described above. However, it makes all the data structures and algorithms very complex, which increases the probability of data errors and inconsistencies.

Let’s then consider how much data storage space would be needed. We have four types of data to consider – the initial seed and parameter data, manually designed data, generated game element and map data, and (possibly) some cache data if we don’t store all generated data. As an example we use Earth, whose area is 510 000 000 km². We must divide that area into map segments. My database tests above showed a data density of 0.02 kB/segment – 0.25 kB/segment. The following table shows the number of map segments and how much disk space we need with different map segment sizes, for either simple or detailed parameter sets:

 

Segment size        N          Simple param. set    Detailed param. set
100m x 100m         5.1e10     1 TB                 13 TB
1km x 1km           5.1e8      10.2 GB              130 GB
10km x 10km         5.1e6      102 MB               1.3 GB
100km x 100km       5.1e4      1 MB                 13 MB
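The figures in the table can be reproduced with a few lines of arithmetic; a small sketch:

```cpp
#include <cstdio>

int main()
{
    const double planetAreaKm2 = 510'000'000.0;        // Earth's surface area
    const double kbSimple = 0.02, kbDetailed = 0.25;   // measured kB per record

    const double segmentSizesKm[] = {0.1, 1.0, 10.0, 100.0};
    for (double s : segmentSizesKm) {
        const double n = planetAreaKm2 / (s * s);       // number of segments
        std::printf("%6.1f km segments: N = %.1e, simple = %.1f GB, detailed = %.1f GB\n",
                    s, n, n * kbSimple / 1e6, n * kbDetailed / 1e6);
    }
    return 0;
}
```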

 

Those figures represent the amount of seed data for just one planet, but our goal was to create a procedural universe with thousands of planets. That would mean a thousand times more initial seed data. Even at rough scales the total amount of seed data will be large. If there are super-Earths among those planets, the data grows further still, since surface area scales with the square of the planet’s radius. One option is to use smaller planets, but that is not realistic either, considering our current theories of planetary evolution. Mathematics and physics set the rules.

Layered procedural generation is the only possibility to keep the initial data relatively small or manageable. If each planet is mapped at a 100km x 100km scale, and during play those segments are further divided into subsegments, layer by layer, down to a 100m x 100m scale, then the generated map and game element data may remain at a reasonable size.

However, this is not a good choice considering gameplay and the game experience. Since the initial map segments are so large, we do not have very much control over the smaller, detailed segments. This can lead to inconsistencies, or even illogical details, in those smaller segments. Also, access to saved game data and content will get slower as its size grows. This is a common situation in any database, and it stems from algorithmic complexity.

Even if we use layered generation, it is obvious that fast travel would be hard to manage, for the reasons explained above. Flying over the terrain needs special solutions, probably some pre-generated low-detail data, which will increase the size of the initial data.

 

Summary

As we can see, there are dozens of factors that affect the procedural generation of a game universe with thousands of planets. Data storage space and execution time are the most crucial factors limiting this kind of task. There are many trade-offs that must be considered. Certain options make the initial data manageable but increase execution time and the use of memory and cache.

If we keep the requirement that the game has to be an offline game, then I see the only option as raising the minimum requirements, so that the game would be playable only on a high-end PC. Even then it needs serious optimization, and it is doubtful that it would work. It is even more doubtful whether it would be worth it.

The realistic option is to use an online data repository for the game universe data and constantly load only those parts which are necessary for the current game situation. This leaves a lot of PC resources for more critical use. However, it places hard requirements on network speed and reliability.

One thing I am certain of – I do not recommend this kind of game development project as a first learning project.
