Making a database decision used to be easy when I started my career as a database administrator - which version of IMS were you going to install? Databases evolved from the need to better manage 'structured' data (which at the time was pretty much just financial numbers and their descriptions) and to have many users simultaneously update that data.
IMS was launched in 1968, based on the DL/I database developed for the Apollo space programme. At that time databases were developed to handle transaction processing and had to be navigated in a rather fiddly way.
The next major step forward was Ted Codd's papers proposing a 'relational' approach, in which data would be stored in tables as rows and columns rather than the pointer-linked records used previously. The first commercial database to use this idea was Ingres, launched in 1979, while IBM's own System R research evolved into its DB2 database in 1983. The key advantage of the relational approach was that it hid the physical structure of the database behind a logical set of tables, so programmers no longer had to navigate up and down hierarchies by hand. Instead a program called an optimiser worked out the most efficient physical path to satisfy a database request, which was written in a language called SQL.
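To make the contrast concrete, here is a minimal sketch of the relational idea using Python's built-in sqlite3 module; the table and column names are invented for illustration. The query states only *what* result is wanted, and the engine's optimiser decides *how* to fetch it.

```python
import sqlite3

# Data lives in a table of rows and columns; no hierarchy to navigate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO accounts (owner, balance) VALUES (?, ?)",
    [("alice", 120.0), ("bob", 45.5), ("carol", 300.0)],
)

# A declarative request: the optimiser chooses the physical access path.
rows = conn.execute(
    "SELECT owner, balance FROM accounts WHERE balance > ? ORDER BY balance DESC",
    (100.0,),
).fetchall()
print(rows)  # [('carol', 300.0), ('alice', 120.0)]
```

Compare this with a hierarchical system, where the program itself would have had to walk parent and child records to assemble the same answer.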
At this time almost all database usage was transactional, and databases competed on how fast they could process transactions at high degrees of concurrency. Other products appeared, such as Oracle and Informix, but apart from a few experimental databases the core relational approach went unchallenged, and by the end of the millennium the relational model was riding high, with Oracle, DB2 and SQL Server the dominant platforms.
However the Achilles' heel of these databases was that they were basically tuned for transactions and could be very slow when executing complex queries. Teradata carved out a billion-dollar niche by applying massively parallel processing (MPP) to this problem, which in turn spawned other approaches such as the hardware accelerator used in Netezza.
As warehouses grew and grew it became clear that columnar databases - where the values of each column are stored consecutively on disk - were better suited to analytic queries, since they vastly reduce the I/O needed to satisfy most queries (at the price of slower loading and updating). Although the idea had been tried as early as the late 1960s, only Sybase had really succeeded with it as a commercial database until just a few years ago.
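A toy sketch (not a real storage engine) shows why the columnar layout cuts I/O for analytic queries: totalling one column touches only that column's values, whereas a row store must read every field of every record.

```python
# Row-oriented layout: whole records stored together.
rows = [
    {"id": 1, "region": "EU", "sales": 100.0},
    {"id": 2, "region": "US", "sales": 250.0},
    {"id": 3, "region": "EU", "sales": 75.0},
]

# Column-oriented layout: each column's values stored consecutively.
columns = {
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "sales": [100.0, 250.0, 75.0],
}

# The row store drags every field past the CPU to total sales...
total_row = sum(r["sales"] for r in rows)
# ...while the column store reads just one contiguous run of values.
total_col = sum(columns["sales"])
assert total_row == total_col == 425.0
```

The same layout also explains the downside noted above: inserting one new record means appending to every column's run, which is why loading and updating are slower.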
However as data volumes inexorably grew, columnar was married to MPP, and a new clutch of databases became popular: Vertica, ParAccel and Infobright being good examples. The most heavily marketed is HANA from SAP, which is touted as an 'in memory database' but is essentially a columnar product. Hardware continues to be thrown at the problem, with more use of memory and solid state disks to improve access times, but these are still costlier than disk storage, so for very large databases a mix is needed.
The latest databases have developed in response to the need to tackle high volumes of data that go beyond traditional structured data. Web logs and machine sensor data have their own structure, and traditional databases find these challenging. NoSQL databases are enjoying an upsurge in popularity to handle such data. 'Key value' databases (such as Redis) keep frequently accessed data in memory under simple keys, while document store databases (such as MongoDB) are aimed specifically at web documents and spurn traditional schemas.
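The key-value pattern is simple enough to sketch in a few lines. This is a hypothetical in-memory cache with optional expiry, written to illustrate the idea only; the class and method names are invented and are not Redis's actual API.

```python
import time

class KeyValueCache:
    """Illustrative in-memory key-value store with optional per-key TTL."""

    def __init__(self):
        self._store = {}  # key -> (value, expiry timestamp or None)

    def set(self, key, value, ttl=None):
        # ttl is seconds until the entry expires; None means keep forever.
        expiry = time.monotonic() + ttl if ttl is not None else None
        self._store[key] = (value, expiry)

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        value, expiry = item
        if expiry is not None and time.monotonic() > expiry:
            del self._store[key]  # lazily evict expired entries on read
            return None
        return value

cache = KeyValueCache()
cache.set("session:42", {"user": "alice"}, ttl=60)
print(cache.get("session:42"))  # {'user': 'alice'}
print(cache.get("missing"))     # None
```

Real key-value stores add persistence, replication and richer data types, but the core contract - opaque values looked up by key, with no schema and no joins - is exactly this.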
There are even NoSQL columnar variants, such as Cassandra and HBase. These databases, along with Hadoop, are firmly aimed at Big Data. Even object-style databases are making a comeback in the form of 'graph databases' such as Horton and Neo4j.
Many of the vendors of these newer databases are tiny, and there will doubtless be consolidation as customers trade off performance against reliability and cost, and traditional vendors respond. However the sheer scale of data growth means that companies are becoming desperate to deal with their databases, which are spiralling in size, complexity and licence costs. The largest database in the world in 2003 was 30TB, yet Teradata now has 25 customers with petabyte databases. As a result customers are more willing to try innovative approaches, and a database market that looked stagnant at the turn of the millennium has never seemed so varied or exciting.