Tools SQL Server on Linux Lead image: Lead Image © Fet, Fotolia.com
Lead Image © Fet, Fotolia.com
 

What's new in SQL Server 2017

Ready for the Future

Microsoft SQL Server 2017 introduces SQL Server on Linux, along with machine learning services, support for graph data, and on-premises Power BI. By Evgenij Smirnov

Microsoft SQL Server 2017 has been available since October. The new version opens a new chapter in the history of the database server in several respects. Microsoft is looking to make its relational database fit for the Internet of Things (IoT), big data, and machine learning, and when it comes to performance, SQL Server now acts much more intelligently.

Much like other Microsoft server products, SQL Server development is strongly driven by the Azure Cloud, big data, and IoT. The SQL Server landscape is now identical to version 2016: Enterprise, Standard, and Express. Hardware limitations and supported features, as they existed in 2016, remain the same.

SQL Server on Linux

SQL Server on Linux has become a reality with SQL Server 2017. It is currently supported on Red Hat Enterprise Linux (RHEL) 7.3 and 7.4 (see also the "Migration Discount" box), SUSE Linux Enterprise Server v12 SP2, and Ubuntu 16.04. As expected, the RPM package for RHEL can also be installed and executed on CentOS 7 without problems (Figure 1). You will not find the installation packages for Linux on the SQL Server image; you must download them. The easiest way to do this is to include the corresponding Microsoft repositories, as described online (Figure 2) [2]. The size of the download is 150 to 200MB, and after calling a configuration script, your SQL instance is ready for use.

The RHEL version of SQL Server 2017 runs on CentOS.
Figure 1: The RHEL version of SQL Server 2017 runs on CentOS.
Additional downloads streamline deployment of SQL Server 2017.
Figure 2: Additional downloads streamline deployment of SQL Server 2017.

Remember also to install the SQL Server Agent service (separate mssql-server-agent Linux package) and full-text search (mssql-server-fts) before commissioning the new SQL instance, because the mssql-server service requires a restart. The sqlcmd command-line query for Linux should reside on a database server, as well, in case network connectivity is interrupted. This utility is also in a separate package (mssql-tools). Overall, deploying SQL Server 2017 on Linux takes significantly less time than on Windows. However, the range of functions are significantly different between the two versions.

Courage to Fill the Gap

The database engine is currently the only SQL Server component available on Linux. Reporting services are still missing, as are SQL Server Analysis Services (SSAS), Data Quality Services, and Master Data Services. The database engine on Linux is not multi-instance capable – a feature that is controversial even under Windows – and currently lacks the following important features:

SQL Server Management Studio (SSMS) [3] is based on Visual Studio 2017 and is therefore not available for Linux anymore than SQL Server Data Tools. Therefore, Microsoft currently does not have a dedicated graphical management tool for Linux. With SSMS for Windows, you can connect to a Linux-based SQL instance as usual. If you also want to change the graphical SQL administration to Linux, you can use Visual Studio Code (VS Code), for example, with the mssql extension (Figure 3). Both are available for Windows, Mac OS, and supported Linux distributions [4].

The VS Code mssql extension provides graphical SQL administration under Linux.
Figure 3: The VS Code mssql extension provides graphical SQL administration under Linux.

Of course, you can use all GUI tools that exist on Linux platforms and support SQL Server to manage your Linux-based SQL instances. The SQL Server Agent does not fully correspond to the Windows variant in its scope of services. The intelligence of the maintenance plans is currently not implemented, and neither are the cmd.exe and PowerShell subsystems. Of course, the agent functionality for services that are not implemented on Linux, such as Analysis Services, is also missing.

Licensing

In addition to the installable Linux packages, SQL Server 2017 also offers a Docker image (Docker 1.8 and higher), which means your database instances can now benefit from the flexibility of a container infrastructure, provided they have the appropriate licensing. The Docker image has the same performance as the Linux version.

To take full advantage of containers, you will usually need to license a high-density environment, which requires an SQL Server Enterprise license for all CPU cores in all affected container hosts plus Software Assurance (SA) [5]. You can use this licensing variant to increase the number of SQL instances provided as required. This type of licensing is also ideal for classic server virtualization with a large number of SQL instances; SA gives you unlimited mobility of your SQL licenses within the licensed virtualization farm.

Local Data Visualization

In addition to SQL Server on Linux, Microsoft is finally implementing another feature in SQL Server 2017 that users have been waiting for since the release of SQL Server 2016: the ability to provide Power BI dashboards on premises. This functionality is not freely available for every SQL Server installation, however. You have two ways to obtain a license for the separately installed Report Server for Power BI:

A free trial version of the Report Server is available on the Power BI website [6]. By the way, your data does not have to be in an SQL Server 2017 database instance for you to benefit from Power BI. Everything starting with SQL Server 2008 is supported. Power BI Desktop remains free of charge. You can use it to develop your dashboards and publish them in the Power BI cloud or on a local report server.

Intelligent Queries

The best features are of no use if a database server cannot perform its main task efficiently: querying data from the database and making client applications or other SQL Server services available. In earlier versions, the database engine followed a rigid procedure:

1. The request was submitted.

2. The Query Optimizer made assumptions about how many rows were to be processed in each table, view, and sub-query along with their memory requirements. It drew up several implementation plans, which were given a "cost factor" on the basis of these assumptions.

3. The plan with the lowest cost factor was submitted for execution and processed rigidly.

If the assumptions of the Query Optimizer deviated greatly from the actual values, execution became inefficient. If the assumption was fewer lines or a lower volume of data sets, the server had to transfer data from RAM to the hard drive because the RAM requirement could not be satisfied. If the estimate was significantly greater than actual demand, resources were not available to other processes, whose performance might have suffered as a result.

With SQL Server 2017, the Query Optimizer becomes smarter and employs three new query optimization techniques that adjust execution plans based on the values achieved, some of which are determined while the query is still being processed. The following functions are used:

1. Memory Grant Feedback measures the amount of memory actually needed to execute the query and corrects the cached plan accordingly. This technique realizes its potential if the same query is repeated frequently. If the query is parameterized, the memory requirements can vary extremely from version to version. In this case, an extended event (XE) is provided, which switches off the function for the affected query.

2. Batch Mode Adaptive Joins decide on the join technique to be used (nested loop versus hash) only after the first results have been retrieved from the database and the resulting number of rows is not merely estimated but is precisely known. Adaptive joins are the perfect choice if the same query produces a strongly fluctuating number of rows.

3. Interleaved execution for multi-statement table valued functions (MSTVFs) changes the procedure outlined above in that the execution of special complex queries does not follow optimization rigidly, but begins as part of the optimization. MSTVFs are executed right at the beginning, so precise data regarding number and size are available for the results. Only then is the optimized execution plan for the rest of the query calculated and submitted for execution.

All execution plans prepared by adaptive optimization apply the same storage logic as the conventional plans. They are calculated the first time the query is executed and remain in the cache until they become obsolete or until the cache is cleared by restarting the instance. Therefore, it is usually not a good idea to start SQL instances on a speculative basis. Failover of the database within a cluster also erases the cache content. To use the new query optimization technique, your database must be at compatibility level 140 (SQL Server 2017). The functionality of Adaptive Query Processing is described in detail in the online SQL docs [7].

Fit for Big Data and IoT

Since its introduction in 1989, Microsoft SQL Server has been a relational database (RDB). However, the world of big data and IoT no longer works relationally: Information, and thus also data, is created simultaneously in a multitude of networked systems and often includes traces of these network topologies and hierarchies. The attempt to map such multi-networked data relationally always results in a major processing overhead and limited functionality. Therefore, classic RDBs, such as SQL Server, have lost their supremacy in the age of big data and have had to give way to special databases such as Hadoop-based TitanDB.

Microsoft's SQL product team is now striking back and has given the database engine the ability to process graph data directly. Two new table types have been introduced for this purpose: node and edge. Edges can have properties (metadata) or simply establish a directional connection between two nodes. The newly introduced MATCH clause in Transact-SQL (T-SQL) is used to search graphs. It contains conditions in the form of

 

node1-(edge)->node2

(all nodes with direct connection to node1) or more complex queries such as

 

node1-(edge1)->nodeX<-(edge2)-node2

(pairs of nodes, each connected to the same node). The names used here are aliases for the same node or edge table, so the first query could look like:

SELECT node2.first name FROM node node1, connection, node2 WHERE MATCH(node1-(connection)->node2) AND node1.place of residence='Berlin'

(Get all first names of nodes that have a connection to a node from Berlin.) The MATCH syntax is documented in the SQL docs [8]. How the performance of the SQL graph tables behaves compared with the original graph databases remains to be seen.

Machine Learning with Python

Support for the widely used R statistics language was introduced in SQL Server 2016. Using the predefined stored sp_execute_external_script procedure, R functions can be encapsulated in their own procedures, provided with the desired arguments and result transformations, and then used in SQL databases.

SQL Server 2017 natively supports another scripting language that is widely used, especially in the areas of IoT and machine learning: Python. This capability allows SQL projects to bring together the best of both worlds: robust and fast relational data management with SQL and an almost infinite stock of methods and libraries that have already been developed in Python for machine learning. Complemented by the precise, high-performance statistical functions of R, it provides an ideal platform for big data, IoT, and machine learning applications. According to a 2017 software poll [9], SQL Server 2017 covers the three most widely used languages for these applications.

R Services for SQL has therefore also become machine learning Services. However, you do not necessarily have to install support for both R and Python – you can still install only R if your projects do not use Python.

Conclusions

SQL Server 2017 is not just a great technological milestone; it also further cements the direction of development taken in the previous version. Even if your organization's product life cycles do not require the next version of the database platform for the time being, you might want to start working with SQL Server 2017 now. Linux and container support have the potential to enable a change of strategy on the infrastructure side. The new machine learning services, support for graph data, and the ability to operate Power BI locally can give a new impetus to your internal data processing.