Skip to main content

Apologies if this already answered elsewhere. I looked but couldn’t find advice on what I suspect is a common scenario.

 

We have a SQL Server based data warehouse located in London, and Tableau Server on an AWS cluster in the US near corporate HQ (5 servers, 32 core). The problem we have is that overnight refreshes seem to be taking a disproportionate amount of time, and sometimes time out. To describe the process and what we’ve tested so far:

  1. The source in SQL Server is a static table (no joins, filters etc). It contains about 11 million rows and 33 columns, taking about 4GB of space in SQL Server.
  2. Creating an extract through Tableau Desktop on our London network takes about 4 minutes. Publishing this extract to Tableau Server takes about 8 minutes. Refreshing this extract from Tableau Server takes over _60_ minutes
  3. Reading the data directly from the database and writing the results directly to Tableau Server through Alteryx takes about 12 minutes
  4. Extracting the data through a SQL client takes about 4 minutes
  5. Creating an extract from Tableau Desktop in the US takes over _60_ minutes
  6. Exporting the data using BCP to a flat file on a London server takes 1.5 minutes, compressing and then transmitting this compressed file from UK to US takes 2 minutes
  7. If we refresh the Tableau Server extract from an AWS SQL Server to Tableau Server, it takes only 3 minutes

 

Based on the above, it doesn’t seem that the raw query execution on SQL Server (can occur in 4 minutes) network bandwidth (transmission can occur in <10 minutes), or creation of Tableau extract (can occur in 3-4 minutes on desktop or server) are the bottleneck. The best guess we have is that this is related to the verbosity/’chattiness’ of the SQL transmission protocol (TDS?) over TCP.

 

Can anyone confirm the theory about protocol as a bottleneck, or advise on further tests that would help narrow this down? If this is a known limitation, is there a best practice for optimising replication of data from remote SQL sources?  It might also be equally informative if anyone can confirm that they've got a similar set up that doesn't have any issues, since this may point to an issue with our server or network configuration.

 

Thanks in advance!

10 answers
  1. Nov 29, 2017, 4:41 PM

    Here's what I think you need to do:

    1. Create your connection in Tableau Desktop in London, add in your calculations, rename fields etc. Create an extract on the source. 

      Here's what I think you need to do:Create your connection in Tableau Desktop in London, add in your calculations, rename fields etc. Create an extract on the source.
    2. Publish this source to the server (right click, publish to server).
    3. Create reports against the source published to the server; The connection will now look like this: 

      pastedImage_1.png

    To refresh it ongoing:

    • First, open your Tableau server and download a copy of the source. It will save like TestSource.tdsx. Keep this file in London, it's now your 'master' connection file, you only need to do this once.
    • If you open this tdsx file using a ZIP program you'll note you have a TDS file (where your schema and calcs are defined) and a tde file contained within the data\extracts subfolder.

     

    This is where you setup your automatic process; it should replace the tde in the zip file and rezip it (making sure to keep the extension tdsx) then use tabcmd to publish the connection back to the server and overwrite the existing one.

    In theory you could have a folder on your network share or something in London called tdsxsourcefiles; you could unzip the TDSX into this folder and setup your process to replace the TDE in the subdir, zip the folder,rename it correctly, push it up using tabcmd. Any time you want to make changes to the Schema / Calcs etc, you just make the changes in Tableau desktop by opening the TDS file.

0/9000