Skip to main content

Hello Everyone,

Our dashboard functions as a search tool that allows our users to investigate on a transaction level granularity. When the dashboard is opened, it keeps on buffering and never loads the data. I was hoping that the tableau community would be able to help us.

Our current set up: Workbook has a live connection with a tdsx file published to the tableau server. The tdsx file is saved as an extract in the server.

We have tested it and figured out that the filters may be the reason. We have two data source filters and around 25 filters that are a mix of measures, dimension and calculated fields. But removing these filters is not an option as our stakeholders rely on these filters. 

We are looking for a way to filter the Top N of a lowest level of detail so that the data loads. "Top N" is a parameter that takes an integer as an input. For instance, Top 30 of a lowest level of detail.

Would appreciate if anyone could help us figure out how to do it or suggest any other alternative solution to help with the loading issue.

 

I have attached a sample data as well as a mock workbook

 

Best Regards,

Karma

6 answers
  1. Apr 18, 10:42 AM

    This post is near 5 years old @Katie M, although did you test my suggestions (I know you are not the OP btw). 

    You've mentioned some good points here Katie although it is important to define the context of those as while some are general best practice such as removing redundant calculated fields, those for instance will only affect an extract or pre-materialised calculations during materialisation. But those shall never affect a live connection as they won't be created as part of the outgoing SQL. 

     

    And so, it is important to understand how your data is going to be used before jumping onto the extract vs live argument. Many live environments with frequent data refreshes aren't fully compatible with extracts, that is unless you plan on building a HA-environment, where you define a single node for all backgrounder operations, and you would typically set at around 6 backgrounder agents, as this will allow you to define frequent refreshes without harming other server activity and general workbook usage. 

     

    30M rows is still very small by comparison, and may not need too much in the way of front-end performance tuning, especially as under the hood, at the data-fetch end, Tableau is nothing special insofar as how it achieves its outputs, really this becomes a matter of some very simplistic maths. 

     

    Tableau was always designed to bring SQL-controlled analytics to the masses so they could build SQL-powered reports without ever needing to write a single word of SQL. So an understanding SQL is not required, but as reports become larger and more complex, then an intricate knowledge of SQL and SQL performance is a definite must. 

    And so performance tuning Tableau is less about how to use Tableau and its built-in functions to enhance the performance, and more about tuning Tableau and the underlying model and data-sources for high-speed querying, which sadly Tableau is not so capable of, as while the data source shall re-write your query before execution in order to maximise performance.  

    There are other actions you can take that can hugely improve you performance as well such as converting your final model over to Custom SQL instead of relying on the model, and modelling your data in the database. 

     

    You are right about the row-level non-additive calculations, and that these, and semi-additive expressions should always live on the data-layer or semantic-layer, as these are enrichments, and typically benefit from becoming part-and-parcel of the model. But unless the user has specific DDL permissions, where they can change the model, the etl, or the view(s), then the closest they may get to this is the semantic-layer, which is the perfect place for cSQL routines. 

     

    I will add that care needs to be taken with Context Filters. They were once hugely popular, but began to fall out of favour ~2014, although they have begun to make a real comeback in the last 6 years, but just like may things Tableau, the developer needs to understand the purpose of context filters, and that they're not the Get-Outta-Jail-Free performance option that they're tending to be used for, because  context filters create a subset of the data that you are looking at, and this means two things: 

    Firstly, that the data server must perform more work in creating the context filter 

    Secondly, data server memory and space considerations need to be understood and evaluated before context filters are deployed. 

     

    This is because a context filter creates the subset in either temp storage like temp db, or where temporary tables do not form part of the DML, the temp table is held in memory, which can greatly affect data server performance for other users and other operations. 

    Take my earlier example, and say your context filter reduces the 30M row data size to just 800,000 rows, and lets say you're looking at Superstore, the filter is on product_category = 'Technology'. Looking at sub_category, with a 10 character length, 'Appliances

    ' is the longest word in the field for the 'Technology'. 

    Remembering that Tableau only uses unicode data-types, this field shall be defined as a unicode varchar - NVarchar, which adds 2-bytes per character, and 2-bytes total for the field; and so the word "Applicances" weighs in at 18 bytes. And then, multiplying by 800,000 rows = 14,400,000 or 14.4MB. This would mean that for this one word repeated 800k times, this one column alone is occupying a huge amount of space; and then adding this to the rest of the data in that context subset, then this would be many times larger. 

    But what makes this especially worse is that the temporary subset has no performance tuning.  

    It is a heap, has no primary key, no indexing, no partitioning, and no search statistics, which means that further filtering inside this subset will be slow, as the data server must touch every record to build the output. 

    And, context filters behave exactly the same on Tableau Server, although they do benefit for an improved performance relief as Tableau Server deploys to a column-store NOSQL in memory db, so while it lacks indexes, column-store is considerably faster than traditional paged data. 

     

    But, context filters don't just have a data performance drawback. They are limited by virtue of the data that they contain. 

    And so when a user changes the context of the subset, the whole subset is dropped and recreated, so they begin to occupy more space, with more table reads as a user plays with the data. 

     

    This is why, there is much more to performance tuning than hoping to rely solely on the built-in functions. Sure, your model might be very complex, with many tables that require them to be joined, where they cannot be culled - even if your model is a view, in which case, this is a perfect use case for extracts; but where you have the ability to model your data, and load to a table, then it is far better to ensure your data is properly tuned and indexed for the purpose, rather than hoping Tableau shall get you out of it. 

     

    It is far better to ensure the 

    model is performant, reducing text to id's that you can later alias in Tableau would add a massive performance uplift, as even SmallInt is sized to 2-bytes yet gives you a range of -32,768 to 32,767 - this is a LOT of products with a tiny data-footprint, and an even smaller effect on data pipelines. Add to this the front-end aliasing which takes place inside

     Tableau so no queries go anywhere, and even your context filters get a massive uplift as well, as even when scanning a heap, numerics are easier to read than text or full-text scanning.   

     

    Of course, there are changes you can make in the front-end, both of us have explained this, but rather than guessing as to what could be causing the problems, the best action is to evaluate, by extracting the queries that Tableau is sending live to your data server, and then using those to identify the bottlenecks. 

    You can use the performance analysing tools, either the workbook analyser, or the server analyser, but I still prefer to use the logs to capture this. I've discussed this in further detail, and have a video-walkthrough over at my site here: 

    https://datawonders.atlassian.net/wiki/spaces/TABLEAU/pages/1290600449/Discussion+Series+Errors+Tuning

0/9000