
I've been trying to connect to a large log file: roughly 1.5 billion rows for the 1-year period requested. Four other large tables are joined to it to supply specific details (for example, User Name based on UserID).

I'm failing to produce results: if you've had a similar experience your input is appreciated!

 

Criteria:

  1. One full year of data is needed (roughly 1.5 billion rows), though only a fraction will be shown at any given time.
  2. Because this is a log file, users want to see up-to-the-moment changes (or close to it).

 

The file size means I have excluded any thought of a live connection, despite (2).

Furthermore, generating the extract in full is prohibitive: although generating 1-5 weeks of data is fairly quick, anything beyond that takes excessively long, and recreating the full extract would take more than the 2 hours our server allows.

 

We do not yet have the Data Management package installed, so we cannot schedule Prep flows.

 

My thought is to move to an Incremental Refresh and allow users to refresh the data source manually: my knowledge of incremental refreshes is very limited, however, and my controlled tests aren't yielding positive results.

 

Any thoughts on this admittedly poorly expressed conundrum? Should I say, "Listen: you might think you need 1.5 billion rows, but Tableau isn't the right platform for pulling that data"?

 

Tables are currently joined through a custom SQL query, but I've been able to link them through visual connections. I heard that Tableau is more efficient working through default visual connections than custom SQL: any truth to this?

Michael Hesser (Tableau Forum Ambassador)

If this response has answered your question, kindly click "Best Answer"

5 replies
  1. Feb 13, 2025, 21:50

    @Michael Hesser​ 

    Hi, My thoughts:

    a. I think the right approach is to perform the joins in a physical table rather than using logical tables (which perform the joins at execution time). As Deepak says, it is highly advisable to aggregate the data if possible.
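One possible reading of (a) is to materialize the joined, aggregated result once in the source database, so that the extract reads a single, smaller table. The sketch below illustrates that idea with Python's built-in `sqlite3` module. The table and column names (`log_events`, `users`, `daily_user_activity`) are hypothetical, and the right aggregation grain depends on what the dashboards actually need.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE log_events (
        event_id   INTEGER PRIMARY KEY,
        user_id    INTEGER,
        event_date TEXT
    );
    CREATE TABLE users (
        user_id   INTEGER PRIMARY KEY,
        user_name TEXT
    );
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany(
    "INSERT INTO log_events (user_id, event_date) VALUES (?, ?)",
    [(1, "2025-02-12"), (1, "2025-02-12"), (2, "2025-02-12"), (1, "2025-02-13")],
)

# Join and aggregate once, in the database, into a physical table.
# The reporting tool then connects to this table instead of the raw log.
conn.execute("""
    CREATE TABLE daily_user_activity AS
    SELECT e.event_date, u.user_name, COUNT(*) AS event_count
    FROM log_events AS e
    JOIN users AS u ON u.user_id = e.user_id
    GROUP BY e.event_date, u.user_name
""")

rows = conn.execute(
    "SELECT event_date, user_name, event_count "
    "FROM daily_user_activity ORDER BY event_date, user_name"
).fetchall()
```

Here four raw log rows collapse to three summary rows. On a real log the reduction depends on how coarse a grain the reports can tolerate.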

     

    b. I recommend trying the subrange extract refresh. This type of refresh is a delete-and-insert. If you select 35 days, it deletes the most recent 35 days of data, counting back from the current time (TNOW) on the date column you select, and then inserts the records from TNOW - 35 days onward.

    https://help.tableau.com/current/pro/desktop/en-us/extracting_refresh.htm#subrange-refresh-for-incremental-extracts

     

    There is an open issue that occurs when the extract contains future dates (which I don't think is your case): https://issues.salesforce.com/issue/a028c00000zgvLqAAI/undefined. It seems to be fixed in recent releases (2024.3.3).
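The delete-and-insert behavior described in (b) can be sketched in plain Python. This only simulates the logic as described in this reply and is not Tableau's implementation. The function name `subrange_refresh`, the row layout, and the inclusive window boundary are all assumptions made for illustration.

```python
from datetime import date, timedelta


def subrange_refresh(extract_rows, source_rows, today, days):
    """Simulate a delete-and-insert refresh over the last `days` days.

    Hypothetical helper: rows are dicts with an `event_date` key, and the
    window boundary is treated as inclusive (an assumption).
    """
    cutoff = today - timedelta(days=days)
    # Step 1 (delete): drop extract rows that fall inside the refresh window.
    kept = [r for r in extract_rows if r["event_date"] < cutoff]
    # Step 2 (insert): reload every source row inside the window.
    reloaded = [r for r in source_rows if r["event_date"] >= cutoff]
    return kept + reloaded


today = date(2025, 2, 13)

# Extract as of the last refresh: row 2 still carries a stale status.
extract = [
    {"id": 1, "event_date": date(2024, 12, 1), "status": "done"},
    {"id": 2, "event_date": date(2025, 2, 1), "status": "pending"},
]

# Source table now: row 2 was updated and row 3 is new.
source = [
    {"id": 1, "event_date": date(2024, 12, 1), "status": "done"},
    {"id": 2, "event_date": date(2025, 2, 1), "status": "done"},
    {"id": 3, "event_date": date(2025, 2, 13), "status": "new"},
]

refreshed = subrange_refresh(extract, source, today, days=35)
```

Row 1 lies outside the 35-day window and is left alone. Row 2 is replaced with its updated version, and row 3 is picked up as new, so only the window is re-read from the source rather than the full year.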

     

    If this post resolves the question, would you be so kind as to "Select as Best"? This will help other users find the same answer/resolution and help the community keep track of answered questions. Thank you.

     

    Regards,

     

    Diego Martinez

    Tableau Visionary and Forums Ambassador
