
Mitigate Failing Data Loads

Learning Objectives

After completing this unit, you’ll be able to:

  • Use a performance test strategy to maximize enterprise scale.
  • Avoid record locking errors.

Scenario 1: Conquering Large Batch Sizes

A large retailer has been in business for 100 years and has a loyal clientele. The retailer's daily sync process upserts close to 12 million records into Salesforce. The process syncs changes in their Accounts (1–4 million) and Financial Accounts (6–10 million) data along with a few other objects. The retailer leverages Bulk API for their customer financial accounts load and SOAP API for their accounts load. Using SOAP API supports the business's smaller loads by splitting the data into multiple files. Each load is around 1 million records from different systems for their accounts.

The retailer also has a few batch Apex jobs that are kicked off around the same time to do post-processing work on Accounts. The customer requires all of their jobs to be completed within a 3-hour (5–8 AM EST) window. This time is crucial because the updates need to be made before the first US East Coast business user logs in. 

A month ago, the retailer did a code deploy in which they added two new Lightning components to their home screen. They also added some additional triggers to cover brand-new functionalities on Accounts. Since this was a small update, no major performance testing was done prior to deployment. Over time, they began to notice that occasionally the daily jobs took a long time (5–8 hours) to complete. The delay threw their 3-hour SLA off and the business users weren’t happy with the result. A few users also encountered locking errors when they tried to edit data. Some users started complaining about slowness when accessing their home screen.

Let’s digest the details of this scenario to pinpoint the root cause of these issues.

Important Scenario Details

  • Daily sync process upserts close to 12 million records into Salesforce. The process syncs changes in their Accounts (1–4 million) and Financial Accounts (6–10 million) data…
  • The retailer leverages Bulk API for their financial accounts load and SOAP API for their accounts load.
  • The customer requires all of their jobs to be completed within a 3-hour (5–8 AM EST) window.
  • Daily jobs took a long time (5–8 hours).

Things to Watch Out For

Within this scenario there are two categories to watch out for. How data is loaded and how Lightning components are configured have a large impact not only on the delays that are impacting this enterprise but also on how it scales. Now that you’ve read through this scenario, think about these questions and how they should be addressed to increase performance.

Data Loads

  1. What are the batch run times? Are they trending toward or hitting the 10-minute mark, resulting in retries?
  2. Are there any errors encountered during data loads?
  3. Are there any parallel operations happening that might lead to locking?
  4. Are there any inefficiencies in the new trigger code or any validation rules added?
  5. Can the SOAP API job be run as a bulk job?

Lightning Components

  1. What is the home page Experienced Page Time (EPT) and network latency?
  2. Are there any inefficiencies in the underlying Lightning component?
  3. Can the Lightning component be made to load on click of a button within the home page?
  4. Are all the best practices around using the latest browsers, VPNs with sufficient bandwidth, and laptops with sufficient battery power being followed?

Anti-Patterns

When reviewing scenarios, developers and architects usually think about patterns that they can use to resolve the problem. In this module, we take a different approach and discuss common anti-patterns that materialize when customers implement Salesforce. Anti-patterns are those common solutions that are ineffective. Initially these solutions look appropriate, but in actuality the consequences outweigh the benefits. 

Let’s take a look at the anti-patterns in this scenario and best practices to consider instead.

Anti-Pattern: No pre-assessment done to determine batch load times in sandbox.

Best Practice: Before using Bulk API, try using it within a full copy sandbox to identify the pattern, the sequence of load, and any locking issues that can arise.
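
Here’s a minimal sketch of what that sandbox pre-assessment check might look like, assuming a trial Bulk API 2.0 load has already run in a full copy sandbox. The sandbox URL, access token, and job ID are hypothetical placeholders; the check simply scans the job’s failed results for the UNABLE_TO_LOCK_ROW error code.

# Minimal sketch: after a trial Bulk API 2.0 load in a full copy sandbox,
# pull the job's failed results and look for row-lock errors.
# The sandbox URL, access token, and job ID below are hypothetical placeholders.
import requests

INSTANCE_URL = "https://yourSandbox.my.salesforce.com"   # hypothetical sandbox
HEADERS = {"Authorization": "Bearer 00D..."}             # hypothetical token

def lock_errors(job_id: str) -> list[str]:
    """Return the failed-result rows from a Bulk API 2.0 ingest job that hit row locks."""
    url = f"{INSTANCE_URL}/services/data/v58.0/jobs/ingest/{job_id}/failedResults"
    rows = requests.get(url, headers=HEADERS).text.splitlines()
    return [row for row in rows if "UNABLE_TO_LOCK_ROW" in row]

print(lock_errors("7503h00000XXXXX"))   # hypothetical job ID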

Anti-Pattern: Using SOAP API for nightly loads greater than 500,000 records.

Best Practice: Use Bulk API when loading a large number of records, 500,000 or more, into Salesforce. Use SOAP API for smaller datasets. When it comes to syncing data from external systems into Salesforce, sync only as much data as needed to meet the business use case. Use virtualization techniques to “view” data from external systems to avoid syncing large volumes of noncritical data. In the scenario above, the customer used SOAP API for its accounts data load that was between 1 million and 4 million records. Although it was separated into smaller loads, each load was still 1 million records. The size of the dataset is the deciding factor when you choose between SOAP API and Bulk API.
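
To make the choice concrete, here’s a minimal sketch of a large Account upsert through Bulk API 2.0, assuming you already have an OAuth access token. The instance URL, API version, and external ID field name are illustrative placeholders.

# Minimal sketch of a Bulk API 2.0 upsert for a large Account load.
# The instance URL, access token, and external ID field are placeholders.
import requests

INSTANCE_URL = "https://yourInstance.my.salesforce.com"   # hypothetical org
ACCESS_TOKEN = "00D..."                                    # hypothetical token
API = f"{INSTANCE_URL}/services/data/v58.0/jobs/ingest"
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}", "Content-Type": "application/json"}

def upsert_accounts(csv_payload: str, external_id_field: str) -> str:
    """Create an ingest job, upload the CSV, and hand processing off to the platform."""
    # 1. Create the upsert job.
    job = requests.post(API, headers=HEADERS, json={
        "object": "Account",
        "operation": "upsert",
        "externalIdFieldName": external_id_field,
        "contentType": "CSV",
        "lineEnding": "LF",
    }).json()
    job_id = job["id"]

    # 2. Upload the CSV data; Salesforce splits it into internal batches.
    requests.put(f"{API}/{job_id}/batches",
                 headers={"Authorization": f"Bearer {ACCESS_TOKEN}", "Content-Type": "text/csv"},
                 data=csv_payload.encode("utf-8"))

    # 3. Mark the upload complete so asynchronous processing can start.
    requests.patch(f"{API}/{job_id}", headers=HEADERS, json={"state": "UploadComplete"})
    return job_id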

Anti-Pattern: Running batch Apex in parallel along with data loads. 

Best Practice: The way batch Apex runs alongside the data loads in this scenario shows a lack of understanding of Salesforce locking mechanisms and how they can affect a bulk data load. Record locking errors are a common problem when coding data migrations with Salesforce. To lessen the chance of parent record lock contention among parallel load batches, presort the child records by parent ID. This way, all of the contacts that belong to the same account are more likely to end up in the same batch, which greatly reduces the lock potential.
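
As a minimal sketch of that presorting step (the field names and batch sizes are illustrative, not from the scenario), you can sort child records on the parent lookup field before chunking them into batches.

# Minimal sketch: presort child records (for example, Contacts) by their
# parent Account ID before chunking them into batches, so rows that lock
# the same parent tend to land in the same batch.
from itertools import islice
from operator import itemgetter

def batches_by_parent(child_records, batch_size=10_000):
    """Yield batches of child records ordered by their parent ID."""
    # Sort once on the lookup field that drives parent-record locking.
    ordered = sorted(child_records, key=itemgetter("AccountId"))
    it = iter(ordered)
    while batch := list(islice(it, batch_size)):
        yield batch

# Contacts for the same account end up adjacent, so they are far more
# likely to be processed in the same batch (batch size shrunk for the demo).
contacts = [
    {"LastName": "Ng", "AccountId": "001A"},
    {"LastName": "Lee", "AccountId": "001B"},
    {"LastName": "Cruz", "AccountId": "001A"},
]
for batch in batches_by_parent(contacts, batch_size=2):
    print([c["AccountId"] for c in batch])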

Anti-Pattern: Expecting a strict SLA for an asynchronous operation.

Best Practice: Asynchronous operations often correspond to long-running bulk jobs. In turn, there is no guarantee or SLA that determines when queued jobs will finish. If Bulk API is chosen, you are allowing the server to process records once it has the resources. This can take a few minutes, depending on the other bulk jobs that are already being processed. Synchronous tasks always have priority over asynchronous tasks. Use asynchronous operations when there’s no need for immediate processing.
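
In practice, that means polling the job state rather than assuming a fixed completion time. Here’s a minimal sketch, assuming the same Bulk API 2.0 ingest endpoint as above; the poll interval and timeout are illustrative choices, not platform guarantees.

# Minimal sketch: poll a Bulk API 2.0 ingest job until it finishes instead
# of assuming it completes within a fixed window. URL and token are placeholders.
import time
import requests

API = "https://yourInstance.my.salesforce.com/services/data/v58.0/jobs/ingest"  # hypothetical
HEADERS = {"Authorization": "Bearer 00D..."}                                     # hypothetical

def wait_for_job(job_id: str, poll_seconds: int = 60, timeout_seconds: int = 3 * 3600) -> dict:
    """Return the final job info, or raise if it outlives the allotted window."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        info = requests.get(f"{API}/{job_id}", headers=HEADERS).json()
        if info["state"] in ("JobComplete", "Failed", "Aborted"):
            return info
        time.sleep(poll_seconds)   # the platform processes the job when resources free up
    raise TimeoutError(f"Job {job_id} still running after {timeout_seconds} seconds")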

Anti-Pattern: No performance assessment done for custom Lightning components before deployment. 

Best Practice: If your application includes custom functionality, it’s important to define a performance test strategy before you deploy it into production. Early in the process, determine and document your use cases and strategy, which tools or actors to use, and which types of tests should be run. However, running a performance assessment or stress testing for out-of-the-box Salesforce features is unnecessary.

Note

Before any custom functionality is tested, it’s necessary to create a case to inform Salesforce of what type of testing is planned. 

A common performance test that customers run is the single-user test. One user is created to navigate from end to end to find any bottlenecks. Single-user tests help you understand whether there’s any likelihood that you will hit governor limits. In this scenario, performance testing should have happened within a full copy sandbox. When you test in a full copy sandbox, you can load synthetic data based on how much data volume is anticipated in production. Did I mention that it’s important to test… in a full copy sandbox? OK, just checking. It’s important to keep your test environment parallel to production, based on the data volume and shape. This provides a real performance view into your application.

Note

If you have ownership between Account and Opportunity, keep the ratio and data profile the same as in production.
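
Here’s a minimal sketch of generating synthetic data that preserves that shape; the 1:3 Account-to-Opportunity ratio, field values, and external ID fields are illustrative placeholders, not figures from the scenario.

# Minimal sketch: generate synthetic Accounts and Opportunities for a full copy
# sandbox while preserving a production-like parent-to-child ratio.
# The ratio, field values, and external ID fields below are hypothetical.
import uuid

def synthetic_accounts_and_opps(account_count: int, opps_per_account: int = 3):
    """Build account and opportunity dictionaries keeping a fixed ratio."""
    accounts, opportunities = [], []
    for i in range(account_count):
        ext_id = f"SYN-{uuid.uuid4().hex[:8]}"
        accounts.append({"Name": f"Synthetic Account {i}", "External_Id__c": ext_id})
        for j in range(opps_per_account):
            opportunities.append({
                "Name": f"Synthetic Opp {i}-{j}",
                "StageName": "Prospecting",
                "CloseDate": "2025-12-31",
                "Account_External_Id__c": ext_id,   # hypothetical external ID lookup
            })
    return accounts, opportunities

accounts, opps = synthetic_accounts_and_opps(account_count=1000)
print(len(accounts), len(opps))   # 1000 accounts, 3000 opportunities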

Keep use cases at the forefront when testing for performance. If you have multiple scenarios that need to be tested, test end to end. This includes the back-end systems that need to be integrated downstream in your enterprise. Start from the UI, configure the Lightning console, check the performance of your pages, test the integration to the back-end systems, check the call-out time, and check the performance of your custom code.

There are tools available that give you an end-to-end view into the performance logs. Those tools can be used to visualize which layer is taking the performance hit. For integration testing, it’s best to simulate about 50% of the virtual users you expect in production. For example, if 1,000 users will log in to your application, simulate up to 50% of them, or the maximum capacity allowed within your sandbox. It’s best to do this kind of testing during off-peak hours. Since Salesforce is a multitenant platform, it helps to test when other tenants aren’t running their peak daytime traffic.
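
Here’s a minimal sketch of sizing and running that kind of scaled-down load test; the expected user count, sandbox cap, target URL, and the simple GET request standing in for a real user journey are all illustrative assumptions.

# Minimal sketch: simulate roughly 50% of the expected production users
# (capped at what the sandbox allows) and record simple response times.
# The user counts, URL, and request are hypothetical placeholders.
import time
from concurrent.futures import ThreadPoolExecutor
import requests

EXPECTED_PROD_USERS = 1000
SANDBOX_CAP = 400                                  # hypothetical sandbox limit
virtual_users = min(EXPECTED_PROD_USERS // 2, SANDBOX_CAP)

def one_user_pass(target_url: str) -> float:
    """One virtual user's round trip; returns elapsed seconds."""
    start = time.monotonic()
    requests.get(target_url, timeout=30)
    return time.monotonic() - start

with ThreadPoolExecutor(max_workers=virtual_users) as pool:
    timings = list(pool.map(one_user_pass,
                            ["https://yourInstance.my.salesforce.com"] * virtual_users))

p95 = sorted(timings)[int(len(timings) * 0.95)]
print(f"{virtual_users} virtual users, p95 response ≈ {p95:.2f}s")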

In a Nutshell

Whew, this customer had quite a few anti-patterns that caused some hefty roadblocks. Now that you have done the investigative work, you should have a better understanding of how to leverage testing, avoid record locking errors, and scale bulk data loads. In the next unit, let’s find out how to resolve search and reporting issues within an enterprise.
