Recover from Job Failures
- List the resources you can use to detect job problems.
- List the general problem types.
- Explain how to research email notification errors.
- Explain the best way to correct job problems on a production instance.
Linda Rosenberg, Cloud Kicks' administrator, learned that it’s best to be proactive when running jobs by configuring them for automatic troubleshooting. When jobs start failing, it’s time to take a step back and collect data so she can figure out what happened, and fast.
Smooth-running jobs are all up to her!
Detect Job Problems
A proactive approach means that she configures all jobs to send an email to the administrator if they run for more than a specified time. This eliminates the element of surprise.
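The idea behind a runtime limit can be sketched in a few lines of Python. This is only an illustration of the check, not B2C Commerce code: the `JobRun` record, the `overdue_jobs` helper, and the job names are all hypothetical, and in practice Linda sets the limit in the job configuration rather than writing code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class JobRun:
    """Hypothetical record of a running job; not a B2C Commerce type."""
    name: str
    started_at: datetime
    max_runtime: timedelta


def overdue_jobs(runs, now):
    """Return the runs whose elapsed time exceeds their configured limit."""
    return [run for run in runs if now - run.started_at > run.max_runtime]


# Sample data (assumed): two jobs, each with a 30-minute limit.
now = datetime(2024, 1, 1, 12, 0)
runs = [
    JobRun("inventory-import", datetime(2024, 1, 1, 11, 0), timedelta(minutes=30)),
    JobRun("price-import", datetime(2024, 1, 1, 11, 50), timedelta(minutes=30)),
]
late = overdue_jobs(runs, now)
print([run.name for run in late])
```

Only the job that has been running for an hour is flagged, which is the point at which a notification would go to the administrator.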
Sometimes, a job has problems that Salesforce B2C Commerce records in the log but that don’t actually stop the job. If these errors persist and are significant, she can add processing to her job pipelines to detect the errors and stop the job.
She learns that it’s good practice to check logs regularly to identify problems—chronic or otherwise.
Linda can configure email notifications by setting values in individual schedules or job configurations. To keep things simple, she uses the same email template for all email notifications. B2C Commerce sends each notification as a standard text file that includes:
- From address
- To list
- Body text
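To make the three parts concrete, here is a minimal sketch that assembles a plain-text message with Python's standard `email` library. It does not reproduce the B2C Commerce template: the `build_notification` helper, the addresses, and the subject line are assumptions added for illustration.

```python
from email.message import EmailMessage


def build_notification(sender, recipients, job_name, detail):
    """Assemble a plain-text message with a From address, To list, and body."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    # The subject line is an assumption; the source lists only From, To, and body.
    message["Subject"] = f"Job notification: {job_name}"
    message.set_content(detail)  # a str body is stored as text/plain
    return message


note = build_notification(
    "jobs@example.com",
    ["admin@example.com", "ops@example.com"],
    "inventory-import",
    "The job ran longer than its configured limit.",
)
print(note)
```

Using one helper for every job mirrors Linda's choice of a single template: each notification has the same shape, so a problem is easy to spot at a glance.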
If you don’t configure the mail server to send error notifications, you can look at the log files. B2C Commerce logs errors in the system error log and the syslog.
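A regular log check can be as simple as counting repeated error messages so that chronic problems stand out. The sketch below assumes a made-up line layout (timestamp, level, message) and a hypothetical `summarize_errors` helper; real B2C Commerce log files may be formatted differently, so treat the parsing as a placeholder.

```python
from collections import Counter


def summarize_errors(lines, marker="ERROR"):
    """Count how often each error message appears in the given log lines."""
    counts = Counter()
    for line in lines:
        if marker in line:
            # Keep the text after the marker as the error summary.
            counts[line.split(marker, 1)[1].strip()] += 1
    return counts


# Sample lines in an assumed format, not copied from a real log.
sample = [
    "2024-01-01 02:00:01 INFO  Job inventory-import started",
    "2024-01-01 02:00:05 ERROR Missing product ID in row 12",
    "2024-01-01 02:00:09 ERROR Missing product ID in row 12",
    "2024-01-01 02:00:12 ERROR Invalid price in row 40",
]
summary = summarize_errors(sample)
for message, count in summary.most_common():
    print(count, message)
```

Messages that appear many times float to the top of the list, which is a reasonable first place to look.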
If a job is stuck in the Running status and has acquired at least one lock, Linda needs to release the lock before she can run the job again. To release the lock, she stops and restarts the instance using Control Center. She must have the proper credentials to access Control Center.
If a job terminates because the instance went down, or a job terminates partway through, the data might be a mix of updated and old records. She should run the job again.
To troubleshoot data errors, she tries the following.
- Replicate data from another instance: This is most useful when an import onto production fails and staging has the correct data that she can roll back to.
- Import a new feed produced by the backend system: This is the most common method for recovery. Usually you must fix the data in the backend system and generate a new feed.
- Use data from import feed archives: This is most useful when the backend system has a problem producing the feed. For this data to be available, you must have a system for archiving feeds and cleaning up old archived feeds.
- Use data from regular exports: This is most useful for data on the production system, such as product availability, or data that’s imported directly onto production, such as price books. For this data to be available, you must create a job that exports the required data. This is also useful for data that exists only in Business Manager, not the backend system, such as web-specific attributes or URL attributes.
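A system for archiving feeds and cleaning up old ones can be pictured as two small steps: copy each feed under a timestamped name, then delete everything but the newest few. The sketch below is a generic illustration, not a B2C Commerce feature. The `archive_feed` and `prune_archive` helpers, the file names, and the retention count are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def archive_feed(feed, archive_dir, stamp):
    """Copy a feed file into the archive under a timestamped name."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"{stamp}_{feed.name}"
    shutil.copy2(feed, target)
    return target


def prune_archive(archive_dir, keep):
    """Delete the oldest archived feeds, keeping the newest `keep` files."""
    archived = sorted(archive_dir.iterdir())  # timestamp prefix sorts by age
    stale = archived[:-keep] if keep > 0 else archived
    for path in stale:
        path.unlink()
    return [path.name for path in sorted(archive_dir.iterdir())]


# Demo in a temporary directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    feed = root / "inventory.xml"
    feed.write_text("<inventory/>")
    for stamp in ["20240101", "20240102", "20240103"]:
        archive_feed(feed, root / "archive", stamp)
    remaining = prune_archive(root / "archive", keep=2)
    print(remaining)
```

With three archived feeds and a retention count of two, the oldest copy is removed and the two most recent remain available for recovery.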
The Production Instance
In most cases, when Linda transfers data to the production instance, she performs a data replication from the staging instance. However, for frequent imports of pricing, inventory, or other kinds of data, she uses jobs to transfer the data directly from the external source into production. As with staging, an interrupted job can leave mixed data on the production instance.
Archiving Best Practices
Because Linda can’t automatically roll back production instances impacted by job issues, she always creates an archive of her existing site that she can roll back to if the job fails.
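The archive-first habit follows a simple pattern: take a copy, run the job, and restore the copy if the job fails. Here is a minimal in-memory sketch of that pattern. It stands in for a real site archive only loosely: the `run_with_archive` helper, the `failing_import` job, and the price data are hypothetical, and an actual rollback would use an archive of the site rather than a Python dictionary.

```python
import copy


def run_with_archive(data, job):
    """Run `job` against `data`; restore the archived copy if the job fails."""
    archive = copy.deepcopy(data)  # snapshot taken before the job starts
    try:
        job(data)
        return True
    except Exception:
        data.clear()
        data.update(archive)  # manual rollback from the archive
        return False


def failing_import(prices):
    """Hypothetical job that updates one record and then stops partway."""
    prices["shoe-1"] = 95.0
    raise RuntimeError("feed ended unexpectedly")


prices = {"shoe-1": 100.0, "shoe-2": 120.0}
succeeded = run_with_archive(prices, failing_import)
print(succeeded, prices)
```

Without the snapshot, the interrupted job would leave one updated price next to one old price, which is exactly the mixed state the archive is meant to undo.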
Let’s Sum It Up
In this unit, Linda learned how to troubleshoot job errors in a variety of situations. She also learned about the importance of an archival process and regular (frequent) error log checks.
Now it’s time to test your knowledge and earn a shiny new badge.