Troubleshoot Replication

Learning Objectives

After completing this unit, you’ll be able to:
  • List three types of problems you can encounter with replications.
  • List three steps you can take when troubleshooting a failed replication.
  • Explain how a replication undo works.
  • List the types of logs that can help with replication troubleshooting.
  • Describe three approaches you can take to troubleshoot a hung replication.

What to Do About Failures

Since learning about replication, Linda Rosenberg has been running a lot of data and code replication processes for Cloud Kicks.

She’s encountered some issues that tested her troubleshooting skills: she had to undo replications for various reasons, dealt with a few failures and a hung replication, and needed help clearing the cache. Let’s see how she resolved these.

The first step to take with any failure is to look at the replication status.

  1. Open Business Manager.
  2. Select Administration > Replication > Data (or Code) Replication.
  3. View the list of replication processes and their status.

Let’s walk through what Linda does next.

Post-Replication Issues

When Linda encounters a problem with a data replication, she looks at the replication logs on the staging and target instances for error messages. Entries that include "failed" or "ORA-" can give her clues.

  1. If the replication logs don’t provide helpful data, she checks the error logs.
  2. If the replication included multiple tasks, she tries to isolate the task that caused the problem by running tests with and without each task.
  3. If the replication of multiple objects fails, she tries replicating individual objects to narrow the cause.
  4. If a scheduled replication didn’t run, she tries to run it manually.
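The search for "failed" and "ORA-" entries described above can be sketched in Python for a log file that has already been downloaded. This is only an illustration, not a B2C Commerce tool: the function name `find_suspect_lines` and the sample log lines are invented for the example, and only the two marker strings come from the text.

```python
def find_suspect_lines(log_text):
    """Return (line_number, text) pairs for lines mentioning 'failed' or 'ORA-'."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        # "failed" is matched case-insensitively; "ORA-" is matched as written.
        if "failed" in line.lower() or "ORA-" in line:
            hits.append((number, line.strip()))
    return hits


# Hypothetical log lines, invented for illustration only.
sample_log = "\n".join([
    "[2019-01-15 21:17:12.848 GMT] Replication task created.",
    "[2019-01-15 21:18:02.101 GMT] Replication step failed for catalog data.",
    "[2019-01-15 21:18:02.150 GMT] ORA-00054: resource busy and acquire with NOWAIT specified",
])

for number, line in find_suspect_lines(sample_log):
    print(number, line)
```

Running the same function against both the staging and the target log gives a short list of lines to read first.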

Undo a Replication

If the data or code doesn’t transfer properly, or the result is incomplete or (in the case of code) buggy, she can run an undo. Remember, you must be on a PIG instance to run or access a replication.

Undo Data Replication

Linda can roll back the most recent data replication process, whether it was a transfer and publishing process or a publishing-only process. When you undo a data replication process, you can enter a description, configure an email notification, and prevent the target instance page cache from refreshing. If you don't want to do any of those things, simply click Undo next to a replication process. Here's how.

  1. Open Business Manager.
  2. Select Administration > Replication > Data Replication.
  3. Click Undo next to the process. You can only undo the last process run on an instance.
  4. Select a target instance.
  5. Enter a description.
  6. Select the Activation Type: Manual
  7. Select a notification email trigger: When Process Ends
    You can enter multiple target email addresses separated by commas. The email notifications contain the start and end time of the process, the target system, the replication type, and the included replication tasks. If there was a failure, an error code is also included. Each process in a recurring series sends its own notification.
  8. Click Next.
  9. Select the replication type: Undo
  10. Click Next and review the process details.
  11. Click Create.
  12. Find the process in the list and click Start.

Undo Code Replication

Linda can roll back the most recent Code Activation or Code Transfer & Activation process for a target instance. Only the most recent process can be undone. When you undo a code replication, the target instance reverts to the previously active code version. Here's how.

  1. Open Business Manager.
  2. Select Administration > Replication > Code Replication.
  3. Click Undo next to the process.
  4. Select a target instance.
  5. Enter a description.
  6. Select the Activation Type: Manual
  7. Select a notification email trigger: When Process Ends
    You can enter multiple target email addresses separated by commas. The email notifications contain the start and end time of the process, the target system, the replication type, and the included replication tasks. If there was a failure, an error code is also included. Each process in a recurring series sends its own notification.
  8. Click Next.
  9. Select the replication type: Undo
  10. Click Next and review the process details.
  11. Click Create.
  12. Find the process in the list and click Start.

Replication Logs

Salesforce B2C Commerce records replication process log files on both the source and target systems. These are separate from the regular error logs. They exist in: https://instance_address/on/demandware.servlet/webdav/Sites/Logs/, with file names such as staging-blade_name-appserver-yyyymmdd.log.

Linda monitors status on the staging instance. If a process fails, she reviews the staging logs. A single log can contain several days’ worth of events, so she looks for a log dated at the beginning of the replication process. The log file has a timestamp similar to the data replication task. All the log file names contain staging, regardless of the instance type. Here’s how Linda reviews the logs.

  1. Open Business Manager.
  2. Select Administration > Site Development > Development Setup.
  3. Click the Log Files link.
  4. Look for the staging logs.
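Because a log directory can hold many files, picking out the staging logs for the day a replication started can be sketched as follows. This is a minimal example that assumes the file-name pattern quoted above (staging-blade_name-appserver-yyyymmdd.log). The function name `staging_logs_for_day`, the blade name `blade0-1`, and the file list are hypothetical.

```python
import re
from datetime import date

# Matches names like staging-<blade_name>-appserver-<yyyymmdd>.log
LOG_NAME = re.compile(r"^staging-.+-appserver-(\d{8})\.log$")


def staging_logs_for_day(file_names, day):
    """Return the staging log file names whose date stamp equals `day`."""
    wanted = day.strftime("%Y%m%d")
    matches = []
    for name in file_names:
        found = LOG_NAME.match(name)
        if found and found.group(1) == wanted:
            matches.append(name)
    return sorted(matches)


# Hypothetical directory listing, invented for illustration only.
names = [
    "staging-blade0-1-appserver-20190115.log",
    "staging-blade0-1-appserver-20190116.log",
    "error-blade0-1-appserver-20190115.log",
]

print(staging_logs_for_day(names, date(2019, 1, 15)))
```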

This is an example of a log entry.

[2007-01-15 21:17:12.848 GMT] ISH-CORE-2250: New replication task "1168895828901"
in domain "Sites-Site" successfully created.
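An entry like this one can be split into its timestamp, message code, and message text. The sketch below assumes that every entry starts with a bracketed GMT timestamp and may carry a code such as ISH-CORE-2250. That format is inferred from the examples in this unit, not from a published specification, and `parse_entry` is a hypothetical helper name.

```python
import re
from datetime import datetime

# [timestamp GMT] optional-code: message
ENTRY = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) GMT\] "
    r"(?:(?P<code>[A-Z]+(?:-[A-Z]+)*-\d+): )?"
    r"(?P<message>.*)$"
)


def parse_entry(line):
    """Return a dict with timestamp, code, and message, or None if the line does not match."""
    found = ENTRY.match(line.strip())
    if found is None:
        return None
    return {
        "timestamp": datetime.strptime(found.group("ts"), "%Y-%m-%d %H:%M:%S.%f"),
        "code": found.group("code"),  # None when the entry has no code
        "message": found.group("message"),
    }


# The example entry from above, joined onto one line.
entry = parse_entry(
    '[2007-01-15 21:17:12.848 GMT] ISH-CORE-2250: New replication task '
    '"1168895828901" in domain "Sites-Site" successfully created.'
)
print(entry["timestamp"], entry["code"])
```

Parsed timestamps make it easier to compare entries with the start time of the replication task.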

When reviewing the log file, focus on certain items.

  • Scroll through the log file, which contains the steps of the process, looking for errors.
  • The final step on the staging instance is a hand-off to the target server. The staging log should have a line similar to this.
    [2019-01-15 21:27:09.783 GMT] Staging pipeline in live system successfully called.
  • If the success message is missing, look for an error similar to this.
    ISH-CORE-2491: Setting state of process with uuid='dC8KAANna1111EOTN9h9md4' from 'StartingStagingProcess' to 'ErrorAcquiringEditingLocks'
  • If this staging error occurs, log in to Control Center, and then stop and restart the instance. Then run the same replication again. Control Center is a B2C Commerce tool that lets you monitor the state of B2C Commerce instances and take appropriate action.
  • If the staging instance log doesn’t have errors, look at the staging log on the target instance.
    https://[target_instance_name]/on/demandware.servlet/webdav/Sites/Logs
  • The target instance staging logs start with messages like these.
    [2019-01-15 20:29:30.321 GMT] Copy staging process with uuid=bcFvkiaalTMxM444667bVYFqBX
    [2007-01-15 20:29:32.347 GMT] Starting StagingResources-Acquire@Sites-Site
  • Depending on which data you replicated, the log has an entry for the start of the database copy. Check for errors.
  • After the replication, review the entire log for errors. If the process finishes successfully, the following message appears at the end of the logs.
    [2019-01-15 21:31:17.434 GMT] ReplicationPublication process finished with state 'StagingProcessCompleted'.
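The state messages in these examples follow a recognizable shape, so pulling state changes and the final state out of a log can be sketched with two regular expressions. The patterns below are inferred from the sample lines in this unit only, with the closing quote on the error state assumed. `state_changes` and `final_state` are hypothetical helper names.

```python
import re

# "Setting state of process with uuid='...' from 'Old' to 'New'"
TRANSITION = re.compile(
    r"Setting state of process with uuid='(?P<uuid>[^']+)' "
    r"from '(?P<old>[^']+)' to '(?P<new>[^']+)'"
)
# "... process finished with state 'State'"
FINISHED = re.compile(r"process finished with state '(?P<state>[^']+)'")


def state_changes(log_text):
    """Return (old_state, new_state) pairs in the order they appear."""
    return [(m["old"], m["new"]) for m in TRANSITION.finditer(log_text)]


def final_state(log_text):
    """Return the last reported finished state, or None if there is none."""
    states = [m["state"] for m in FINISHED.finditer(log_text)]
    return states[-1] if states else None


staging_excerpt = (
    "ISH-CORE-2491: Setting state of process with uuid='dC8KAANna1111EOTN9h9md4' "
    "from 'StartingStagingProcess' to 'ErrorAcquiringEditingLocks'"
)
target_excerpt = (
    "[2019-01-15 21:31:17.434 GMT] ReplicationPublication process finished "
    "with state 'StagingProcessCompleted'."
)

print(state_changes(staging_excerpt))
print(final_state(target_excerpt))
```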

Fix a Hung Replication

Some database transactions, especially those involving catalog data, can take a while to complete. When a data replication stays in the running state for longer than she expects, Linda checks whether it’s hung, meaning that it’s no longer running or is making little or no progress. She needs to find out whether it’s hung, and why, before she can run it successfully.

Check Staging

She checks the most recent replication log on the staging instance. Here's how.

  1. Confirm that it contains the line, "Staging pipeline in live system successfully called." If it doesn't, there’s a problem.
  2. Check to see if it includes an entry that a state is set to ErrorAcquiringEditingLocks. If so, resource locks from a previous replication process might not have been released, which can hang the replication.
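The two staging checks above can be expressed as a small function over the text of the most recent staging log. This is a sketch under the assumption that the log has already been downloaded as plain text. `check_staging_log` is a hypothetical name, and the sample log is invented.

```python
HANDOFF_LINE = "Staging pipeline in live system successfully called."
LOCK_ERROR_STATE = "ErrorAcquiringEditingLocks"


def check_staging_log(log_text):
    """Return a list of findings; an empty list means both checks passed."""
    findings = []
    if HANDOFF_LINE not in log_text:
        findings.append("hand-off line is missing")
    if LOCK_ERROR_STATE in log_text:
        findings.append("editing locks from an earlier process may not have been released")
    return findings


# Hypothetical log content, invented for illustration only.
hung_log = (
    "[2019-01-15 21:17:12.848 GMT] Replication task created.\n"
    "Setting state of process from 'StartingStagingProcess' to 'ErrorAcquiringEditingLocks'\n"
)

for finding in check_staging_log(hung_log):
    print(finding)
```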

Check the Target

She checks the most recent replication log on the target instance and scrolls to the end.

  1. Refresh the view a few times to see if new entries are being added. If no new entries appear after a while, the replication might be hung.
  2. Check if it includes an entry that a state is set to ErrorAcquiringLiveLocks. If so, resource locks from a previous replication process might not have been released, which can hang the replication.
  3. If the last log entry is a database action, such as INSERT or ALTER INDEX, check previous logs to see how long the action took and what the next entry was.
  4. If the last log entry starts with Rsync, the delay might be due to a large number of changed static content files. Files that have been moved to a different folder are included, even if their content is the same. If the Rsync is stuck, contact Customer Support to check its status.
  5. If the log shows the state ErrorLiveStagingProcessKilled, the replication is probably hung due to a concurrent deployment or an instance restart.
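Steps 3 through 5 amount to classifying what the target log last reported. A rough sketch of that classification follows. The category names, the function name `classify_target_log`, and the sample entries are all invented for illustration, and real log entries may be worded differently.

```python
def classify_target_log(log_text):
    """Classify a target log as 'killed', 'rsync', 'database', 'other', or 'empty'."""
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    if not lines:
        return "empty"
    # The killed state can appear anywhere in the log, so check the whole text.
    if "ErrorLiveStagingProcessKilled" in log_text:
        return "killed"
    last = lines[-1]
    # Drop a leading "[timestamp] " prefix when present.
    if last.startswith("[") and "] " in last:
        message = last.split("] ", 1)[1]
    else:
        message = last
    if message.startswith("Rsync"):
        return "rsync"
    if "INSERT" in message or "ALTER INDEX" in message:
        return "database"
    return "other"


# Hypothetical entries, invented for illustration only.
print(classify_target_log("[2019-01-15 20:40:00.000 GMT] ALTER INDEX example_index REBUILD"))
print(classify_target_log("[2019-01-15 20:45:00.000 GMT] Rsync started for static content"))
```

A "database" or "rsync" result is a prompt to compare durations with earlier logs, not proof that the replication is hung.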

Check Both

Sometimes, she works with both instances.

  1. If either log contains a line similar to "resource busy and acquire with NOWAIT specified", open a ticket with Customer Support and provide the troubleshooting steps that you tried.
  2. If the replication process shows Completed on the target instance but its status is still waiting or in progress on the staging instance, the staging instance might have been down when the replication finished. Restart the staging instance and check the status again.
  3. If you determine that the replication is hung, use Control Center to restart the staging instance. Make sure the hung replication has stopped by verifying that its status on the staging instance is Failed. When it stops, rerun the replication.
  4. If the replication hangs again, try to restart the target and source instances and rerun the replication. Restarting the target instance disrupts all running jobs, returns errors for all storefront requests, and clears all caches. Restart a production instance only as a last resort.
  5. If the replication still hangs, open a Customer Support ticket and provide the troubleshooting steps that you tried.
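The escalation order in this list can be summarized as a decision function. This is a sketch of the sequence only: the status labels ("Completed", "Waiting", "In Progress"), the `restarts_tried` counter, and the function name `next_step` are assumptions made for the example, and nothing here calls Business Manager or Control Center.

```python
NOWAIT_MESSAGE = "resource busy and acquire with NOWAIT specified"


def next_step(staging_log, target_log, staging_status, target_status, restarts_tried):
    """Suggest the next troubleshooting action for a replication that looks hung."""
    if NOWAIT_MESSAGE in staging_log or NOWAIT_MESSAGE in target_log:
        return "open a Customer Support ticket"
    if target_status == "Completed" and staging_status in ("Waiting", "In Progress"):
        return "restart the staging instance and check the status again"
    if restarts_tried == 0:
        return "restart staging in Control Center, confirm the status is Failed, then rerun"
    if restarts_tried == 1:
        return "restart the target and source instances, then rerun (last resort on production)"
    return "open a Customer Support ticket"


print(next_step("", "", "In Progress", "Completed", 0))
```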

Troubleshoot Cache Clearing

When Linda has issues with the page cache, she considers these tips.

  • Close the browser and clear local cache to ensure the problem isn’t local to your system before manually clearing the cache in Business Manager.
  • To manually clear the cache on the embedded CDN (production and development instances), click Invalidate for the Entire Page Cache for Site. You don’t need to clear static cache.
  • If you don’t see an expected change, look for a pattern that might indicate a more specific problem. For example, are images not refreshing? If so, the image provider could be having an issue. If a content asset is causing a problem, make sure that it was deployed.

Let’s Sum It Up

In this unit, Linda learned how to troubleshoot replication problems. She learned how to undo a replication, review replication logs, handle a hung replication, and troubleshoot cache clearing.

This module showed you how to run and troubleshoot B2C Commerce replication processes. Now take this last quiz and earn your badge!
