Monitor and Test

Learning Objectives

After completing this unit, you’ll be able to:

Describe two ways you can log and monitor application performance.
Explain two fault tolerance features that come in handy.
Explain why it’s important to define a realistic number of test cases to be executed by virtual test users.
List two ways you can ensure your test site is not accessible to shoppers or SEO crawlers.

Monitor and Test for Performance and Security

Vijay Lahiri, Cloud Kicks developer, wants to make sure his Cloud Kicks storefront can handle the added capability of his headless architecture and protect against malicious threats. The framework he chooses must alert on system metrics early enough that he can react before a service disruption occurs, whether the reaction is automated or manual.

Being able to monitor the health of his entire application stack in one centralized location is important. Without real-time and historical visibility into key application stack metrics, it’s hard to improve system performance or understand the root cause of service disruptions. It’s key that he defines his logging and monitoring framework requirements before he builds his custom applications so he doesn’t have to rewrite code later to include it.

Here’s what he wants to do.

Monitor the application stack
Configure for fault tolerance
Load test
Develop and monitor with security in mind
Perform load and penetration tests
Protect against malicious bots

In this unit, he learns about monitoring the application stack, fault tolerance, and load testing.

Monitor the Application Stack

The health of the application stack is critical. Done correctly, logging lets Vijay troubleshoot problems from the logs alone. The logging framework must be able to log each application component at various levels of detail, with as little overhead as possible.


Vijay thinks of scenarios he might need to troubleshoot, then determines which metrics to use. For example, if the search functionality stops working as expected, he wants to see relevant log entries for a specific user session, and detailed log entries for lower-level debugging.

He needs to log these types of messages throughout his application stack, whether through custom logging or as a part of another stack component.

Correlation IDs: Unique IDs used to trace a transaction across all systems involved in the application stack
Access logs: Information about each request, such as the HTTP request method, user agent, and execution time
Application server logs: Logs from each custom application component, such as ERROR-level logs
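As a rough illustration of how these message types fit together, here is a minimal Python sketch that stamps every log line with a correlation ID. It is not B2C Commerce code: the component name storefront.search, the handle_search function, and the log format are all hypothetical, chosen only to show the idea.

```python
import contextvars
import io
import logging
import uuid

# Holds the correlation ID for the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Copy the current correlation ID onto every log record."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True


def build_logger(stream):
    """Return a logger that writes 'LEVEL correlation_id component message' lines."""
    logger = logging.getLogger("storefront.search")  # hypothetical component name
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s %(correlation_id)s %(name)s %(message)s")
    )
    handler.addFilter(CorrelationIdFilter())
    logger.addHandler(handler)
    return logger


def handle_search(logger, query, incoming_id=None):
    """Reuse the caller's correlation ID if present, otherwise create one."""
    request_id = incoming_id or uuid.uuid4().hex
    token = correlation_id.set(request_id)
    try:
        logger.debug("search started query=%s", query)
        if not query:
            logger.error("search failed: empty query")
        return request_id
    finally:
        # Restore the previous value so IDs never leak between requests.
        correlation_id.reset(token)


log_output = io.StringIO()
search_logger = build_logger(log_output)
handle_search(search_logger, "running shoes", incoming_id="abc123")
print(log_output.getvalue())
```

Because each line carries the same ID, Vijay could filter the logs of every component for one value and see a single transaction end to end, with DEBUG entries available for lower-level troubleshooting.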

Application Runtime SDKs

Most application runtime SDKs have built-in debugging capabilities that make it easy for Vijay to collect diagnostic data from within applications. This data enables him to troubleshoot problems after they’ve occurred instead of having to wait for the problem to recur so he can debug in real time.

Application Performance Monitoring (APM)

APM tools such as AppDynamics interrogate an application runtime environment and collect analytics data without a significant performance hit. This makes it easy to identify problem sources, such as:

An increase in error messages
A sustained increase in server-level metrics
Specific application function calls that perform worse than expected

Some APM tools can notify support and operations upon problem detection.
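To make "a sustained increase" concrete, here is a small Python sketch of one possible alert rule: raise an alert only when a metric stays above a threshold for several consecutive samples. It does not describe how AppDynamics or any other APM tool works internally; the function name, the threshold, and the sample values are hypothetical.

```python
def sustained_breach(samples, threshold, min_consecutive):
    """Return the index at which a run of min_consecutive samples above
    threshold is first completed, or None if no such run exists."""
    run = 0
    for index, value in enumerate(samples):
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return index
    return None


# Hypothetical CPU utilization samples (percent), one per minute.
cpu_percent = [42, 91, 55, 88, 92, 95, 97, 60]
alert_index = sustained_breach(cpu_percent, threshold=85, min_consecutive=3)
print(alert_index)  # 5: the third consecutive sample above 85
```

Requiring consecutive breaches ignores the single spike at index 1, which helps keep support and operations from being paged for momentary blips.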

B2C Commerce

The B2C Commerce platform sends data to customer support and operations teams in the event of a problem. The system has a flexible logging and monitoring/alerting framework, and applications for visualizing system and application logs. With this data, customer support can quickly identify problems in an application before a shopper even notices something’s wrong.

Real User Monitoring (RUM)

While monitoring server-side metrics is important to ensure ideal user experience conditions, Vijay uses RUM to observe the user experience from a variety of browsers and devices on different speed networks, from a variety of locations. He embeds JavaScript on certain storefront pages that sends metrics to a RUM framework. The data helps him improve revenue-impacting metrics such as bounce rate, conversion rate, and SEO rankings.

Note: Contact your Salesforce customer success manager (CSM) to learn how RUM can benefit your organization.

Fault Tolerance

Fault tolerance is a system’s ability to keep operating when one or more of its components fail. Fault tolerance features such as rate limiting and circuit breaking help ensure the system doesn’t become overloaded during sudden bursts of traffic. Vijay looks to the npm registry for a fault tolerance package for Node.js.

Rate limiting: Defines thresholds for the frequency a particular endpoint can be called by a single client or across all clients, and what happens if the thresholds are exceeded. For example, if a single client calls a search endpoint more than 100 times in 10 seconds, the feature blocks subsequent requests from that client for 10 minutes, or places the requests in a queue to be processed later.
Circuit breaking: Defines upper limits for how long requests to a certain endpoint remain active before a response is returned, and what happens if the threshold is exceeded. For example, if the page load time for logged-in shoppers depends on a call to a third-party loyalty service completing, the application waits up to 2 seconds for the call to complete before it closes the connection.
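The two features can be sketched in a few lines of Python. This is an illustration of the logic, not a specific npm package: the RateLimiter class, the call_with_timeout function, and the slow_loyalty_lookup stand-in are hypothetical. The limiter defaults to the example thresholds from the text (100 calls in 10 seconds, then a 10-minute block). The timeout in the demo is shortened from 2 seconds so it runs quickly.

```python
import time
from collections import defaultdict, deque
from concurrent.futures import ThreadPoolExecutor, TimeoutError as CallTimeout


class RateLimiter:
    """Sliding-window limiter: block a client that exceeds max_calls within window_s."""

    def __init__(self, max_calls=100, window_s=10.0, block_s=600.0):
        self.max_calls = max_calls
        self.window_s = window_s
        self.block_s = block_s
        self._calls = defaultdict(deque)  # client ID -> timestamps of recent calls
        self._blocked_until = {}          # client ID -> time at which the block expires

    def allow(self, client_id, now):
        """Return True if the call made at time `now` (in seconds) may proceed."""
        if now < self._blocked_until.get(client_id, 0.0):
            return False
        calls = self._calls[client_id]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self.window_s:
            calls.popleft()
        calls.append(now)
        if len(calls) > self.max_calls:
            self._blocked_until[client_id] = now + self.block_s
            calls.clear()
            return False
        return True


_executor = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout_s, fallback):
    """Run func in a worker thread; return fallback if it takes longer than timeout_s."""
    future = _executor.submit(func)
    try:
        return future.result(timeout=timeout_s)
    except CallTimeout:
        future.cancel()  # best effort; a call that is already running is not interrupted
        return fallback


def slow_loyalty_lookup():
    """Hypothetical stand-in for a slow third-party loyalty service."""
    time.sleep(0.3)
    return {"points": 1200}


limiter = RateLimiter()
results = [limiter.allow("client-a", now=i * 0.05) for i in range(101)]
print(results.count(True), results[-1])  # 100 False: the 101st call is blocked
print(call_with_timeout(slow_loyalty_lookup, timeout_s=0.05, fallback=None))  # None
```

The thread-based timeout only stops waiting for the result. It doesn’t close the underlying connection, which in practice would be handled by the timeout setting of the HTTP client in use. Passing the time in as `now` keeps the limiter easy to test without real delays.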

B2C Commerce APIs provide built-in rate limiting capabilities that define the maximum number of requests a tenant can issue to a given API per hour. If the number of requests is exceeded, the client receives a 429 response code, and the request is not executed.
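On the client side, a custom head has to decide what to do when it receives a 429. One common approach is to retry with exponential backoff, sketched below in Python. The call_with_backoff function, its parameters, and the shape of send_request are assumptions made for illustration; they are not part of any B2C Commerce SDK.

```python
import time


def call_with_backoff(send_request, max_attempts=4, base_delay_s=1.0, sleep=time.sleep):
    """Call send_request until it returns a status other than 429 or attempts run out.

    send_request is a hypothetical callable that returns (status_code, body).
    """
    for attempt in range(max_attempts):
        status, body = send_request()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Wait 1s, 2s, 4s, ... before trying again.
            sleep(base_delay_s * (2 ** attempt))
    return 429, None


# Simulated API that rejects the first two calls, then succeeds.
responses = iter([(429, None), (429, None), (200, "ok")])
delays = []
result = call_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, delays)  # (200, 'ok') [1.0, 2.0]
```

Injecting `sleep` lets the demo record the delays instead of actually waiting. A production client would typically also cap the total wait and add random jitter so many clients don’t retry at the same moment.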

Load Testing

Load testing is critical for the success of Vijay’s storefront, whether he’s running a custom head or not. Load testing gives him details on where and when his application might experience degraded performance or become unavailable under a heavy load. He can’t possibly think of every scenario that might cause poor performance, so he asks a partner to help.

Consider testing components with and without the custom head and with or without integrations.

Before testing the entire application, Vijay tests each component to better understand its performance characteristics, including the B2C Commerce APIs. He load-tests B2C Commerce integrations independent of his custom head to help determine if a performance problem relates to B2C Commerce or the custom head. He fine-tunes the application stack’s middleware components and frameworks as well, and monitors their health during load testing to remove bottlenecks.
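Testing one component at a time comes down to calling it concurrently and recording latencies. The Python sketch below shows the idea under simple assumptions: fake_component is a hypothetical stand-in for a real call (for example, one API request), and the function names are invented for illustration. A real load test would normally use a dedicated load-testing tool rather than a script like this.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(call, virtual_users, calls_per_user):
    """Invoke `call` from several concurrent users; return per-call latencies in seconds."""

    def one_user(_user_index):
        latencies = []
        for _ in range(calls_per_user):
            start = time.perf_counter()
            call()
            latencies.append(time.perf_counter() - start)
        return latencies

    with ThreadPoolExecutor(max_workers=virtual_users) as pool:
        per_user = list(pool.map(one_user, range(virtual_users)))
    return [latency for user in per_user for latency in user]


def summarize(latencies):
    """Return the sample count, median, and 95th-percentile latency."""
    cut_points = statistics.quantiles(latencies, n=100)  # 99 cut points
    return {
        "count": len(latencies),
        "p50": statistics.median(latencies),
        "p95": cut_points[94],
    }


def fake_component():
    """Hypothetical stand-in for the component under test."""
    time.sleep(0.01)


summary = summarize(run_load(fake_component, virtual_users=5, calls_per_user=4))
print(summary["count"])  # 20
```

Running the same harness against a B2C Commerce integration on its own, and then through the custom head, gives two sets of percentiles to compare when isolating a bottleneck.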

Note: Don’t load test the Einstein Recommendations API, because the tests negatively impact recommendation results.

Third-Party Services

Some third-party service test endpoints don’t scale for production-level traffic. In some situations, testing those endpoints can incur costs. Make sure you understand the load these services can handle, and the response time service level agreement (SLA) provided by each endpoint.

Pre-Test

Defining a realistic breakdown of the test cases to be executed by virtual test users helps ensure you don’t stress endpoints beyond what they will handle in real-world scenarios. Forecasting the target load isn’t easy. We recommend testing at 2x the load of your highest-volume sale event, from the previous year or the year to date, as measured by these metrics.

Maximum orders per hour
Maximum visits/sessions per hour
Page views per hour
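The arithmetic behind a target load is simple. The Python sketch below applies the 2x recommendation to these three metrics and splits virtual users across test cases. Every number in it is a made-up example, including the peak figures, the test case names, and the percentage mix; substitute your own analytics data.

```python
# Hypothetical peak figures from the highest-volume sale event.
peak_per_hour = {"orders": 1_200, "sessions": 45_000, "page_views": 360_000}
LOAD_FACTOR = 2  # test at twice the observed peak

target_per_hour = {name: value * LOAD_FACTOR for name, value in peak_per_hour.items()}
target_per_second = {name: value / 3600 for name, value in target_per_hour.items()}

# Hypothetical percentage of virtual users assigned to each test case.
test_case_mix = {"browse": 60, "search": 25, "add_to_cart": 10, "checkout": 5}
virtual_users = 500
users_per_case = {
    name: virtual_users * percent // 100 for name, percent in test_case_mix.items()
}

print(target_per_hour)   # {'orders': 2400, 'sessions': 90000, 'page_views': 720000}
print(users_per_case)    # {'browse': 300, 'search': 125, 'add_to_cart': 50, 'checkout': 25}
```

With these example figures, the test must sustain 200 page views and 25 new sessions per second, but well under one order per second. That gap is why a realistic mix matters: sending every virtual user through checkout would stress order endpoints far beyond real-world levels.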

Vijay contacts B2C Commerce customer support to make sure the test instance is provisioned with an appropriate initial size. This ensures a reasonable test environment and prevents sudden traffic increases from being flagged as potentially malicious. He also gets advanced load-testing support from client services, who monitor the platform during the tests and provide a findings summary.

Secure the Test Environment

Vijay accesses the test system with a custom domain name, such as loadtest.cloudkicks.com.

He uses these measures to keep shoppers and SEO crawlers out of the test site.

Allowlist IP ranges: Allows access only from authorized IP ranges, such as those from which the load is generated.
Password protect: Requires basic authentication to the application, where each client must pass valid Base64-encoded credentials in an Authorization header.
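Both checks are easy to express in code. The Python sketch below shows what an allowlist test and a basic authentication check could look like. It is an illustration only: the IP ranges are reserved documentation ranges standing in for real load generator addresses, and the function names and credentials are hypothetical.

```python
import base64
import hmac
import ipaddress

# Hypothetical allowlist; these are documentation-only ranges, not real load generators.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.16/28"),
]


def ip_allowed(client_ip):
    """Return True if client_ip falls inside any allowlisted range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


def basic_auth_header(username, password):
    """Build the Authorization header value a load generator would send."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def credentials_valid(header_value, expected_user, expected_password):
    """Check an incoming Authorization header against the expected credentials."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:  # covers malformed Base64 and invalid UTF-8
        return False
    expected = f"{expected_user}:{expected_password}"
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(decoded.encode("utf-8"), expected.encode("utf-8"))


header = basic_auth_header("loadtest", "example-password")
print(ip_allowed("203.0.113.7"), credentials_valid(header, "loadtest", "example-password"))
```

Base64 is an encoding, not encryption, so basic authentication should only be used over HTTPS.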

Post-Launch Tests

Most merchants load test storefront applications before going live, but don’t test again despite making significant changes to the storefront design. Vijay plans to test frequently, especially after major functional changes.

Vijay continually monitors high-level system metrics, such as the CPU and memory utilization of each component, to better understand how his custom head scales under load and how much headroom remains for additional load.

Next Steps

You learned how you can log and monitor application performance, how fault tolerance features keep things running in the event of a failure, and why load testing is critical. Next, you learn how to secure your storefront.