Monitor and Test
After completing this unit, you’ll be able to:
- Describe two ways you can log and monitor application performance.
- Explain two fault tolerance features that come in handy.
- Explain why it’s important to define a realistic number of test cases to be executed by virtual test users.
- List two ways you can ensure your test site is not accessible to shoppers or SEO crawlers.
Monitor and Test for Performance and Security
Vijay Lahiri, Cloud Kicks developer, wants to make sure his Cloud Kicks storefront can handle the added capabilities of its headless architecture and is protected against malicious threats. The monitoring framework he chooses must alert on system metrics early enough to give him time to react before a service disruption, whether the reaction is automated or manual.
Being able to monitor the health of his entire application stack in one centralized location is important. Without real-time and historical visibility into key application stack metrics, it’s hard to improve system performance or understand the root cause of service disruptions. It’s key that he defines his logging and monitoring framework requirements before he builds his custom applications, so he doesn’t have to rewrite code later to add logging and monitoring.
Here’s what he wants to do.
- Monitor the application stack
- Configure for fault tolerance
- Load test
- Develop and monitor with security in mind
- Perform load and penetration tests
- Protect against malicious bots
In this unit, he learns about monitoring the application stack, fault tolerance, and load testing.
Monitor the Application Stack
The health of the application stack is critical. If logging is done correctly, Vijay can troubleshoot problems using the logs alone. The logging framework must be able to log each application component at various levels of detail, with as little overhead as possible.
Vijay thinks of scenarios he might need to troubleshoot, then determines which metrics to use. For example, if the search functionality stops working as expected, he wants to see relevant log entries for a specific user session, and detailed log entries for lower-level debugging.
He needs to log these types of messages throughout his application stack, whether through custom logging or as a part of another stack component.
- Correlation IDs: Unique IDs used to trace a transaction across all systems involved in the application stack
- Access logs: Information about each request type, such as the HTTP request, method, user agent, and execution time
- Application server logs: Logs at various levels, such as ERROR-level logs, from each custom application component
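The following minimal TypeScript sketch shows how these message types fit together in a Node.js custom head. It assumes one JSON object per log line and an `x-correlation-id` request header. `createLogger` and `correlationIdFrom` are hypothetical helpers, not part of B2C Commerce or any logging library. A production stack would more likely build on an established logging package.

```typescript
import { randomUUID } from "node:crypto";

// Log levels in increasing order of severity.
type Level = "DEBUG" | "INFO" | "WARN" | "ERROR";
const SEVERITY: Record<Level, number> = { DEBUG: 10, INFO: 20, WARN: 30, ERROR: 40 };

// Reuse the correlation ID that arrived with the request, or mint a new one at the edge.
// The "x-correlation-id" header name is an assumption, not a platform requirement.
function correlationIdFrom(headers: Record<string, string | undefined>): string {
  return headers["x-correlation-id"] ?? randomUUID();
}

// Create a logger for one component. Entries below `minLevel` are dropped before any
// formatting work, which keeps the overhead of disabled DEBUG logging low.
function createLogger(
  component: string,
  minLevel: Level,
  sink: (line: string) => void = (line) => console.log(line),
) {
  return (level: Level, correlationId: string, message: string, fields: Record<string, unknown> = {}): void => {
    if (SEVERITY[level] < SEVERITY[minLevel]) return;
    // One JSON object per line; the fixed keys come last so `fields` cannot overwrite them.
    sink(JSON.stringify({ ...fields, time: new Date().toISOString(), level, component, correlationId, message }));
  };
}

// Usage: an access log entry and a suppressed debug entry for one search request.
const lines: string[] = [];
const log = createLogger("search-api", "INFO", (line) => lines.push(line));
const requestId = correlationIdFrom({ "x-correlation-id": "abc-123" });
log("INFO", requestId, "access", { method: "GET", path: "/search?q=shoes", userAgent: "Mozilla/5.0", durationMs: 182 });
log("DEBUG", requestId, "query parsed"); // dropped: below the INFO threshold
```

Each system passes the correlation ID along to the next one. Searching all logs for a single ID, such as `abc-123`, then returns every entry for that shopper’s search request.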
Application Runtime SDKs
Most application runtime SDKs have built-in debugging capabilities that make it easy for Vijay to collect diagnostic data from within applications. This data lets him troubleshoot problems after they occur, instead of waiting for a problem to recur so he can debug it in real time.
Application Performance Monitoring (APM)
APM tools such as AppDynamics instrument an application’s runtime environment and collect analytics data without a significant performance hit. This makes it easier to spot problems and trace their sources, such as:
- An increase in error messages
- A sustained increase in server-level metrics
- Specific application function calls that perform worse than expected
Some APM tools can notify support and operations upon problem detection.
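An alert on a sustained increase, rather than on every spike, gives Vijay time to react without flooding him with noise. This sketch fires only when a metric stays above a threshold for a set number of consecutive samples. The idea is similar to the `for` duration on a Prometheus alerting rule. `SustainedThresholdAlert` is a hypothetical class, and the 80% CPU and 3-sample values are illustrative assumptions.

```typescript
// Fires only when every sample in the most recent window exceeds the threshold,
// so a single spike does not page anyone but a sustained increase does.
class SustainedThresholdAlert {
  private readonly recent: number[] = [];
  private readonly threshold: number;
  private readonly samplesRequired: number;

  constructor(threshold: number, samplesRequired: number) {
    this.threshold = threshold;
    this.samplesRequired = samplesRequired;
  }

  // Record one sample (for example, errors per minute or CPU %).
  // Returns true when the alert should fire.
  observe(value: number): boolean {
    this.recent.push(value);
    if (this.recent.length > this.samplesRequired) this.recent.shift(); // keep only the latest window
    return this.recent.length === this.samplesRequired && this.recent.every((v) => v > this.threshold);
  }
}

// Usage: alert when CPU stays above 80% for 3 consecutive one-minute samples.
const cpuAlert = new SustainedThresholdAlert(80, 3);
const results = [85, 70, 90, 92, 95].map((cpu) => cpuAlert.observe(cpu));
```

Here only the last sample triggers the alert, because it completes three consecutive readings above 80. The dip to 70 resets the streak.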
The B2C Commerce platform sends data to customer support and operations teams in the event of a problem. The system has a flexible logging and monitoring/alerting framework, and applications for visualizing system and application logs. With this data, customer support can quickly identify problems in an application before a shopper even notices something’s wrong.
Real User Monitoring (RUM)
Note: Contact your Salesforce customer success manager (CSM) to learn how RUM can benefit your organization.
Fault Tolerance
Fault tolerance is a system’s ability to keep running despite the failure of one or more of its components. Fault tolerance features such as rate limiting and circuit breaking help keep the system from becoming overloaded during sudden bursts of traffic. Vijay looks to the npm repository for a fault tolerance package for NodeJS.
- Rate limiting: Defines thresholds for how often a particular endpoint can be called by a single client or across all clients, and what happens when a threshold is exceeded. For example, if a single client calls a search endpoint more than 100 times in 10 seconds, the application blocks subsequent requests from that client for 10 minutes, or places the requests in a queue to be processed later.
- Circuit breaking: Defines upper limits for how long requests to a certain endpoint remain active before a response is returned, and what happens when the limit is exceeded. For example, if the page load time for logged-in shoppers depends on a call to a third-party loyalty service, the application waits up to 2 seconds for the call to complete before it closes the connection.
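The two features above can be sketched in TypeScript using the thresholds from the examples: 100 requests per 10 seconds with a 10-minute block, and a 2-second wait. `ClientRateLimiter` and `withTimeout` are hypothetical helpers. The limiter keeps its state in memory, so it protects only a single Node.js process. The circuit-breaking example is essentially a request timeout. A full circuit breaker, as offered by fault tolerance packages on npm, also stops calling a failing dependency for a cool-down period after repeated failures.

```typescript
// Rate limiting: a per-client fixed-window limiter. More than `limit` calls within
// `windowMs` blocks that client for `blockMs`. `now` is injectable to make testing easy.
class ClientRateLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly blockedUntil = new Map<string, number>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly blockMs: number;

  constructor(limit: number, windowMs: number, blockMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.blockMs = blockMs;
  }

  // Returns true if the request may proceed; false means reject it (for example, with HTTP 429).
  allow(clientId: string, now: number = Date.now()): boolean {
    const blocked = this.blockedUntil.get(clientId);
    if (blocked !== undefined) {
      if (now < blocked) return false;
      this.blockedUntil.delete(clientId); // block has expired
    }
    let current = this.windows.get(clientId);
    if (current === undefined || now - current.start >= this.windowMs) {
      current = { start: now, count: 0 }; // start a fresh window
      this.windows.set(clientId, current);
    }
    current.count += 1;
    if (current.count > this.limit) {
      this.blockedUntil.set(clientId, now + this.blockMs);
      return false;
    }
    return true;
  }
}

// Timeout: stop waiting on a slow dependency after `ms` milliseconds. The AbortSignal lets
// the task cancel its own I/O; fetch, for example, accepts it through its `signal` option.
function withTimeout<T>(task: (signal: AbortSignal) => Promise<T>, ms: number): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new Error(`Timed out after ${ms} ms`));
    }, ms);
  });
  // Whichever settles first wins; the timer is always cleared afterward.
  return Promise.race([task(controller.signal), timeout]).finally(() => clearTimeout(timer));
}

// Usage with the thresholds from the examples:
const searchLimiter = new ClientRateLimiter(100, 10_000, 10 * 60_000);
// const loyalty = await withTimeout((signal) => fetch(loyaltyUrl, { signal }), 2_000); // loyaltyUrl is hypothetical
```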
B2C Commerce APIs provide built-in rate limiting that defines the maximum number of requests a tenant can issue to a given API per hour. If a client exceeds that limit, it receives an HTTP 429 (Too Many Requests) response code, and the request is not executed.
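A client that hits this limit should back off instead of retrying immediately. The sketch below retries only on 429, using exponential backoff with a capped delay. It assumes the server may send a Retry-After header in seconds, which this unit doesn’t state. `sendWithRetry` and `backoffDelayMs` are hypothetical names.

```typescript
// Minimal response shape this sketch needs; a fetch() Response satisfies it.
interface HttpResponse {
  status: number;
  headers: { get(name: string): string | null };
}

// Delay before retry number `attempt` (0-based). Honors a Retry-After value given in
// seconds if the server sends one (an assumption); otherwise uses exponential backoff.
// HTTP-date Retry-After values are not parsed here and fall back to the backoff.
function backoffDelayMs(attempt: number, retryAfter: string | null, baseMs = 500, maxMs = 30_000): number {
  if (retryAfter !== null && /^\d+$/.test(retryAfter.trim())) {
    return Math.min(Number(retryAfter.trim()) * 1000, maxMs);
  }
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Send a request, retrying only on HTTP 429 and giving up after `maxRetries` retries.
async function sendWithRetry<R extends HttpResponse>(
  send: () => Promise<R>,
  maxRetries = 3,
  sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<R> {
  for (let attempt = 0; ; attempt++) {
    const response = await send();
    if (response.status !== 429 || attempt >= maxRetries) return response;
    await sleep(backoffDelayMs(attempt, response.headers.get("Retry-After")));
  }
}

// Usage (the URL is hypothetical):
// const res = await sendWithRetry(() => fetch("https://example.com/search?q=shoes"));
```

A common refinement is to add random jitter to the delay, so many clients don’t retry in lockstep.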
Load Testing
Load testing is critical for the success of Vijay’s storefront, whether or not he’s running a custom head. Load testing shows him where and when his application might experience degraded performance or become unavailable under heavy load. He can’t think of every scenario that might cause poor performance, so he asks a partner to help.
Before testing the entire application, Vijay tests each component to better understand its performance characteristics, including the B2C Commerce APIs. He load-tests B2C Commerce integrations independent of his custom head to help determine if a performance problem relates to B2C Commerce or the custom head. He fine-tunes the application stack’s middleware components and frameworks as well, and monitors their health during load testing to remove bottlenecks.
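Testing components in isolation comes down to measuring each one’s latency under the same load and comparing the results. The minimal harness below assumes Node.js and a hypothetical `call` function that wraps one component’s endpoint. Compare the 95th-percentile latency of B2C Commerce calls made directly with the same calls made through the custom head. The difference shows which layer adds the delay.

```typescript
import { performance } from "node:perf_hooks";

// Nearest-rank percentile (p from 0 to 100) of a set of latency samples.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.min(sorted.length, Math.max(1, Math.ceil((p / 100) * sorted.length)));
  return sorted[rank - 1]!;
}

// Call one component `requests` times with at most `concurrency` calls in flight and
// return each call's latency in milliseconds. A failed call rejects the whole run.
// Don't point this at the Einstein Recommendations API.
async function measureLatencies(
  call: () => Promise<unknown>,
  requests: number,
  concurrency: number,
): Promise<number[]> {
  const latencies: number[] = [];
  let issued = 0;
  const worker = async (): Promise<void> => {
    while (issued < requests) {
      issued += 1; // claimed synchronously, so no request is issued twice
      const start = performance.now();
      await call();
      latencies.push(performance.now() - start);
    }
  };
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return latencies;
}

// Usage (hypothetical URLs): the same search, called directly and through the custom head.
// const direct = await measureLatencies(() => fetch(commerceSearchUrl), 500, 20);
// const viaHead = await measureLatencies(() => fetch(customHeadSearchUrl), 500, 20);
// console.log(percentile(direct, 95), percentile(viaHead, 95));
```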
Note: Don’t load test the Einstein Recommendations API, because the tests negatively impact recommendation results.
Some third-party service test endpoints don’t scale for production-level traffic. In some situations, testing those endpoints can incur costs. Make sure you understand the load these services can handle, and the response time service level agreement (SLA) provided by each endpoint.
Defining a realistic breakdown of the test cases that virtual test users execute helps ensure you don’t stress endpoints beyond what they handle in real-world scenarios. Forecasting the target load isn’t easy. We recommend testing at twice the load of your highest-volume sale event from the previous year or year to date, based on these metrics:
- Maximum orders per hour
- Maximum visits/sessions per hour
- Page views per hour
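The arithmetic behind the 2x recommendation and a test-case breakdown is simple enough to script. In this sketch, the peak figures and the traffic mix are placeholders, not real Cloud Kicks data. The mix is 60% browse, 25% search, 10% add to cart, and 5% checkout. `targetLoad` and `pageViewsPerSecond` are hypothetical helpers.

```typescript
// Hourly peak metrics from the highest-volume sale event.
interface HourlyLoad {
  ordersPerHour: number;
  visitsPerHour: number;
  pageViewsPerHour: number;
}

// Apply the recommended 2x multiplier to every metric.
function targetLoad(peak: HourlyLoad, multiplier = 2): HourlyLoad {
  return {
    ordersPerHour: peak.ordersPerHour * multiplier,
    visitsPerHour: peak.visitsPerHour * multiplier,
    pageViewsPerHour: peak.pageViewsPerHour * multiplier,
  };
}

// Split the target page views across test cases in proportion to each test case's share
// of real traffic (weights need not sum to 100) and convert to page views per second.
function pageViewsPerSecond(target: HourlyLoad, mix: Record<string, number>): Record<string, number> {
  const totalWeight = Object.values(mix).reduce((sum, w) => sum + w, 0);
  const perSecond: Record<string, number> = {};
  for (const [testCase, weight] of Object.entries(mix)) {
    perSecond[testCase] = (target.pageViewsPerHour * weight) / (totalWeight * 3600);
  }
  return perSecond;
}

// Usage with placeholder numbers:
const target = targetLoad({ ordersPerHour: 1_800, visitsPerHour: 90_000, pageViewsPerHour: 720_000 });
const mix = pageViewsPerSecond(target, { browse: 60, search: 25, addToCart: 10, checkout: 5 });
```

With these inputs, the target is 1,440,000 page views per hour, or 400 per second. That splits into 240 browse, 100 search, 40 add-to-cart, and 20 checkout page views per second. The target also includes 3,600 orders per hour (1 per second) and 180,000 visits per hour.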
Vijay contacts B2C Commerce customer support to make sure the test instance is provisioned with an appropriate initial size. This ensures a reasonable test environment and prevents sudden traffic increases from being flagged as potentially malicious. He also gets advanced load-testing support from client services, who monitor the platform during the tests and provide a findings summary.
Secure the Test Environment
Vijay accesses the test system with a custom domain name, such as loadtest.cloudkicks.com.
He uses these safeguards to keep shoppers and SEO crawlers out of the test site:
- Allowlist IP ranges: Allows access only from the authorized IP ranges that generate the test load.
- Password protect: Requires basic authentication, so every request must include valid Base64-encoded credentials in an Authorization header. Shoppers and crawlers without credentials are turned away.
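If the custom head’s own Node.js server enforces these safeguards, the checks could look like the sketch below. It handles plain IPv4 CIDR ranges only. `allowTestSiteRequest` and the other helpers are hypothetical. In practice, these controls are often configured at a CDN or proxy instead.

```typescript
import { timingSafeEqual } from "node:crypto";

// Parse a dotted IPv4 address into an unsigned 32-bit value, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part) || Number(part) > 255) return null;
    value = value * 256 + Number(part);
  }
  return value;
}

// True if `ip` falls inside an IPv4 CIDR range such as "203.0.113.0/24".
function inCidr(ip: string, cidr: string): boolean {
  const pieces = cidr.split("/");
  if (pieces.length !== 2 || !/^\d{1,2}$/.test(pieces[1]!)) return false;
  const bits = Number(pieces[1]);
  const ipValue = ipv4ToInt(ip);
  const baseValue = ipv4ToInt(pieces[0]!);
  if (bits > 32 || ipValue === null || baseValue === null) return false;
  const blockSize = 2 ** (32 - bits); // number of addresses in the range
  return Math.floor(ipValue / blockSize) === Math.floor(baseValue / blockSize);
}

// True if the Authorization header carries the expected Basic credentials.
// The decoded bytes are compared in constant time to avoid leaking them through timing.
function hasValidBasicAuth(header: string | undefined, user: string, password: string): boolean {
  if (header === undefined || !header.startsWith("Basic ")) return false;
  const supplied = Buffer.from(header.slice("Basic ".length), "base64");
  const expected = Buffer.from(`${user}:${password}`, "utf8");
  return supplied.length === expected.length && timingSafeEqual(supplied, expected);
}

// Serve the test site only to allowlisted load generators that also authenticate.
function allowTestSiteRequest(
  ip: string,
  authHeader: string | undefined,
  allowlist: string[],
  user: string,
  password: string,
): boolean {
  return allowlist.some((cidr) => inCidr(ip, cidr)) && hasValidBasicAuth(authHeader, user, password);
}
```

Node.js can report IPv4 clients as IPv4-mapped IPv6 addresses, such as `::ffff:203.0.113.7`. Strip that prefix before calling `inCidr`.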
Most merchants load test storefront applications before going live and don’t test again, even after making significant changes to the storefront design. Vijay plans to test frequently, especially after major functional changes.
Vijay continually monitors high-level system metrics, such as the CPU and memory utilization of each component, to better understand how his custom head scales under load and how much room there is for additional load.
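One rough way to turn these metrics into a headroom estimate is to divide a safe utilization ceiling by each component’s busiest resource. The sketch assumes an 80% ceiling and that utilization grows roughly linearly with load. Both are assumptions, and memory in particular often doesn’t scale linearly. `headroomFactor` and `tightestComponent` are hypothetical helpers, and the sample values are placeholders.

```typescript
// Peak utilization of one component observed during a load test, as percentages.
interface UtilizationSample {
  component: string;
  cpuPct: number;
  memoryPct: number;
}

// Rough multiple of the tested load a component could absorb before its busiest resource
// reaches `ceilingPct`, assuming utilization grows roughly linearly with load.
function headroomFactor(sample: UtilizationSample, ceilingPct = 80): number {
  const busiest = Math.max(sample.cpuPct, sample.memoryPct);
  if (busiest <= 0) throw new Error("utilization must be positive");
  return ceilingPct / busiest;
}

// The component with the least headroom is the first one likely to limit scaling.
function tightestComponent(samples: UtilizationSample[], ceilingPct = 80): UtilizationSample {
  if (samples.length === 0) throw new Error("no samples");
  return samples.reduce((worst, s) => (headroomFactor(s, ceilingPct) < headroomFactor(worst, ceilingPct) ? s : worst));
}

// Usage with placeholder numbers:
const samples: UtilizationSample[] = [
  { component: "custom-head", cpuPct: 40, memoryPct: 25 },
  { component: "search-proxy", cpuPct: 30, memoryPct: 64 },
];
```

Here the custom head could take about 2x the tested load (80 / 40). The hypothetical search proxy is limited by memory at about 1.25x (80 / 64), so it’s the component to scale or tune first.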
You learned how you can log and monitor application performance, how fault tolerance features keep things running in the event of a failure, and the critical importance of load testing. Next you learn how to secure your storefront.
Resources
- External Link: npm repository
- External Link: Prometheus Alerting Best Practices
- External Link: Grafana
- External Link: AppDynamics Application Performance Management (APM)
- External Link: Assign Memory Resources to Containers and Pods
- External Link: Configure Default Memory Requests and Limits for a Namespace
- External Link: NodeJS Express
- External Link: Optimize Performance