Introduction
In this blog, we take a deep dive into analyzing server logs to identify unusual patterns, pinpoint potential security vulnerabilities, and understand the reasons behind server downtime. This analysis covers a wide range of insights, from GET requests behavior to traffic anomalies on specific dates. We also identify potential attacks on our server and provide mitigation strategies.
Background
Server logs are a vital source of information for understanding how users interact with an application. By analyzing these logs, we can identify patterns that help improve server performance, detect malicious activities, and ensure the system’s security. In this blog, we focus on traffic from November 2 to December 26, with specific emphasis on anomalous dates such as December 21 and December 23, when the server experienced downtime.
Dataset Preparation
The dataset used for this analysis comprises server logs containing fields such as `ip_address`, `timestamp`, `method`, `endpoint`, and `status_code`.
The logs were preprocessed and cleaned to remove any incomplete or corrupted entries, ensuring accurate insights.
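The cleaning step could look roughly like the sketch below. It is an illustration, not the exact preprocessing used here: it assumes each log entry is a dict, that timestamps are ISO 8601 strings, and that the five fields named in this post are the required ones (the helper names `is_clean` and `clean_logs` are hypothetical).

```python
from datetime import datetime

# Assumed subset of the 15 columns; the real schema may differ.
REQUIRED_FIELDS = ("ip_address", "timestamp", "method", "endpoint", "status_code")


def is_clean(record: dict) -> bool:
    """Return True if every required field is present and parseable."""
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            return False
    try:
        datetime.fromisoformat(str(record["timestamp"]))  # assumed ISO 8601
        status = int(record["status_code"])
    except ValueError:
        return False
    # Keep only entries whose status code is a valid HTTP status.
    return 100 <= status <= 599


def clean_logs(records):
    """Drop incomplete or corrupted entries."""
    return [r for r in records if is_clean(r)]
```

Entries that fail any check are dropped rather than repaired, which matches the "remove incomplete or corrupted entries" approach described above.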
Data Overview
- Total Records: 655,002 log entries
- Date Range: From 2024-11-02 to 2024-12-26 (covering 55 unique days)
- Fields in Dataset: 15 columns, including `ip_address`, `timestamp`, `method`, `endpoint`, `status_code`, etc.
- Missing Values: None (all fields are fully populated)
- Most Common Field Types: Object and Integer
Unique Value Analysis
- IP Addresses: 2,114 unique values (indicating a significant range of users/sources)
- Endpoints: 19,208 unique values (suggesting a large variety of API calls)
- Status Codes: 8 unique values (indicating a range of response statuses)
- Methods: 6 unique values (indicating different types of HTTP requests)
Methodology
The analysis was conducted in several phases:
- Traffic Analysis: Examining the volume of requests and error rates on different dates to identify high-traffic and anomalous periods.
- Crash Analysis: Investigating the reasons behind server downtime by analyzing response sizes, response times, and error distributions.
- GET Request Analysis: Analyzing GET requests, categorizing them by success and failure, and identifying the top IP addresses and countries sending these requests.
- Security Analysis: Identifying suspicious requests and potential attacks aimed at extracting sensitive information.
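As a rough illustration of the traffic analysis phase, the sketch below counts requests per day and the share of them that returned an error. It assumes log entries are dicts with an ISO 8601 `timestamp` string and a `status_code`, and it treats any status of 400 or above as an error; both are assumptions made for this example.

```python
from collections import defaultdict


def daily_error_rates(records):
    """Map each date (YYYY-MM-DD) to (total requests, error rate)."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for r in records:
        day = r["timestamp"][:10]  # date part of an ISO 8601 timestamp
        totals[day] += 1
        if int(r["status_code"]) >= 400:
            errors[day] += 1
    return {day: (totals[day], errors[day] / totals[day]) for day in sorted(totals)}
```

Days whose volume or error rate sits far from the rest of the range are the candidates for closer inspection.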
Analysis
Crash Analysis
The crash analysis focused on identifying patterns that led to server downtime. Two significant dates, December 21 and December 23, were identified as anomalous, with a complete drop in response size indicating a server crash.
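One way to spot such a drop is to total the response size per hour and list the hours where it is zero. The sketch below assumes dict-shaped log entries with an ISO 8601 `timestamp` and a numeric `response_size` field; the field name is an assumption, since the post does not list every column.

```python
from collections import defaultdict


def zero_response_hours(records):
    """Return hours (YYYY-MM-DDTHH) that received requests but sent zero bytes."""
    bytes_per_hour = defaultdict(int)
    for r in records:
        hour = r["timestamp"][:13]
        bytes_per_hour[hour] += int(r["response_size"])
    return sorted(h for h, total in bytes_per_hour.items() if total == 0)
```

Note that hours with no requests at all do not appear in the result, so gaps in the timeline need a separate check.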
Note: Our analysis initially hypothesized that errors were causing server crashes. However, after a thorough examination, we discovered that the server was already down before the errors occurred, indicating that traffic spikes and backend issues were the primary causes of the crashes. Here’s a detailed breakdown of our findings:
Crash Analysis: December 21 and December 23
We focused on two key dates with significant server issues: December 21 and December 23. Below is a summary of the errors observed on these dates:
Date | Hour | Status Code | Top Endpoint | Top IP Addresses | Error Count |
---|---|---|---|---|---|
2024-12-21 | 9 | 422 | / | 13.228.225.19, 18.142.128.26, 54.254.162.138 | 8 |
2024-12-21 | 9 | 499 | / | 13.228.225.19, 18.142.128.26, 54.254.162.138 | 193 |
2024-12-21 | 14 | 502 | /generate_stream | 13.228.225.19, 18.142.128.26, 54.254.162.138 | 136 |
2024-12-23 | 5 | 499 | / | 18.142.128.26 | 1 |
2024-12-23 | 7 | 502 | /generate_stream | 13.228.225.19, 18.142.128.26, 54.254.162.138 | 33 |
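A summary like the one above can be produced by grouping error entries by date, hour, and status code. The following is a minimal sketch under the assumption that log entries are dicts with an ISO 8601 `timestamp`; the function name `summarize_errors` is hypothetical.

```python
from collections import Counter, defaultdict


def summarize_errors(records, statuses=(422, 499, 502)):
    """Group error entries by (date, hour, status) and summarize each group."""
    groups = defaultdict(list)
    for r in records:
        status = int(r["status_code"])
        if status in statuses:
            day = r["timestamp"][:10]
            hour = int(r["timestamp"][11:13])
            groups[(day, hour, status)].append(r)

    rows = []
    for (day, hour, status), items in sorted(groups.items()):
        # Most frequently hit endpoint within this group.
        top_endpoint = Counter(i["endpoint"] for i in items).most_common(1)[0][0]
        ips = sorted({i["ip_address"] for i in items})
        rows.append((day, hour, status, top_endpoint, ips, len(items)))
    return rows
```

Each returned tuple corresponds to one table row: date, hour, status code, top endpoint, the distinct IP addresses involved, and the error count.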
Error Types Observed
- 422 Errors (Unprocessable Entity): These occurred on December 21 at hour 9, likely caused by malformed requests sent to the root endpoint (`/`).
- 499 Errors (Client Closed Request): Observed on both December 21 and December 23, indicating that clients disconnected before receiving a server response. This could be due to timeouts or impatient bots.
- 502 Errors (Bad Gateway): These errors occurred primarily on December 21 at hour 14 and December 23 at hour 7, mainly on the `/generate_stream` endpoint, suggesting backend server failures.
We suspect that most server crash days will show the same error codes (422, 499, and 502), since these errors typically accompany server downtime.
Key Dates to Investigate Further:
Date | Event |
---|---|
2024-11-05 | Significant 502 error spike. |
2024-11-06 | Major 502 error spike (3024 errors). |
2024-11-26 | Noticeable spike in 499 and 422 errors. |
2024-12-03 | Server down event. |
2024-12-04 | Server down event. |
2024-12-07 | Server down event. |
2024-12-21 | Anomalous day with high 502, 499, and 422 errors. |
2024-12-23 | Anomalous day with 502 errors. |
Root Cause Hypothesis:
- Server misconfiguration.
- Upstream service failure.
- Backend failures (502 errors).
- Client-side issues (499 errors).
- Malformed requests (422 errors).
Traffic Analysis (December 19 to December 25)
Figure: Traffic logs when the server was normal (December 20, December 24, and December 25)
Figure: Traffic logs when the server crashed (December 21, December 22, and December 23)
What Might Have Caused the Server Crash?
Here are the most likely causes based on the available data:
Potential Cause | Description | Supporting Evidence |
---|---|---|
Resource Exhaustion | The server might have run out of resources (CPU, memory, disk) | High traffic spikes before the crash |
Backend Service Failure | A critical backend service might have crashed, causing the server to go down | 502 errors appearing after the crash |
Infrastructure Issue | There could have been a network issue or misconfiguration in the infrastructure | Sudden drop in response sizes to zero without prior errors |
External Attack (DDoS) | A possible Distributed Denial of Service (DDoS) attack could have overwhelmed the server | Unusual traffic patterns before the crash |
Key Observations:
- High Traffic Volume: The server experienced an unusually high volume of requests before the crash, particularly from specific IP addresses.
- Error Distribution: The predominant error codes on these dates were `499` (Client Closed Request), `502` (Bad Gateway), and `422` (Unprocessable Entity).
- Response Time Patterns: Response times spiked significantly before the crashes, indicating potential bottlenecks or overload issues.
Root Cause:
The server crashes were primarily caused by an overload of incoming requests: our own Render IP addresses sent a large number of requests within a short period.
GET Request Analysis
GET requests are a fundamental part of interacting with an API. Our analysis focused on categorizing these requests by their success and failure rates, identifying top IP addresses, and understanding the endpoints being accessed.
GET Requests: Success (200) vs Fail (Non-200) Per Date
Overall Trend:
- Failure Ratio: Across the entire date range, the ratio of failed GET requests to successful GET requests is extremely high, suggesting one or more of the following:
  - Unauthorized or invalid GET requests.
  - Potential abuse (e.g., DDoS attacks) targeting endpoints using GET requests.
  - Misconfigured endpoints that are expected to handle POST requests but are receiving GET requests instead.
Possible Root Causes:
- Invalid Use of GET Requests: Our system may be designed to handle POST requests, but clients are misusing GET requests, leading to failures.
- Potential DDoS or Unauthorized Access Attempts: The spikes in failed GET requests on dates like December 18, November 7, and November 15 could indicate DDoS attacks or unauthorized access attempts.
- Misconfigured Endpoints: Some endpoints might be incorrectly configured to reject GET requests, which are failing as a result.
Success vs. Failures:
- Successful GET Requests: Requests that returned a `200` status code, indicating that the request was processed correctly.
- Failed GET Requests: Requests that returned non-`200` status codes, indicating an issue with the request.
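The success/failure split defined above can be computed per date with a few lines. This sketch assumes dict-shaped log entries with `method`, `status_code`, and an ISO 8601 `timestamp`; the function name is hypothetical.

```python
from collections import defaultdict


def get_outcomes_by_date(records):
    """Count successful (200) and failed (non-200) GET requests per date."""
    counts = defaultdict(lambda: {"success": 0, "fail": 0})
    for r in records:
        if r["method"].upper() != "GET":
            continue  # only GET requests are part of this analysis
        outcome = "success" if int(r["status_code"]) == 200 else "fail"
        counts[r["timestamp"][:10]][outcome] += 1
    return dict(counts)
```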
Key Insights:
- High Failure Rates Across All IPs: Most IP addresses, even among the top contributors, have extremely low success rates for GET requests, highlighting a potential issue with client configurations or unauthorized access attempts.
- Top IP with Successes: `192.200.115.226` (United States) stands out as the only IP with a relatively high number of successful requests (19).
- Potential Bot Traffic: IPs with thousands of failed requests and zero successful requests (e.g., `51.222.26.42`, `38.54.76.179`) could indicate bot traffic or DDoS attempts.
- Misconfigured Clients or Abuse: Several IPs making repeated GET requests without success suggest client misuse or abuse of endpoints.
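To surface IPs matching the "many failures, zero successes" pattern, a per-IP tally is enough. The sketch below assumes dict-shaped log entries; the `min_failures` threshold of 1000 is an illustrative default, not a value taken from the analysis.

```python
from collections import defaultdict


def failing_only_ips(records, min_failures=1000):
    """Return {ip: failed GET count} for IPs with many failures and no successes."""
    stats = defaultdict(lambda: [0, 0])  # ip -> [successes, failures]
    for r in records:
        if r["method"].upper() != "GET":
            continue
        index = 0 if int(r["status_code"]) == 200 else 1
        stats[r["ip_address"]][index] += 1
    return {
        ip: failed
        for ip, (ok, failed) in stats.items()
        if ok == 0 and failed >= min_failures
    }
```

The result is a starting list for manual review, since a high failure count alone does not prove malicious intent.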
Top Countries Sending GET Requests:
The top countries sending GET requests, both successful and failed, include:
- Canada
- United Arab Emirates
- United States
- United Kingdom
- Singapore
The majority of the failed requests originated from Canada, followed by the UAE and the US. These requests often targeted sensitive endpoints, raising concerns about potential security risks.
Success Requests by User and Endpoint Breach
Success Requests by Country and Endpoint
- The United States had 83 successful requests, mainly to the `/` endpoint.
- India shows frequent access, possibly from internal or test environments.
- Suspicious queries like SQL injection (`/?s=UNION+SELECT...`) and XSS attempts (`/?a=<script>alert("XSS")</script>`) suggest security risks.
Success Requests by Date and Endpoint
- Most requests target the root endpoint (`/`), possibly from exploratory scans or default requests.
- Notable suspicious activity on 2024-12-02, including SQL and XSS attack attempts.
- Other high-request dates: 2024-11-03, 2024-12-06, and 2024-12-15, warranting further investigation.
Failed Requests by User and Endpoint Breach
Out of roughly 48K GET requests, the largest numbers of failed requests originated from:
- Canada: 21,973 requests
- United Arab Emirates: 11,846 requests
- United States: 5,103 requests
- United Kingdom: 2,456 requests
This indicates that the bulk of non-200 responses came from specific countries, primarily Canada and UAE, which might need further investigation to understand the request patterns and their intent.
We observed that the failed requests targeted around 19,000 different endpoints, indicating widespread endpoint access attempts, some of which are suspicious and potentially malicious.
Security Analysis: IP Address 192.200.115.226
One particular IP address, `192.200.115.226`, stood out due to its repeated attempts to access sensitive data.
Suspicious Requests from 192.200.115.226:
This IP address made multiple requests to endpoints that indicate malicious intent, including:
- 2024-12-02: High activity with 12 requests to `/`, and a few requests to `/favicon.ico` and `/version`.
- After that, the client began attacking critical endpoints:
  - Cross-Site Scripting (XSS) Attempt: Requests to the `/` endpoint with a query parameter attempting to inject a script.
  - SQL Injection Attempt: Query strings containing SQL commands such as `UNION SELECT` and `SLEEP`.
  - Directory Traversal Attempt: Attempts to access sensitive system files such as `/etc/passwd`.
Security Implications:
The repeated requests from this IP address suggest a deliberate attempt to probe the server for vulnerabilities. The combined use of XSS, SQL injection, and directory traversal techniques indicates a sophisticated attack aimed at compromising the server.
All the Endpoints which were Successfully Breached:
Endpoint | Count |
---|---|
/ | 40 |
/api-doc/openapi.json | 2 |
/docs/ | 2 |
/docs/favicon-32x32.png | 2 |
/docs/index.css | 2 |
/docs/swagger-initializer.js | 2 |
/docs/swagger-ui-bundle.js | 2 |
/docs/swagger-ui-standalone-preset.js | 2 |
/docs/swagger-ui.css | 2 |
/?a=%3Cscript%3Ealert%28%22XSS%22%29%3B%3C%2Fscript%3E&b=UNION+SELECT+ALL+FROM+information_schema+AND+%27+or+SLEEP%285%29+or+%27&c=..%2F..%2F..%2F..%2Fetc%2Fpasswd | 1 |
/?s=%3Cscript%3Ealert%28%22XSS%22%29%3B%3C%2Fscript%3E | 1 |
/?s=UNION+SELECT+ALL+FROM+information_schema+AND+%27+or+SLEEP%285%29+or+%27 | 1 |
Normal Requests Insights:
The majority of successful requests target the root path and API documentation endpoints:
- Root path (`/`), accessed 40 times.
- API documentation endpoints such as `/docs/` and `/api-doc/openapi.json`.
- Swagger UI files like `/docs/swagger-ui.css` and `/docs/swagger-initializer.js`.
These are standard requests when developers or users explore API documentation. However, publicly exposed API documentation endpoints are a security risk unless they are protected behind authentication.
Recommended Action:
- Ensure that Swagger or OpenAPI documentation is only accessible to authorized users to reduce the risk of attackers discovering API endpoints.
Suspicious Requests Insights:
There are three major suspicious requests that stand out as clear attack attempts.
Cross-Site Scripting (XSS) Attempt:
/?s=%3Cscript%3Ealert%28%22XSS%22%29%3B%3C%2Fscript%3E
- Description: This is a typical XSS probe, where an attacker is attempting to inject a malicious JavaScript script that would display an alert.
- Risk: If our application is not properly sanitizing user input, it could be vulnerable to XSS attacks, allowing attackers to execute malicious scripts in users’ browsers.
Actionable Insight:
- Ensure proper input sanitization and encoding to prevent XSS attacks.
- Use libraries or frameworks that have built-in protections against XSS (e.g., Django’s template auto-escaping).
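Output encoding can be illustrated with Python's standard `html.escape`. The sketch below is a minimal example, assuming a hypothetical `render_search_results` helper that echoes a query parameter back into a page; it is not taken from our application code.

```python
import html


def render_search_results(query: str) -> str:
    """Embed user input in HTML only after escaping it."""
    # html.escape converts <, >, &, and quotes into HTML entities,
    # so the browser displays the payload as text instead of running it.
    return f"<p>Results for: {html.escape(query)}</p>"


payload = '<script>alert("XSS");</script>'
print(render_search_results(payload))
```

With the probe from the logs as input, the rendered page contains the escaped text rather than an executable `<script>` element.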
SQL Injection Attempt:
/?s=UNION+SELECT+ALL+FROM+information_schema+AND+%27+or+SLEEP%285%29+or+%27
- Description: This request is a SQL injection attack. The attacker is trying to manipulate the SQL queries our server sends to the database.
- Notable Techniques Used:
  - `UNION SELECT` to attempt to extract data from the database.
  - `SLEEP(5)` to delay the response; a delayed reply would confirm that the injected SQL was executed.
Actionable Insight:
- Use parameterized queries in our database interactions to prevent SQL injection.
- Implement input validation to ensure that only valid inputs are processed.
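As an illustration of parameterized queries, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `posts` table and `find_posts` helper are hypothetical and only stand in for whatever database layer the application actually uses.

```python
import sqlite3


def find_posts(conn: sqlite3.Connection, title: str):
    """Look up posts by title using a bound parameter."""
    # The ? placeholder passes `title` as data, so it is never parsed as SQL.
    cur = conn.execute("SELECT id, title FROM posts WHERE title = ?", (title,))
    return cur.fetchall()


# Hypothetical in-memory table for demonstration purposes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO posts (title) VALUES (?)", ("Server log analysis",))

attack = "UNION SELECT ALL FROM information_schema AND ' or SLEEP(5) or '"
print(find_posts(conn, "Server log analysis"))
print(find_posts(conn, attack))
```

The injection string from the logs is treated as an ordinary title that matches nothing, instead of altering the query.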
Combined Attack Attempt:
/?a=%3Cscript%3Ealert%28%22XSS%22%29%3B%3C%2Fscript%3E&b=UNION+SELECT+ALL+FROM+information_schema+AND+%27+or+SLEEP%285%29+or+%27&c=..%2F..%2F..%2F..%2Fetc%2Fpasswd
This request is particularly concerning as it combines multiple attack vectors:
- XSS Attempt: Parameter `a` contains an XSS payload.
- SQL Injection Attempt: Parameter `b` contains an SQL injection payload.
- Directory Traversal Attempt: Parameter `c` is trying to access the `/etc/passwd` file, which contains system user information on Unix-based systems.
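The per-parameter breakdown above can be reproduced by URL-decoding the logged endpoint and matching each parameter against a few signatures. The patterns below are deliberately simple and illustrative (an assumption for this sketch); production rule sets such as those in a WAF are far more thorough.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Illustrative signatures only, not a complete detection rule set.
SIGNATURES = {
    "xss": re.compile(r"<\s*script", re.IGNORECASE),
    "sql_injection": re.compile(r"\bunion\s+select\b|\bsleep\s*\(", re.IGNORECASE),
    "directory_traversal": re.compile(r"\.\./|/etc/passwd", re.IGNORECASE),
}


def classify_request(endpoint: str) -> dict:
    """Return {parameter: [matched attack labels]} for a logged endpoint."""
    findings = {}
    query = urlsplit(endpoint).query
    # parse_qsl percent-decodes values and turns '+' into spaces.
    for name, value in parse_qsl(query, keep_blank_values=True):
        labels = [label for label, pattern in SIGNATURES.items() if pattern.search(value)]
        if labels:
            findings[name] = labels
    return findings
```

Running `classify_request` over every logged endpoint gives a quick way to shortlist requests for manual review.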
Actionable Insight:
- Secure our application against directory traversal attacks. Ensure that user inputs are sanitized and any path manipulations are restricted.
- Review our server configurations to prevent sensitive files from being accessible via the web.
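One common way to restrict path manipulation is to resolve the requested path and confirm it still sits inside an allowed base directory. The sketch below assumes a hypothetical base directory `/srv/app/static` and a hypothetical helper `safe_resolve`; neither comes from our actual server setup.

```python
from pathlib import Path

BASE_DIR = Path("/srv/app/static")  # hypothetical directory for served files


def safe_resolve(user_path: str, base_dir: Path = BASE_DIR) -> Path:
    """Resolve user_path under base_dir, rejecting anything that escapes it."""
    base = base_dir.resolve()
    # resolve() collapses ".." segments, so traversal attempts become visible.
    candidate = (base / user_path).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate
```

A request for `../../../../etc/passwd` resolves outside the base directory and is rejected, while a normal relative path such as `css/site.css` is allowed.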
Recommendations to Mitigate GET Request Failures and Attacks
- Block Malicious IP Addresses: Implement a firewall rule to block 192.200.115.226 and other suspicious IPs showing similar behavior.
- Rate-Limiting and CAPTCHA:
  - Apply rate-limiting for high-volume requests.
  - Use CAPTCHA for endpoints prone to attack (like `/`).
- API Documentation Protection:
  - Restrict access to `/api-doc/openapi.json` and other documentation endpoints.
  - Ensure API documentation is authenticated and protected from public access.
- Input Validation/Sanitization: Apply strict input validation to prevent XSS, SQL injection, and directory traversal attacks.
- Implement Web Application Firewalls (WAFs): Block malicious requests before they reach the server.
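To make the rate-limiting recommendation concrete, here is a minimal in-memory sliding-window limiter keyed by IP address. It is a sketch under stated assumptions: the class name and limits are hypothetical, and a real deployment would more likely rely on the reverse proxy, a WAF, or a shared store rather than per-process memory.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: caller should reject, e.g. with HTTP 429
        hits.append(now)
        return True
```

For example, `SlidingWindowLimiter(100, 60)` would cap each IP at 100 requests per minute; the right numbers depend on normal traffic levels for each endpoint.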
Discussion
The analysis highlights the importance of continuously monitoring server logs to identify potential security threats. The identified patterns of server crashes and suspicious requests underscore the need for robust security measures to protect the server from malicious activities.
The GET request analysis revealed that the vast majority of GET requests fail, and that a few IP addresses stand out due to their suspicious behavior. These insights can be used to strengthen the server’s security posture and prevent future incidents.
Conclusion
Analyzing server logs provides valuable insights into server performance, user behavior, and security threats. By identifying traffic patterns, crash causes, and potential attacks, we can take proactive steps to improve server stability and security.
The analysis uncovered several key findings:
- Server crashes on December 21 and December 23 were caused by a high volume of incoming requests.
- Suspicious requests from specific IP addresses indicate potential security threats.
- Implementing security measures such as WAFs, rate limiting, and input sanitization can help prevent future attacks.
By continuously monitoring and analyzing server logs, we can ensure the server remains secure, stable, and performant.