Comprehensive Analysis of HuggingFace LLM Server Logs and Traffic Patterns

Introduction

In this post, we analyze server logs to identify unusual patterns, pinpoint potential security vulnerabilities, and understand the reasons behind server downtime. The analysis ranges from GET request behavior to traffic anomalies on specific dates. We also identify potential attacks on our server and suggest mitigation strategies.

Background

Server logs are a vital source of information for understanding how users interact with an application. By analyzing these logs, we can identify patterns that help improve server performance, detect malicious activities, and ensure the system’s security. In this post, we focus on traffic from November 2 to December 26, with specific emphasis on anomalous dates such as December 21 and December 23, when the server experienced downtime.

Dataset Preparation

The dataset used for this analysis comprises server logs with fields such as ip_address, timestamp, method, endpoint, and status_code.

The logs were preprocessed and cleaned to remove any incomplete or corrupted entries, ensuring accurate insights.

:card_file_box: Data Overview

  • Total Records: 655,002 log entries
  • Date Range: From 2024-11-02 to 2024-12-26 (covering 55 unique days)
  • Fields in Dataset: 15 columns, including ip_address, timestamp, method, endpoint, status_code, etc.
  • Missing Values: None (All fields are fully populated)
  • Most Common Field Types: Object and Integer

Unique Value Analysis

  • IP Addresses: 2,114 unique values (indicating a significant range of users/sources)
  • Endpoints: 19,208 unique values (suggesting a large variety of API calls)
  • Status Codes: 8 unique values (indicating a range of response statuses)
  • Methods: 6 unique values (indicating different types of HTTP requests)
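The unique-value counts above can be reproduced with a few lines of standard-library Python. This is a minimal sketch, assuming the logs have already been parsed into a list of dicts keyed by the field names listed in the overview; the sample records are hand-made and not taken from the real logs.

```python
from collections import defaultdict

def unique_counts(records, fields):
    """Count distinct values per field across a list of dict records."""
    seen = defaultdict(set)
    for rec in records:
        for field in fields:
            seen[field].add(rec[field])
    return {field: len(seen[field]) for field in fields}

# Hypothetical sample records (documentation-range IPs, not real traffic).
sample = [
    {"ip_address": "192.0.2.10", "method": "GET", "endpoint": "/", "status_code": 200},
    {"ip_address": "192.0.2.20", "method": "POST", "endpoint": "/generate_stream", "status_code": 502},
    {"ip_address": "192.0.2.10", "method": "GET", "endpoint": "/docs/", "status_code": 200},
]

counts = unique_counts(sample, ["ip_address", "method", "endpoint", "status_code"])
print(counts)
```

On the full dataset the same function would be called once with the complete list of parsed log entries.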

Methodology

The analysis was conducted in several phases:

  1. Traffic Analysis: Examining the volume of requests and error rates on different dates to identify high-traffic and anomalous periods.
  2. Crash Analysis: Investigating the reasons behind server downtime by analyzing response sizes, response times, and error distributions.
  3. GET Request Analysis: Analyzing GET requests, categorizing them by success and failure, and identifying the top IP addresses and countries sending these requests.
  4. Security Analysis: Identifying suspicious requests and potential attacks aimed at extracting sensitive information.

Analysis

Crash Analysis

The crash analysis focused on identifying patterns that led to server downtime. Two significant dates, December 21 and December 23, were identified as anomalous, with a complete drop in response size indicating a server crash.
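One way to surface such drops is to total the response size per hour and flag the hours where the total is zero. The sketch below assumes ISO-formatted timestamps and a numeric `response_size` field; both are assumptions, since the exact column name and timestamp format are not given here, and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def zero_response_hours(records):
    """Return (date, hour) buckets whose total response size is zero."""
    totals = defaultdict(int)
    for rec in records:
        ts = datetime.fromisoformat(rec["timestamp"])
        totals[(ts.date().isoformat(), ts.hour)] += rec["response_size"]
    return sorted(bucket for bucket, total in totals.items() if total == 0)

# Hypothetical records: the server answers at 08:xx and 10:xx but not at 09:xx.
sample = [
    {"timestamp": "2024-01-01T08:15:00", "response_size": 5120},
    {"timestamp": "2024-01-01T09:02:00", "response_size": 0},
    {"timestamp": "2024-01-01T09:40:00", "response_size": 0},
    {"timestamp": "2024-01-01T10:05:00", "response_size": 2048},
]

down_hours = zero_response_hours(sample)
print(down_hours)
```

Hours with no requests at all never appear as buckets in this sketch, so a complete outage with zero logged traffic would need a separate gap check.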

Note: Our analysis initially hypothesized that errors were causing server crashes. However, after a thorough examination, we discovered that the server was already down before the errors occurred, indicating that traffic spikes and backend issues were the primary causes of the crashes. Here’s a detailed breakdown of our findings:

:bar_chart: Crash Analysis: December 21 and December 23

We focused on two key dates with significant server issues: December 21 and December 23. Below is a summary of the errors observed on these dates:

| Date | Hour | Status Code | Top Endpoint | Top IP Addresses | Error Count |
|---|---|---|---|---|---|
| 2024-12-21 | 9 | 422 | / | 13.228.225.19, 18.142.128.26, 54.254.162.138 | 8 |
| 2024-12-21 | 9 | 499 | / | 13.228.225.19, 18.142.128.26, 54.254.162.138 | 193 |
| 2024-12-21 | 14 | 502 | /generate_stream | 13.228.225.19, 18.142.128.26, 54.254.162.138 | 136 |
| 2024-12-23 | 5 | 499 | / | 18.142.128.26 | 1 |
| 2024-12-23 | 7 | 502 | /generate_stream | 13.228.225.19, 18.142.128.26, 54.254.162.138 | 33 |
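A summary of this shape can be built by grouping error responses by date, hour, and status code. The following is a rough sketch rather than the exact procedure used for the table: it assumes ISO timestamps, treats any status code of 400 or above as an error, and uses hypothetical sample records.

```python
from collections import Counter, defaultdict
from datetime import datetime

def summarize_errors(records, top_n=3):
    """Group error responses by (date, hour, status) with top endpoint and IPs."""
    groups = defaultdict(lambda: {"endpoints": Counter(), "ips": Counter(), "count": 0})
    for rec in records:
        if rec["status_code"] < 400:  # assumption: 4xx/5xx count as errors
            continue
        ts = datetime.fromisoformat(rec["timestamp"])
        group = groups[(ts.date().isoformat(), ts.hour, rec["status_code"])]
        group["endpoints"][rec["endpoint"]] += 1
        group["ips"][rec["ip_address"]] += 1
        group["count"] += 1
    rows = []
    for date, hour, status in sorted(groups):
        group = groups[(date, hour, status)]
        rows.append({
            "date": date,
            "hour": hour,
            "status_code": status,
            "top_endpoint": group["endpoints"].most_common(1)[0][0],
            "top_ips": [ip for ip, _ in group["ips"].most_common(top_n)],
            "error_count": group["count"],
        })
    return rows

# Hypothetical sample records.
sample = [
    {"timestamp": "2024-01-01T14:01:00", "status_code": 502, "endpoint": "/generate_stream", "ip_address": "192.0.2.10"},
    {"timestamp": "2024-01-01T14:20:00", "status_code": 502, "endpoint": "/generate_stream", "ip_address": "192.0.2.10"},
    {"timestamp": "2024-01-01T14:30:00", "status_code": 200, "endpoint": "/", "ip_address": "192.0.2.20"},
    {"timestamp": "2024-01-01T09:05:00", "status_code": 499, "endpoint": "/", "ip_address": "192.0.2.20"},
]

rows = summarize_errors(sample)
for row in rows:
    print(row)
```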

:mag_right: Error Types Observed

  • :one: 422 Errors (Unprocessable Entity):
    These occurred on December 21 at hour 9, likely caused by malformed requests sent to the root endpoint (/).

  • :two: 499 Errors (Client Closed Request):
    Observed on both December 21 and December 23, indicating that clients disconnected before receiving a server response. This could be due to timeouts or impatient bots.

  • :three: 502 Errors (Bad Gateway):
    These errors occurred primarily on December 21 at hour 14 and December 23 at hour 7, mainly on the /generate_stream endpoint, suggesting backend server failures.

We expect most server crash days to show the same error codes (422, 499, and 502), since these typically appear during server downtime.

:date: Key Dates to Investigate Further:

| Date | Event |
|---|---|
| 2024-11-05 | Significant 502 error spike. |
| 2024-11-06 | Major 502 error spike (3024 errors). |
| 2024-11-26 | Noticeable spike in 499 and 422 errors. |
| 2024-12-03 | Server down event. |
| 2024-12-04 | Server down event. |
| 2024-12-07 | Server down event. |
| 2024-12-21 | Anomalous day with high 502, 499, and 422 errors. |
| 2024-12-23 | Anomalous day with 502 errors. |
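Dates like these can be shortlisted automatically by comparing each day's error count against a baseline. The sketch below flags days whose count exceeds a multiple of the median daily count; the factor of 5 and the sample numbers are illustrative assumptions, not values derived from the logs.

```python
from statistics import median

def spike_dates(daily_counts, factor=5.0):
    """Flag dates whose error count exceeds `factor` times the median daily count."""
    baseline = max(median(daily_counts.values()), 1)  # avoid a zero baseline
    return sorted(day for day, n in daily_counts.items() if n > factor * baseline)

# Hypothetical daily 502 counts.
daily_502 = {
    "2024-01-01": 10,
    "2024-01-02": 12,
    "2024-01-03": 900,
    "2024-01-04": 8,
}

spikes = spike_dates(daily_502)
print(spikes)
```

The median is used instead of the mean so that a single very large spike does not inflate the baseline it is compared against.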

Root Cause Hypothesis:

  • Server misconfiguration.
  • Upstream service failure.
  • Backend failures (502 errors).
  • Client-side issues (499 errors).
  • Malformed requests (422 errors).

Traffic Analysis (December 19 to December 25)

Traffic logs when server was normal on December 20, December 24 and December 25

Traffic logs when server crashed on December 21, December 22 and December 23

:jigsaw: What Might Have Caused the Server Crash?

Here are the most likely causes based on the available data:

| Potential Cause | Description | Supporting Evidence |
|---|---|---|
| Resource Exhaustion | The server might have run out of resources (CPU, memory, disk) | High traffic spikes before the crash |
| Backend Service Failure | A critical backend service might have crashed, causing the server to go down | 502 errors appearing after the crash |
| Infrastructure Issue | There could have been a network issue or misconfiguration in the infrastructure | Sudden drop in response sizes to zero without prior errors |
| External Attack (DDoS) | A possible Distributed Denial of Service (DDoS) attack could have overwhelmed the server | Unusual traffic patterns before the crash |

Key Observations:

  1. High Traffic Volume: The server experienced an unusually high volume of requests before the crash, particularly from specific IP addresses.
  2. Error Distribution: The predominant error codes on these dates were 499 (Client Closed Request), 502 (Bad Gateway), and 422 (Unprocessable Entity).
  3. Response Time Patterns: Response times spiked significantly before the crashes, indicating potential bottlenecks or overload issues.
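To check the first observation, one can count requests per IP address in a window leading up to the crash. This is a minimal sketch, assuming ISO timestamps and a known crash time; the 60-minute window and the sample records are hypothetical choices.

```python
from collections import Counter
from datetime import datetime, timedelta

def top_ips_before(records, crash_time, window_minutes=60, top_n=3):
    """Return the busiest IPs in the window immediately before crash_time."""
    start = crash_time - timedelta(minutes=window_minutes)
    hits = Counter(
        rec["ip_address"]
        for rec in records
        if start <= datetime.fromisoformat(rec["timestamp"]) < crash_time
    )
    return hits.most_common(top_n)

# Hypothetical sample records.
sample = [
    {"timestamp": "2024-01-01T08:10:00", "ip_address": "192.0.2.10"},
    {"timestamp": "2024-01-01T08:30:00", "ip_address": "192.0.2.10"},
    {"timestamp": "2024-01-01T08:45:00", "ip_address": "192.0.2.20"},
    {"timestamp": "2024-01-01T06:00:00", "ip_address": "192.0.2.30"},  # outside the window
]

crash = datetime(2024, 1, 1, 9, 0)
top = top_ips_before(sample, crash)
print(top)
```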

Root Cause:

The server crashes were primarily caused by an overload of incoming requests: our Render IP addresses sent a large number of requests within a short period.


GET Request Analysis

GET requests are a fundamental part of interacting with an API. Our analysis focused on categorizing these requests by their success and failure rates, identifying top IP addresses, and understanding the endpoints being accessed.

GET Requests: Success (200) vs Fail (Non-200) Per Date

:chart_with_downwards_trend: Overall Trend:

  • Failure Ratio: Across the entire date range, the ratio of failed GET requests to successful GET requests is extremely high, suggesting one or more of the following:
    1. Unauthorized or invalid GET requests.
    2. Potential abuse (e.g., DDoS attacks) targeting endpoints using GET requests.
    3. Misconfigured endpoints that are expected to handle POST requests but are receiving GET requests instead.

:triangular_flag_on_post: Possible Root Causes:

  1. Invalid Use of GET Requests:

    • Our system may be designed to handle POST requests, but clients are misusing GET requests, leading to failures.
  2. Potential DDoS or Unauthorized Access Attempts:

    • The spikes in failed GET requests on dates like December 18, November 7, and November 15 could indicate DDoS attacks or unauthorized access attempts.
  3. Misconfigured Endpoints:

    • Some endpoints might be incorrectly configured to reject GET requests, causing those requests to fail.

Success vs. Failures:

  • Successful GET Requests: Requests that returned a 200 status code, indicating that the request was processed correctly.
  • Failed GET Requests: Requests that returned non-200 status codes, indicating an issue with the request.
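With that definition, the per-date split into successful and failed GET requests can be sketched as follows. The code assumes ISO timestamps and uses hypothetical sample records; only the 200 versus non-200 rule comes from the definition above.

```python
from collections import defaultdict
from datetime import datetime

def get_outcomes_by_date(records):
    """Count successful (200) and failed (non-200) GET requests per date."""
    outcomes = defaultdict(lambda: {"success": 0, "fail": 0})
    for rec in records:
        if rec["method"] != "GET":
            continue
        day = datetime.fromisoformat(rec["timestamp"]).date().isoformat()
        key = "success" if rec["status_code"] == 200 else "fail"
        outcomes[day][key] += 1
    return dict(outcomes)

# Hypothetical sample records.
sample = [
    {"timestamp": "2024-01-01T10:00:00", "method": "GET", "status_code": 200},
    {"timestamp": "2024-01-01T10:05:00", "method": "GET", "status_code": 404},
    {"timestamp": "2024-01-01T10:06:00", "method": "POST", "status_code": 200},
    {"timestamp": "2024-01-02T11:00:00", "method": "GET", "status_code": 405},
]

by_date = get_outcomes_by_date(sample)
print(by_date)
```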

:chart_with_upwards_trend: Key Insights:

  1. High Failure Rates Across All IPs:

    • Most IP addresses, even among the top contributors, have extremely low success rates for GET requests, highlighting a potential issue with client configurations or unauthorized access attempts.
  2. Top IP with Successes:

    • 192.200.115.226 (United States) stands out as the only IP with a relatively high number of successful requests (19).
  3. Potential Bot Traffic:

    • IPs with thousands of failed requests and zero successful requests (e.g., 51.222.26.42, 38.54.76.179) could indicate bot traffic or DDoS attempts.
  4. Misconfigured Clients or Abuse:

    • Several IPs making repeated GET requests without success suggests client misuse or abuse of endpoints.
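The "many failures, zero successes" pattern from the third insight is straightforward to check per IP. The sketch below is one possible way to do it; the `min_failures` threshold is an assumption to be tuned, and the sample records are hypothetical.

```python
from collections import defaultdict

def zero_success_ips(records, min_failures=1000):
    """Return IPs with no successful GET requests and at least min_failures failures."""
    stats = defaultdict(lambda: {"ok": 0, "bad": 0})
    for rec in records:
        if rec["method"] != "GET":
            continue
        key = "ok" if rec["status_code"] == 200 else "bad"
        stats[rec["ip_address"]][key] += 1
    return sorted(
        ip for ip, s in stats.items()
        if s["ok"] == 0 and s["bad"] >= min_failures
    )

# Hypothetical sample records; a low threshold keeps the example small.
sample = [
    {"ip_address": "192.0.2.10", "method": "GET", "status_code": 404},
    {"ip_address": "192.0.2.10", "method": "GET", "status_code": 404},
    {"ip_address": "192.0.2.20", "method": "GET", "status_code": 200},
    {"ip_address": "192.0.2.20", "method": "GET", "status_code": 404},
    {"ip_address": "192.0.2.30", "method": "POST", "status_code": 502},
]

flagged = zero_success_ips(sample, min_failures=2)
print(flagged)
```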

Top Countries Sending GET Requests:

The top countries sending GET requests, both successful and failed, include:

  1. Canada
  2. United Arab Emirates
  3. United States
  4. United Kingdom
  5. Singapore

The majority of the failed requests originated from Canada, followed by the UAE and the US. These requests often targeted sensitive endpoints, raising concerns about potential security risks.

Successful Requests by User and Endpoint Breach

Successful Requests by Country and Endpoint

  • The United States had 83 successful requests, mainly to the / endpoint.
  • India shows frequent access, possibly from internal or test environments.
  • Suspicious queries like SQL injection (/?s=UNION+SELECT...) and XSS attempts (/?a=<script>alert("XSS")</script>) suggest security risks.

Successful Requests by Date and Endpoint

  • Most requests target the root endpoint (/), possibly from exploratory scans or default requests.
  • Notable suspicious activity on 2024-12-02, including SQL and XSS attack attempts.
  • Other high-request dates: 2024-11-03, 2024-12-06, and 2024-12-15, warranting further investigation.

Failed Requests by User and Endpoint Breach

Out of 48K different GET requests, most of the failed requests originated from:

  • Canada: 21,973 requests
  • United Arab Emirates: 11,846 requests
  • United States: 5,103 requests
  • United Kingdom: 2,456 requests

This indicates that the bulk of non-200 responses came from specific countries, primarily Canada and UAE, which might need further investigation to understand the request patterns and their intent.

We observed that the failed requests targeted around 19,000 different endpoints, indicating widespread endpoint access attempts, some of which are suspicious and potentially malicious.

Security Analysis: IP Address 192.200.115.226

One particular IP address, 192.200.115.226, stood out due to its repeated attempts to access sensitive data.

Suspicious Requests from 192.200.115.226:

This IP address made multiple requests to endpoints that indicate malicious intent, including:

  • 2024-12-02: High activity with 12 requests to /, and a few requests to:

    • /favicon.ico
    • /version
  • After that, the requests escalated to attack attempts against the server:

  1. Cross-Site Scripting (XSS) Attempt:
    • / endpoint with a query parameter attempting to inject a script.
  2. SQL Injection Attempt:
    • Query strings containing SQL commands such as UNION SELECT and SLEEP.
  3. Directory Traversal Attempt:
    • Attempting to access sensitive system files such as /etc/passwd.

Security Implications:

The repeated requests from this IP address suggest a deliberate attempt to probe the server for vulnerabilities. The combined use of XSS, SQL injection, and directory traversal techniques indicates a sophisticated attack aimed at compromising the server.

All the Endpoints which were Successfully Breached:

| Endpoint | Count |
|---|---|
| / | 40 |
| /api-doc/openapi.json | 2 |
| /docs/ | 2 |
| /docs/favicon-32x32.png | 2 |
| /docs/index.css | 2 |
| /docs/swagger-initializer.js | 2 |
| /docs/swagger-ui-bundle.js | 2 |
| /docs/swagger-ui-standalone-preset.js | 2 |
| /docs/swagger-ui.css | 2 |
| /?a=%3Cscript%3Ealert%28%22XSS%22%29%3B%3C%2Fscript%3E&b=UNION+SELECT+ALL+FROM+information_schema+AND+%27+or+SLEEP%285%29+or+%27&c=..%2F..%2F..%2F..%2Fetc%2Fpasswd | 1 |
| /?s=%3Cscript%3Ealert%28%22XSS%22%29%3B%3C%2Fscript%3E | 1 |
| /?s=UNION+SELECT+ALL+FROM+information_schema+AND+%27+or+SLEEP%285%29+or+%27 | 1 |

:white_check_mark: Normal Requests Insights:

Apart from the root path, the successful requests mostly target API documentation endpoints:

  • Root Path (/) accessed 40 times.
  • API-related documentation endpoints such as:
    • /docs/
    • /api-doc/openapi.json
    • Swagger UI files like:
      • /docs/swagger-ui.css
      • /docs/swagger-initializer.js

These are standard requests when developers or users explore API documentation. However, API documentation endpoints being exposed publicly is a security risk unless protected behind authentication.

Recommended Action:

  • Ensure that Swagger or OpenAPI documentation is only accessible to authorized users to reduce the risk of attackers discovering API endpoints.

:warning: Suspicious Requests Insights:

There are three major suspicious requests that stand out as clear attack attempts.

:one: Cross-Site Scripting (XSS) Attempt:

/?s=%3Cscript%3Ealert%28%22XSS%22%29%3B%3C%2Fscript%3E
  • Description:
    This is a typical XSS probe, where an attacker is attempting to inject a malicious JavaScript script that would display an alert.
  • Risk:
    If our application is not properly sanitizing user input, it could be vulnerable to XSS attacks, allowing attackers to execute malicious scripts in users’ browsers.

:red_circle: Actionable Insight:

  • Ensure proper input sanitization and encoding to prevent XSS attacks.
  • Use libraries or frameworks that have built-in protections against XSS (e.g., Django’s template auto-escaping).
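As an illustration of output encoding, the sketch below escapes the decoded probe payload before placing it in HTML, using Python's standard `html.escape`. The `render_search_term` helper and the surrounding markup are hypothetical; a real application would normally rely on its template engine's escaping instead.

```python
import html

def render_search_term(term):
    """Embed user input in HTML with special characters escaped."""
    return "<p>Results for: " + html.escape(term, quote=True) + "</p>"

# The decoded form of the probe's payload.
payload = '<script>alert("XSS");</script>'
rendered = render_search_term(payload)
print(rendered)
```

After escaping, the browser displays the payload as literal text and does not execute it as a script.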

:two: SQL Injection Attempt:

/?s=UNION+SELECT+ALL+FROM+information_schema+AND+%27+or+SLEEP%285%29+or+%27
  • Description:
    This request is a SQL injection attack. The attacker is trying to manipulate the SQL queries our server sends to the database.
  • Notable Techniques Used:
    • UNION SELECT to attempt to extract data from the database.
    • SLEEP(5) to introduce a delay in the response; a delayed response would confirm to the attacker that the input is executed as SQL.

:red_circle: Actionable Insight:

  • Use parameterized queries in our database interactions to prevent SQL injection.
  • Implement input validation to ensure that only valid inputs are processed.
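A parameterized query keeps user input separate from the SQL text. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `items` table and the `search_items` helper are hypothetical and only serve to show the placeholder mechanism.

```python
import sqlite3

# Hypothetical in-memory table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)", [("alpha",), ("beta",)])

def search_items(conn, term):
    """Look up items by name; the ? placeholder binds term purely as data."""
    cursor = conn.execute("SELECT name FROM items WHERE name = ?", (term,))
    return [row[0] for row in cursor.fetchall()]

# The decoded payload from the request above.
payload = "UNION SELECT ALL FROM information_schema AND ' or SLEEP(5) or '"

normal_result = search_items(conn, "alpha")
attack_result = search_items(conn, payload)
print(normal_result)
print(attack_result)
```

Because the payload is bound as a value, it is compared against the `name` column as an ordinary string and never parsed as SQL.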

:three: Combined Attack Attempt:

/?a=%3Cscript%3Ealert%28%22XSS%22%29%3B%3C%2Fscript%3E&b=UNION+SELECT+ALL+FROM+information_schema+AND+%27+or+SLEEP%285%29+or+%27&c=..%2F..%2F..%2F..%2Fetc%2Fpasswd

This request is particularly concerning as it combines multiple attack vectors:

  • XSS Attempt: Parameter a contains an XSS payload.
  • SQL Injection Attempt: Parameter b contains an SQL injection payload.
  • Directory Traversal Attempt: Parameter c is trying to access the /etc/passwd file, which contains system user information on Unix-based systems.
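The breakdown above can be reproduced by URL-decoding the query string and matching each parameter against a few signatures. This is a rough sketch: the regular expressions are illustrative assumptions, far from a complete detection rule set, and `classify_query` is a hypothetical helper.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Illustrative signatures only; real detection rules would be much broader.
PATTERNS = {
    "xss": re.compile(r"<\s*script", re.IGNORECASE),
    "sql_injection": re.compile(r"\bunion\s+select\b|\bsleep\s*\(", re.IGNORECASE),
    "directory_traversal": re.compile(r"\.\./|/etc/passwd", re.IGNORECASE),
}

def classify_query(endpoint):
    """Map each query parameter to the attack signatures its decoded value matches."""
    findings = {}
    query = urlsplit(endpoint).query
    for name, value in parse_qsl(query, keep_blank_values=True):
        labels = [label for label, pattern in PATTERNS.items() if pattern.search(value)]
        if labels:
            findings[name] = labels
    return findings

combined = (
    "/?a=%3Cscript%3Ealert%28%22XSS%22%29%3B%3C%2Fscript%3E"
    "&b=UNION+SELECT+ALL+FROM+information_schema+AND+%27+or+SLEEP%285%29+or+%27"
    "&c=..%2F..%2F..%2F..%2Fetc%2Fpasswd"
)

findings = classify_query(combined)
print(findings)
```

Running the same function over every logged endpoint would give a quick first pass at separating probes from ordinary traffic.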

:red_circle: Actionable Insight:

  • Secure our application against directory traversal attacks. Ensure that user inputs are sanitized and any path manipulations are restricted.
  • Review our server configurations to prevent sensitive files from being accessible via the web.
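One common way to restrict path manipulation is to resolve the requested path and verify that it still lies inside an allowed base directory. The sketch below uses `pathlib` for this; `resolve_under` is a hypothetical helper, and the temporary directory stands in for whatever directory the application actually serves files from.

```python
import tempfile
from pathlib import Path

def resolve_under(base_dir, user_path):
    """Resolve user_path inside base_dir, refusing anything that escapes it."""
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate

base_dir = tempfile.mkdtemp()  # stand-in for a static-files directory

safe = resolve_under(base_dir, "docs/index.css")
print(safe)

try:
    resolve_under(base_dir, "../../../../etc/passwd")
    blocked = False
except ValueError:
    blocked = True
print(blocked)
```

Resolving before comparing matters: a plain string prefix check can be fooled by `..` segments or symlinks.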

:rotating_light: Recommendations to Mitigate GET Request Failures and Attacks

  1. Block Malicious IP Addresses:

    • Implement a firewall rule to block 192.200.115.226 and other suspicious IPs showing similar behavior.
  2. Rate-Limiting and CAPTCHA:

    • Apply rate-limiting for high-volume requests.
    • Use CAPTCHA for endpoints prone to attack (like /).
  3. API Documentation Protection:

    • Restrict access to /api-doc/openapi.json and other documentation endpoints.
    • Ensure API documentation is authenticated and protected from public access.
  4. Input Validation/Sanitization:

    • Apply strict input validation to prevent XSS, SQL injection, and directory traversal attacks.
  5. Implement Web Application Firewalls (WAFs):

    • To block malicious requests before they reach the server.
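As an illustration of the rate-limiting recommendation, here is a minimal in-memory sliding-window limiter keyed by IP address. The class name, the limits, and the explicit `now` argument are assumptions made for the sketch; in production this is usually handled by a reverse proxy, a WAF, or a shared store, not by per-process memory.

```python
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most max_requests per IP within any window_seconds span."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now):
        recent = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True

# Hypothetical limits: 3 requests per 10 seconds.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10)
decisions = [limiter.allow("192.0.2.10", t) for t in (0, 1, 2, 3, 11)]
print(decisions)
```

Passing the time in explicitly keeps the example deterministic; a real deployment would use a monotonic clock.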

Discussion

The analysis highlights the importance of continuously monitoring server logs to identify potential security threats. The identified patterns of server crashes and suspicious requests underscore the need for robust security measures to protect the server from malicious activities.

The GET request analysis revealed that the large majority of GET requests fail, and that a few IP addresses stand out due to their suspicious behavior. These insights can be used to strengthen the server’s security posture and prevent future incidents.

Conclusion

Analyzing server logs provides valuable insights into server performance, user behavior, and security threats. By identifying traffic patterns, crash causes, and potential attacks, we can take proactive steps to improve server stability and security.

The analysis uncovered several key findings:

  • Server crashes on December 21 and December 23 were caused by a high volume of incoming requests.
  • Suspicious requests from specific IP addresses indicate potential security threats.
  • Implementing security measures such as WAFs, rate limiting, and input sanitization can help prevent future attacks.

By continuously monitoring and analyzing server logs, we can ensure the server remains secure, stable, and performant.

Response to Suspicious Requests Insights

Thank you for the detailed analysis and insights regarding the suspicious requests. Upon reviewing the information, we have determined that these attack attempts can be safely ignored for the following reasons:

  1. No Significant Impact:

    • The attacks are targeting aspects that are not relevant or critical to our current setup.
    • None of the responses to these requests expose any sensitive information.
  2. Non-Existent Vulnerabilities:

    • SQL injection attempts are not a concern as the targeted endpoint does not interact with a database.
    • The rate of attack is minimal and does not qualify as a Denial-of-Service (DoS) attack.
  3. Existing Protections:

    • Our server has sufficient safeguards in place to handle these types of requests without compromising performance or security.

That said, based on this data, we will proactively review and enhance the security of our APIs and server configurations to prevent potential threats in the future.

Thank you for bringing this to our attention.
