Diagnose Google Cloud Platform(GCP) issues with gcpdiag


What is gcpdiag?

gcpdiag is an open-source tool developed by Google Cloud Platform (GCP) teams to automate the detection and diagnosis of common misconfigurations and problems within your GCP projects. It leverages Google Cloud APIs to run a series of automated checks against your project and generates detailed reports highlighting any deviations from expected conditions.

It consists of two primary components:

  1. Lints:

    • Quick and concise checks to assess the overall health of your GCP project.
    • Identify potential issues and best practice violations.
  2. Runbooks:

    • In-depth diagnostic guides for specific problems.
    • Provide step-by-step instructions to resolve issues.
    • Offer tailored solutions based on your GCP environment.
    • By utilizing both lints and runbooks, you can effectively diagnose and resolve GCP issues, ensuring optimal performance and security.

Why Use gcpdiag?

  • Proactive Issue Detection: Identifies potential issues before they escalate into major problems.
  • Improved Performance: Pinpoints performance bottlenecks and optimization opportunities.
  • Reduced Downtime: Prevents outages by proactively addressing configuration errors.
  • Simplified Troubleshooting: Provides clear and actionable recommendations to fix identified issues.
  • Best Practice Adherence: Encourages adherence to GCP best practices and standards.

How to Use gcpdiag

Installation You can install and run gcpdiag using a shell wrapper that starts gcpdiag in a Docker container. This should work on any machine with Docker or Podman installed, including Cloud Shell.

curl https://gcpdiag.dev/gcpdiag.sh >gcpdiag
chmod +x gcpdiag
./gcpdiag lint --project=YOUR_PROJECT_ID

Prerequisites:

  • Google Cloud SDK: Ensure you have the Google Cloud SDK installed and configured with the necessary permissions.
  • Project ID: You'll need the ID of the GCP project you want to analyze.
  • Permissions: The credentials that you use with gcpdiag needs to have at minimum the following roles granted (both of them):
    • Viewer on the inspected project
    • Service Usage Consumer on the project used for billing/quota enforcement, which is per default the project being inspected, but can be explicitly set using the --billing-project option
  • Required APIs: gcpdiag requires some APIs to be enabled in order for the inspection of resources to work correctly:

    • cloudresourcemanager.googleapis.com (Cloud Resource Manager API)
    • iam.googleapis.com (Identity and Access Management API)
    • logging.googleapis.com (Cloud Logging API)
    • serviceusage.googleapis.com (Service Usage API)
    • monitoring.googleapis.com (Cloud Monitoring API)
    • Product API for which you are running gcpdiag diagnosis.

Running gcpdiag:

  • Cloud Shell:
  • gcpdiag is integrated into Google Cloud Shell. You can run it directly using:

    • Lints:

      gcpdiag lint --project=YOUR_PROJECT_ID

    • Runbooks:

      gcpdiag runbook gce/ssh -p project_id=YOUR_PROJECT_ID -p name=VM_NAME -p zone=VM_ZONE

    Note: For further help on command syntax, please see gcpdiag official website.

Key Features:

  • Rule-Based Checks: gcpdiag comes with a comprehensive set of rules to check for various issues, including:
    • Errors (ERR): Critical misconfigurations.
    • Warnings (WAR): Potential issues.
    • Best Practices (BP): Recommendations for improvement.
    • Security Issues (SEC): Potential security risks.
  • Detailed diagnosis via Runbooks: gcpdiag offers a powerful tool called "runbooks," which are automated diagnostic trees crafted by experienced GCP engineers. These runbooks can help you:

    • Pinpoint specific issues: Identify the root cause of problems within your GCP environment.
    • Guide troubleshooting: Provide step-by-step instructions to resolve the issue.
    • Offer tailored solutions: Suggest appropriate remedies based on your unique GCP setup.
  • Customizable Rules: You can create and add your own custom rules to tailor gcpdiag to your specific needs.

  • Detailed Reports: Generates detailed reports in JSON format, providing clear insights into the identified issues.
  • Actionable Recommendations: Offers specific guidance on how to address each issue.

Conclusion

By incorporating gcpdiag into your GCP workflow, you can significantly enhance the reliability, security, and performance of your infrastructure. It empowers you to proactively identify and resolve potential issues, ensuring a robust and efficient GCP environment.