My team has a web server in a Docker image. It reads the Postgres URL from an environment variable, which locally has the form “postgres://$USER:$PASSWORD@localhost:5432/$DB_NAME”. When the server starts up, it uses that variable to connect to Postgres.

We tried to deploy this to GCP. CloudRun is the middle ground between Functions (one HTTP request handler, rather than an app with many) and GKE (multiple containers at once). We’ve used GKE before, but our architecture is simpler this time, and CloudRun is, in theory, intended for exactly this use case, where you just have a single container. There’s also AppEngine, but CloudRun appears to be more barebones.

We thought we’d use CloudSQL for managed Postgres. You fill out the web form, and it creates a machine running Postgres that has some kind of URL. Compared with the local example above, there are really just two values to consider: the port (probably the default 5432) and the host (which in this case is an IP address rather than “localhost”). On the CloudRun side, we upload our Docker image and specify the environment variable for the Postgres address.

Turns out this is exceedingly difficult. It’s not obvious at all how to establish this simple connection. After trudging through a dozen form questions, half a dozen rabbit holes of gibberish docs, and side-questing to enable half a dozen APIs, it seems your CloudRun app can’t actually talk to your CloudSQL. You have to go create a default VPC, because it is not created by default. Then apparently you have to go create a “VPC connector”. After half a dozen attempts at creating these networking resources, you finally succeed. You go back to provision CloudRun but discover you can’t, because the VPC connector is not in the same region as the CloudRun instance. So you backtrack once more. After doing all of this, the CloudRun “service” / Docker image fails to start.
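Before blaming the network, it can help to confirm which host and port the server will actually dial. The following is a minimal sketch, not our server's real code: the variable name `DATABASE_URL`, the helper names, and the example address `10.0.0.5` are all assumptions made for illustration. It builds a URL of the form shown above and parses it back into its parts.

```python
import os
from urllib.parse import quote, urlsplit


def build_postgres_url(user, password, host, port, db_name):
    """Assemble a postgres:// URL, percent-encoding the credentials."""
    # Characters such as "@" or ":" in a password would otherwise break parsing.
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{db_name}"
    )


def describe_target(url):
    """Return (host, port, database) for a postgres:// URL."""
    parts = urlsplit(url)
    # Fall back to the default Postgres port when the URL omits it.
    return parts.hostname, parts.port or 5432, parts.path.lstrip("/")


if __name__ == "__main__":
    # DATABASE_URL is a hypothetical name; use whatever your image reads.
    url = os.environ.get("DATABASE_URL", "postgres://user:pw@localhost:5432/db")
    host, port, db = describe_target(url)
    print(f"server would connect to host={host} port={port} db={db}")
```

Logging the parsed host and port at startup (never the password) at least separates "the address is wrong" from "the address is right but unreachable".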
We look for clues as to whether we specified the Postgres address incorrectly or whether our network config fails to give the container visibility of the database. We’re told something went wrong with the container and that there will be clues in the logs. We open up the logs viewer, and the only errors come from GCP’s own implementation details and daemon, such as “terminated: Application failed to start: Failed to create init process: failed to load /usr/local/bin/docker-entrypoint.sh: exec format error”, rather than anything from our application log. Now CloudRun is stuck in an infinite loop / pending state, so we’ll just have to delete it and try all over again. How are people using CloudRun in production if you can’t tell what made the container fail?
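For what it's worth, an “exec format error” generally means the operating system did not recognize the file as something it can execute. Typical causes are a script with no `#!` line, or a binary (possibly the script's interpreter) built for a different CPU architecture than the host. Whether either applies here is a guess, not something we confirmed. The sketch below, with a hypothetical helper name, classifies the first bytes of a file so you can check an entrypoint or its interpreter by hand.

```python
import struct

# A few common ELF e_machine values; anything else is reported as hex.
ELF_MACHINES = {0x03: "x86", 0x28: "arm", 0x3E: "x86-64", 0xB7: "aarch64"}


def describe_executable(header: bytes) -> str:
    """Classify a file from its first bytes (pass at least 20 for ELF)."""
    if header.startswith(b"#!"):
        # A script: report the interpreter named on the shebang line.
        first_line = header.split(b"\n", 1)[0]
        return "script: " + first_line[2:].decode(errors="replace").strip()
    if header.startswith(b"\x7fELF") and len(header) >= 20:
        # Byte 5 (EI_DATA) gives endianness; e_machine sits at offset 18.
        byte_order = "<" if header[5] == 1 else ">"
        (machine,) = struct.unpack_from(byte_order + "H", header, 18)
        return "ELF binary for " + ELF_MACHINES.get(machine, hex(machine))
    return "unrecognized: no shebang or ELF magic"
```

To use it, read the first 64 bytes or so of the file in binary mode, for example `describe_executable(open(path, "rb").read(64))`, first for the entrypoint script and then for the interpreter it names.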
Story Published at: September 4, 2022 at 03:23PM