Other: HVR direct download
Answered
This feedback concerns the use of Amazon S3 pre-signed URLs for providing access to files within your services. While this feature offers significant benefits in terms of secure, time-limited access to resources, it hinders those of us who automate the installation of the HVR hub and agent.
This is based on my assumption that the files we obtain from the webUI are simply frequently-accessed, non-unique files, and those files do not contain sensitive information. If I am mistaken, please forgive me, and if I may - I have feedback for such a scenario:
If the .tar.gz is unique to us and contains sensitive details (licensing information, certs, API keys, etc.), then I think an API would be more ideal. (I think the certs are actually the same for everyone who uses HVR by default? Or did that change?) For instance, you issue me an API token, I use it to talk to your download service, it determines that I am who I say I am, legitimate and authorized, and it returns HTTP 200 with the proper .tar.gz file. This would allow repeated downloads; e.g., if we have Packer making an AMI and it keeps failing on some random command, we need to be able to download the .tar.gz maybe 3 times in a 30-minute window. It is up to the administrator/DevOps team on the other end of this ticket to work out handling API abuse, which I do concede is a concern.
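To make the token flow above concrete, here is a minimal sketch of what the client side could look like. Everything specific in it is an assumption for illustration: the URL is a placeholder, the bearer-token header scheme is a guess at how such a service might authenticate, and `download_with_retries` is a hypothetical helper, not an existing Fivetran or HVR tool.

```python
import shutil
import time
import urllib.request


def build_request(url: str, api_token: str) -> urllib.request.Request:
    # Assumption: the download service accepts the token as a bearer credential.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_token}"})


def download_with_retries(url, api_token, dest_path, attempts=3,
                          delay_seconds=5.0, opener=urllib.request.urlopen):
    """Download url to dest_path, retrying on network or HTTP errors.

    Returns the number of the attempt that succeeded.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            with opener(build_request(url, api_token)) as response, \
                    open(dest_path, "wb") as out:
                shutil.copyfileobj(response, out)
            return attempt
        except OSError as exc:  # URLError and HTTPError are subclasses of OSError
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise RuntimeError(f"download failed after {attempts} attempts") from last_error
```

A Packer provisioner could call this with the same URL and token on every build, which is the "repeated downloads" case: the third attempt in a 30-minute window looks exactly like the first.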
However. If the .tar.gz is indeed general-purpose, I would like to submit my feedback that it should be made generally accessible for users who automate. Here is why.
As it currently stands, the use of pre-signed URLs introduces an additional layer of complexity for users, particularly those who wish to automate the download of the files. The query parameters appended to the URLs (e.g., X-Amz-Algorithm, X-Amz-Expires) are necessary for S3 to validate the request, but they also mean that the URL changes over time as the signature and expiration date change.
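As a rough illustration of why these URLs go stale, the sketch below reads the signing time and lifetime out of a pre-signed URL and computes when it stops working. It assumes a Signature Version 4 URL, where `X-Amz-Date` is a UTC timestamp in `YYYYMMDDTHHMMSSZ` form and `X-Amz-Expires` is a lifetime in seconds; the bucket name and signature in the example are made up.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_url_expiry(url: str) -> datetime:
    """Return the UTC instant at which a SigV4 pre-signed URL expires."""
    query = parse_qs(urlsplit(url).query)
    signed_at = datetime.strptime(
        query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return signed_at + lifetime


def is_expired(url: str, now=None) -> bool:
    """True once the current time has reached the URL's expiry."""
    now = now or datetime.now(timezone.utc)
    return now >= presigned_url_expiry(url)


if __name__ == "__main__":
    # Placeholder URL: bucket, key, and signature are invented for illustration.
    example = (
        "https://example-bucket.s3.amazonaws.com/hvr.tar.gz"
        "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
        "&X-Amz-Date=20230601T120000Z"
        "&X-Amz-Expires=3600"
        "&X-Amz-Signature=abc123"
    )
    print(presigned_url_expiry(example))  # 2023-06-01 13:00:00+00:00
```

Any automation that stores such a URL has to run a check like this, and then go fetch a new URL when it fails, before it can even start the download.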
Users who automate their downloads would need to retrieve a valid, non-expired URL each time they wish to download the file. This creates challenges in the automation process and introduces potential points of failure, as any issue in retrieving or using the updated URL can lead to unsuccessful downloads.
In addition, the expiration of the URLs could lead to access disruptions if a URL is not used within the specified timeframe. While the URLs can be regenerated, this may not be an instantaneous process and can lead to gaps in access. Perhaps a build needs to be rerun with an old version that was issued a URL 2 years ago – with those S3 validations, it’s not happening.
Lastly, though it may not be significant, there are cost implications to consider. Each S3 request incurs a small cost, and although signing a URL is not itself a billed S3 request, the service that generates a fresh URL for every download adds its own overhead. Over time and with high traffic, this could lead to increased costs.
Considering these factors, I propose that Fivetran take a look at the existing approach to providing access to these specific files. If the files do not contain sensitive data and are intended for wide distribution, Fivetran may explore more straightforward access control strategies:
One potential solution is using Amazon CloudFront in front of your S3 bucket. CloudFront can cache your content at edge locations close to your users, reducing the load on your S3 bucket and potentially lowering your costs. It doesn't directly provide a way to limit download speeds or data transfer rates, but it's a happy medium.
Another cost-effective approach might be to use a third-party reverse proxy that supports rate limiting, such as NGINX or HAProxy. These can be configured to limit the download rate for certain files or paths. This would involve running these services on an EC2 instance, which would add complexity and cost, but could better serve your users. You can use an S3 VPC endpoint to help reduce cost and complexity; in fact, you could use s3fs to mount the bucket locally on the NGINX box.
Finally, removing the signing entirely – assuming the files are general purpose and we aren’t worried about bandwidth spend – would work, too.
Whether the file is unique to our account or not, I still believe that adopting a simpler access strategy would help more than myself alone. I would love to find a way to get the file downloaded one time, every time, all the time - with the exact same URL. Then, all I need to worry about is the version number... Which is what I was going to ask about next. :) I'd also like to scrape you to find out if there's a new release version available so that my automation can continuously keep me up to date as I build new machine images with HVR baked in.
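For what it's worth, this is roughly the logic my automation would run if a stable URL and a version listing existed. It is only a sketch under assumptions: the URL template is a placeholder I made up, and it treats versions as purely numeric dotted strings, which may not match how HVR releases are actually numbered.

```python
def parse_version(text: str) -> tuple:
    """Turn '6.1.10' into (6, 1, 10) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def newest_version(available: list) -> str:
    """Pick the highest version from a list of dotted version strings."""
    return max(available, key=parse_version)


def needs_update(installed: str, available: list) -> bool:
    """True if any available version is newer than the installed one."""
    return parse_version(newest_version(available)) > parse_version(installed)


def download_url(
    version: str,
    # Hypothetical template: not a real Fivetran URL.
    template: str = "https://downloads.example.invalid/hvr/{version}/hvr-{version}-linux.tar.gz",
) -> str:
    """Build the same URL every time for a given version."""
    return template.format(version=version)
```

Comparing parsed tuples rather than raw strings matters here: as plain strings, "6.1.10" would sort before "6.1.5".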
Thank you for considering this perspective!
-
+1 this is standard for many vendors and would be quite helpful to us.
-
This is something that we need too. It's almost impossible for us to get the latest version of HVR in an automated manner with the current situation.
-
Industry standard for many vendors.
-
Confirmed with HVR support that these downloads are not user-specific; the license key is separate. So these are standard downloads that could, if implemented, be enabled for direct download (DDL), and we could wget/curl to our hearts' content. :)
-
We have an endpoint api.fivetran.com/download/hvr. However, it looks like it is not documented. Let me get that added.
Thanks,
Mark.