Make vetted datasets in File Upload Wizard available on the public FTP server #167

Open
rija opened this issue May 9, 2019 · 6 comments


@rija
Owner

rija commented May 9, 2019

Generally speaking, this issue is to keep track of the problems involved in moving datasets uploaded to File Upload Wizard onto the public FTP server hosted at CNGB (on Alibaba Cloud).

The naive approach would be an FTP upload from the cloud instance hosting File Upload Wizard to the public FTP server. However, that means crossing the public Internet from an international source to a mainland destination, which is reputed to be slow and unreliable (I need to run tests).
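
For the tests mentioned above, a minimal sketch of a timing harness, assuming the actual upload is wrapped in a callable. The function name `measure_throughput_mb_s` and the `transfer` callable are hypothetical and not part of File Upload Wizard; the payload is random bytes so that compression on the link does not flatter the result.

```python
import os
import statistics
import time
from typing import Callable


def measure_throughput_mb_s(transfer: Callable[[bytes], None],
                            payload_mb: int = 8, runs: int = 3) -> float:
    """Median throughput in MB/s (decimal) over several timed transfers.

    `transfer` is a hypothetical callable that pushes the payload to the
    destination (for example an FTP STOR wrapped in a function).
    """
    payload = os.urandom(payload_mb * 1_000_000)
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        transfer(payload)
        # Guard against a zero interval if the callable returns instantly.
        elapsed = max(time.perf_counter() - start, 1e-9)
        rates.append(payload_mb / elapsed)
    return statistics.median(rates)
```

Taking the median rather than the mean keeps one stalled run from dominating the figure, which matters on a link suspected of being unreliable.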

The next obvious approach would be to host our webapp on Alibaba Cloud. The problem is that we cannot host it in the same region as the FTP server (international vs mainland), and there's no inter-region communication by default.

As part of investigating options, I have a few questions:

  1. Could you confirm which Alibaba Cloud region the public FTP server is hosted in? (I'd guess it's cn-shenzhen.)
  2. Does GigaScience/BGI have an Alibaba Cloud Express Connect subscription? (That's Alibaba Cloud's offering for inter-region communication, and it seems to work across ALL regions, international and mainland.)
  3. Is regularly sending a physical hard drive to the ops team in Shenzhen an acceptable option? (This assumes the File Upload Wizard is hosted in an AWS region, like Singapore, that supports AWS Snowball, i.e. physical shipping of hard drives. Alternatively, or additionally, we could send a courier through Lo Wu with the physical hard drive.)
@rija
Owner Author

rija commented May 9, 2019

Alibaba Cloud Express Connect:

@pli888
Collaborator

pli888 commented May 10, 2019

Hi @rija. Here are Jesse's answers to your questions.

Could you confirm which Alibaba Cloud region the public FTP server is hosted in? (I'd guess it's cn-shenzhen.)

The public ftp server is hosted on the China South Shenzhen region.

Does GigaScience/BGI have an Alibaba Cloud Express Connect subscription? (That's Alibaba Cloud's offering for inter-region communication, and it seems to work across ALL regions, international and mainland.)

No. CNGB did not purchase the Express Connect subscription.

Is regularly sending a physical hard drive to the ops team in Shenzhen an acceptable option? (This assumes the File Upload Wizard is hosted in an AWS region, like Singapore, that supports AWS Snowball, i.e. physical shipping of hard drives. Alternatively, or additionally, we could send a courier through Lo Wu with the physical hard drive.)

Yes, we can regularly send the hard-drive to CNGB and upload data to AliCloud.

@only1chunts

Q3 - While it's an option, it's not something we would want to rely on for all transfers. It is perhaps acceptable for datasets in the 100+ GB range, of which there shouldn't be many, but for the majority of datasets we would need the system to take care of the transfer without the need for manual disk shipping.
The 1.25 MB/sec quoted above, is that what they are calling the express service? Seems rather slow to be called express! If I was paying 170/month I'd expect 10x that speed at least!
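
To put the 1.25 MB/sec figure (that is, 10 Mbit/s) in perspective, here is a rough back-of-the-envelope calculation. It assumes decimal units (1 GB = 1000 MB), a sustained rate, and no protocol overhead, so real transfers would likely take longer. The 12.5 MB/sec column is just the "10x" figure from the comment above, not a quoted tier.

```python
def transfer_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_gb gigabytes at rate_mb_per_s megabytes/second.

    Assumes decimal units (1 GB = 1000 MB) and ignores protocol overhead.
    """
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = size_gb * 1000 / rate_mb_per_s
    return seconds / 3600


for size_gb in (1, 10, 100):
    base = transfer_hours(size_gb, 1.25)
    faster = transfer_hours(size_gb, 12.5)
    print(f"{size_gb:>4} GB: {base:6.2f} h at 1.25 MB/s, {faster:6.2f} h at 12.5 MB/s")
```

Under these assumptions a 100 GB dataset needs about 22 hours at the base rate, which fits the view that only the 100+ GB range is a candidate for disk shipping.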

@rija
Owner Author

rija commented May 10, 2019

Thanks @pli888, @only1chunts, Jesse for the replies.

@only1chunts: that base price is the starting price; you can get faster bandwidth at higher prices (I wasn't able to find detailed pricing/bandwidth thresholds). I agree, the base price doesn't feel like good value.

As Express Connect's value is uncertain and it requires significantly more infrastructure to build and maintain (it means securely setting up two interconnected intranets), and offline transfer may only apply to the rare very large datasets, the only remaining option is transfer over the public Internet.

There could be benefits in deploying File Upload Wizard on Aliyun (rather than AWS): public network peering between Aliyun regions should be better, Aliyun also offers an offline data transport service, and in the future Express Connect's value and required effort may look better.

So my plan is to set up the prototype in Aliyun's cn-hongkong region. Then I'll set up an internal test server in a different Aliyun region (ideally cn-shenzhen if I can do so without needing ICP filing, but it's not critical) so I can devise a mechanism for systematic transfer of vetted datasets.

Going that route, the next question for me is:

Is rsync [1] installed on CNGB ? If not, is it acceptable to install it?

  • rsync is preferable for systematically mirroring large amounts of data.
  • rsync can be used either as a daemon (defaults to TCP port 873) or through SSH tunneling.

[1] https://rsync.samba.org/documentation.html
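
To make the two modes concrete, here is a minimal sketch that only builds the rsync command lines (it does not run them). The host `ftp.example.org`, the user `uploader`, the module name `datasets` and all paths are hypothetical placeholders, not the real CNGB setup; the flags used (`--archive`, `--compress`, `--partial`, `--dry-run`, `-e`) are standard rsync options.

```python
import shlex
from typing import List


def rsync_over_ssh(src_dir: str, user: str, host: str, dest_dir: str,
                   ssh_port: int = 22, dry_run: bool = True) -> List[str]:
    """Build an rsync command that pushes src_dir's contents through SSH."""
    cmd = ["rsync", "--archive", "--compress", "--partial"]
    if dry_run:
        cmd.append("--dry-run")  # report what would be sent, transfer nothing
    # The trailing slash on the source copies the directory's contents.
    cmd += ["-e", f"ssh -p {ssh_port}",
            f"{src_dir.rstrip('/')}/", f"{user}@{host}:{dest_dir}"]
    return cmd


def rsync_to_daemon(src_dir: str, host: str, module: str, dest_path: str = "",
                    port: int = 873, dry_run: bool = True) -> List[str]:
    """Build an rsync command that pushes to an rsync daemon module."""
    cmd = ["rsync", "--archive", "--compress", "--partial"]
    if dry_run:
        cmd.append("--dry-run")
    cmd += [f"{src_dir.rstrip('/')}/",
            f"rsync://{host}:{port}/{module}/{dest_path}"]
    return cmd


# Hypothetical values, for illustration only.
print(shlex.join(rsync_over_ssh("/data/vetted/100001", "uploader",
                                "ftp.example.org", "/pub/100001/")))
print(shlex.join(rsync_to_daemon("/data/vetted/100001", "ftp.example.org",
                                 "datasets", "100001/")))
```

Both builders default to `--dry-run` so a first pass can be checked before any data moves. The SSH variant needs no extra open port beyond SSH, whereas the daemon variant needs port 873 (or another configured port) reachable on the destination.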

@only1chunts

I'm certain rsync is already installed on the public FTP server at CNGB; I used it there recently.

@jessesiu

@rija @only1chunts Yes, rsync is installed on the CNGB public FTP server; we have used it before.
