Keep Jupyter Notebook running on GCP - machine-learning

I am using Google Cloud Platform to build a machine learning model for a problem. The dataset is huge and the model will take many hours to train.
I want to keep my notebook running, and therefore keep my model training until training is complete, even after I turn off my local machine (i.e. my PC).
Will the notebook keep running as long as I don't turn off the instance? Or do I need to do something else to make sure the model continues training?
I am pretty new to GCP; thanks in advance.
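The thread has no answer to this question. One common way to make a long run independent of the notebook tab and of your PC's connection is to start training as its own process on the VM; it then keeps running until it exits or the instance is stopped. A minimal sketch, assuming a Linux VM and a hypothetical `train.py` training script:

```python
import subprocess
import sys


def launch_detached(script_args, log_path):
    """Start a training script in its own session so it is not tied to the
    notebook kernel or terminal that launched it. Returns the Popen handle."""
    with open(log_path, "ab") as log_file:
        process = subprocess.Popen(
            [sys.executable, *script_args],
            stdout=log_file,
            stderr=subprocess.STDOUT,   # interleave errors with normal output
            stdin=subprocess.DEVNULL,
            start_new_session=True,     # POSIX only: run the child in a new session (setsid)
        )
    # Closing our copy of the log file is safe; the child keeps its own handle.
    return process
```

For example, `launch_detached(["train.py", "--epochs", "50"], "train.log")` starts the run; you can then close the notebook and check progress later with `tail -f train.log`.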

Related

How to collect performance metrics from running Google Cloud ML training instances?

I'm running a model on Google Cloud ML training, and it takes about 10 hours with some naive guesses at the machine shapes. I'd like to optimize it a bit to cut down on running time and overall cost.
What's the best way to determine whether I'm using the resources effectively? I'd like CPU measurements, memory pressure, and GPU usage (whenever GPUs are available). I suspect I'd need to either 1) log these myself or 2) install a monitoring agent like Stackdriver and assume tools like nvidia-smi can be found, but I'm curious whether anyone has tried.
This feature is now built into the product: CPU and RAM usage metrics (for now) are published as Stackdriver metrics.
A view of the metrics is also displayed in the console on the job detail page.
Hope that helps.
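If you'd rather log these yourself (option 1 in the question), a minimal sketch could take periodic snapshots like the one below and write them to the job's logs. It assumes a Linux machine (the `resource` module and `os.getloadavg()` are Unix-only) and that `nvidia-smi` is on the PATH when GPUs are attached; the dictionary keys are arbitrary names:

```python
import os
import resource
import shutil
import subprocess
import time

NVIDIA_SMI_QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def _to_float(field):
    """nvidia-smi prints values like '[N/A]' for unsupported fields."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse nvidia-smi CSV lines like '87, 10240' into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        gpus.append({"util_pct": _to_float(util), "mem_used_mib": _to_float(mem)})
    return gpus


def sample_usage():
    """One snapshot: this process's CPU time and peak memory (child processes
    such as data-loader workers are not included), the machine's 1-minute
    load average, and GPU usage if nvidia-smi is available."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    sample = {
        "timestamp": time.time(),
        "cpu_user_s": usage.ru_utime,
        "cpu_system_s": usage.ru_stime,
        "peak_rss_kib": usage.ru_maxrss,  # peak, not current; KiB on Linux, bytes on macOS
        "load_avg_1m": os.getloadavg()[0],
        "gpus": [],
    }
    if shutil.which("nvidia-smi"):
        result = subprocess.run(NVIDIA_SMI_QUERY, capture_output=True, text=True)
        if result.returncode == 0:
            sample["gpus"] = parse_gpu_csv(result.stdout)
    return sample
```

Calling `sample_usage()` every minute or so, for example from a background thread, and printing each result as one line keeps the numbers easy to search in the job logs.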

Distributed TensorFlow demos

Recently, TensorFlow added a distributed training module. What are the prerequisites for distributed training? I mean the environment, for example:
TensorFlow >= 0.8, Kubernetes, a shared file system, gcloud?
It has also released example code:
Is there any way to run the TensorFlow cluster example when we only have HDFS and no shared file system? Where will the model files be stored?
Each computer will need to have TensorFlow installed (in my experience, they should all be the same version; I had a few issues mixing versions 8 and 9).
Once that is set up, each computer will need access to the code it is to run (main.py, for example). We use NFS to share this, but you could just as easily git pull on each machine to get the latest copy of your code.
Then you just need to start them up. In our most basic setup we simply ssh to each machine, but if you have a cluster manager like Kubernetes, it may be different for you.
As for checkpoints, if that's what your last question was asking, I believe only the chief worker writes checkpoint files.
Let me know if you have further questions.
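The "ssh to each machine" setup can be scripted. A minimal sketch that only builds the commands, assuming `main.py` accepts `--ps_hosts`, `--worker_hosts`, `--job_name` and `--task_index` flags (a convention used in early distributed TensorFlow examples; rename them to match your own script). The host names in the usage lines are placeholders:

```python
import shlex  # shlex.join needs Python 3.8+


def build_launch_commands(ps_hosts, worker_hosts, script="main.py"):
    """Return one (machine, ssh_command) pair per task in the cluster.

    Assumes `script` accepts --ps_hosts, --worker_hosts, --job_name and
    --task_index flags; adapt the flag names to your own script."""
    shared_flags = [
        "--ps_hosts=" + ",".join(ps_hosts),
        "--worker_hosts=" + ",".join(worker_hosts),
    ]
    commands = []
    for job_name, hosts in (("ps", ps_hosts), ("worker", worker_hosts)):
        for task_index, host in enumerate(hosts):
            machine = host.rsplit(":", 1)[0]  # drop the port for ssh
            remote = shlex.join(
                ["python", script, *shared_flags,
                 f"--job_name={job_name}", f"--task_index={task_index}"]
            )
            # In the usual setup, worker task 0 is the chief that writes checkpoints.
            commands.append((machine, f"ssh {machine} {shlex.quote(remote)}"))
    return commands


if __name__ == "__main__":
    for _, command in build_launch_commands(
        ["ps0.example.com:2222"],
        ["worker0.example.com:2222", "worker1.example.com:2222"],
    ):
        print(command)
```

Printing the commands rather than running them keeps the sketch side-effect free; you could pass each one to `subprocess.Popen(command, shell=True)` once the hosts and flags match your cluster.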

How do I set up TensorFlow in the Google cloud?

How do I set up TensorFlow in the Google cloud? I understand how to create a Google Compute Engine instance and how to run TensorFlow locally, and a recent Google blog post suggests that there ought to be a way to create a Google Compute Engine instance and run TensorFlow applications in the cloud:
Machine Learning projects can come in many sizes, and as we’ve seen with our open source offering TensorFlow, projects often need to scale up. Some small tasks are best handled with a local solution running on one’s desktop, while large scale applications require both the scale and dependability of a hosted solution. Google Cloud Machine Learning aims to support the full range and provide a seamless transition from local to cloud environment.
Even if I'm reading a bit much into this, it has to be the case, given what competing platforms such as Microsoft's Azure offer, that there's a way to set up TensorFlow applications (developed locally and "seamlessly" scaled up into the cloud, presumably using GPUs) in the Google cloud.
For example, I'd like to work locally in my IDE, tuning the features and code for my project and running limited training and validation there, then periodically push the code to the cloud to train there with (arbitrarily) greater resources, and finally save and download the trained model. Or, perhaps even better, just run the graphs (or parts of graphs) in the cloud using tunable resources.
Is there a way to do this; is one planned? How do I set up TensorFlow in the Google cloud?
This is still in limited preview. The best you can do is sign up and hope that they select you to be part of the preview.
https://cloud.google.com/ml/
Edit: CloudML is now in public beta so anyone can use it without signing up and requesting access. We hope you give it a try! We have a tag for questions: google-cloud-ml.
I would suggest following this tutorial, which guides you step by step:
https://www.youtube.com/watch?v=N422_CYuzZg
Here is the main article on setting up the account, etc.:
https://cloud.google.com/solutions/machine-learning-with-financial-time-series-data
As described on the Kubernetes blog, you can run TensorFlow on Kubernetes. It links to "a step-by-step tutorial that shows you how to create the TensorFlow Serving Docker container to serve the Inception-v3 image classification model", which you should be able to adapt to running your own TensorFlow workload. You can use Google Container Engine to run Kubernetes on Google's cloud.
Or, as Aaron mentioned, you can try to sign up for early access to Google's CloudML product.
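The workflow in the question (short local runs, long cloud runs of the same code) usually comes down to making the trainer configurable from the command line. A minimal sketch of a hypothetical trainer entry point; the flag names, including `--job-dir`, and the defaults are illustrative choices, not part of any Google API:

```python
import argparse


def parse_args(argv=None):
    """Parse trainer settings so one script serves both quick local runs
    and long cloud runs; only the flag values change between the two."""
    parser = argparse.ArgumentParser(description="Hypothetical trainer entry point")
    parser.add_argument(
        "--job-dir", default="./output",
        help="where checkpoints and the final model are written",
    )
    parser.add_argument(
        "--train-files", nargs="+", default=["data/sample.csv"],
        help="a small local sample by default; the full dataset in the cloud",
    )
    parser.add_argument(
        "--max-steps", type=int, default=1000,
        help="keep this small for local tuning, raise it for cloud training",
    )
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Placeholder: build the model and run your own training loop here.
    print(f"training for {args.max_steps} steps on {args.train_files}, "
          f"writing to {args.job_dir}")
    return args
```

Locally you would call `main()` with the defaults; for a cloud run the same script would get a larger `--max-steps`, the full `--train-files`, and a cloud storage location for `--job-dir` (add the usual `if __name__ == "__main__": main()` guard when running it as a script).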

How do I test my application for network performance while only having a single PC?

Assume I have built a highly available and highly reliable application. Now I need to test it for network performance, but I only have a single desktop computer at home. How do I test my application for network performance?
Is there an app that will create virtual clients and a virtual network environment (all the clients being simulated, with me as the only human) so that I could create a network of 100 machines virtually from my single desktop?
If there are apps like that, how many of them are available for free?
My desktop runs Windows 8.
Thanks.
You mean, for example, testing whether you have concurrency problems by throwing a lot of calls at your service from different threads?
You can try JMeter. It's free, and you can configure it, for example, to run 10 tests and run them all at the same time from 10 different threads.
It's a good way to simulate x number of calls ;)
I am not sure, but I believe SoapUI can do more or less the same.
You could use Virtual Machines (for example VirtualBox) and run them on your single machine.
If you want to model a virtual network in a box, look at Shunra. You can usually pick up used Shunra devices on eBay for a fraction of their original cost and of their value for testing purposes. You can very accurately model complex network infrastructures with their devices, then alter routing on your box so traffic goes out one IP address from a virtual machine, traverses the complex network of delays, and routes back to your service on the same box.
If you want something short of investing in hardware, take a look at Charles Proxy. It's a great piece of software that lets you, from a couple of hosts, look at behavior across an impaired link with particular characteristics. You can also alter some of the application's caching characteristics in the proxy to see how this changes your perceived performance.
If you just want to model your application's data flows, take a look at IxChariot (from Ixia Communications) and pair it with Shunra. You could crank up the number of data flows and see what happens purely to the network response times, independent of the application's performance.

Does TensorFlow support saving the initial hyper-parameter configuration automatically?

We need to run the designed networks many times to get better performance, and it would be good to keep a record of the experiments we have run. It would be useful if the TensorFlow execution engine could record these hyper-parameter configurations automatically. For example, I currently record them by giving each run's log directory a different name:
log_lr_0.001_theta_0.1_alpha_0.1
log_lr_0.01_theta_0.01_alpha_0.02
....
Are there any automatic ways to help with this? In addition, it would be nice if, whenever we start a new TensorFlow training instance, a new port were allocated and a new TensorBoard were started to show its learning state.
No, TensorFlow doesn't save the initial hyper-parameter configuration automatically.
I've faced the same issue, and I'm using a tool called Sacred; I hope you find it useful.
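As a lighter-weight alternative to a tool like Sacred, the naming scheme from the question can be automated with the standard library: build the directory name from the hyper-parameters, save them as JSON inside it, and ask the OS for a free port for a new TensorBoard. A minimal sketch; the directory layout, the `hparams.json` file name and the helper names are arbitrary choices:

```python
import json
import os
import socket
from datetime import datetime


def run_dir_name(hparams, prefix="log"):
    """Build a name like 'log_alpha_0.1_lr_0.001_theta_0.1' (keys sorted)."""
    parts = [prefix] + [f"{key}_{hparams[key]}" for key in sorted(hparams)]
    return "_".join(parts)


def start_run(hparams, root="runs"):
    """Create a timestamped directory for this run and save its hyper-parameters."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    path = os.path.join(root, f"{run_dir_name(hparams)}_{stamp}")
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "hparams.json"), "w") as f:
        json.dump(hparams, f, indent=2, sort_keys=True)
    return path


def free_port():
    """Ask the OS for a currently unused TCP port. Another process could
    still grab it before TensorBoard binds to it, so this is best effort."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("", 0))
        return sock.getsockname()[1]


def tensorboard_command(logdir, port):
    """Argument list for launching TensorBoard on one run directory."""
    return ["tensorboard", "--logdir", logdir, "--port", str(port)]
```

Passing `start_run(...)` as the training job's log directory and launching `tensorboard_command(path, free_port())` with `subprocess.Popen` gives each experiment its own recorded configuration and its own TensorBoard.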
