Wednesday, December 19, 2018

Installing AzCopy v10 on Linux

AzCopy v10 is a command-line (CLI) tool for copying files between client machines and Azure Blob storage.

In previous versions the application had to be installed on the client machine. With v10 this is no longer required: AzCopy is now a single executable that you download and run standalone.

Go here to download the correct version for your OS.

To install:

Go to the above link and copy the download link for the Linux OS (it will be used with the wget command below):

Log in to your server as a regular user

Download the tar.gz file to your home folder:

$ wget https://aka.ms/downloadazcopy-v10-linux

Untar the archived files:

$ tar -xvf downloadazcopy-v10-linux

Change into the extracted folder to locate the azcopy executable:

$ cd azcopy_linux_amd64_10.0.4/

Verify that you can run the program:

$ ./azcopy

This will show you which version you're running.

If you have already installed a previous version of azcopy, that version will be called unless you specify the full path to the new one.
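
To see which copy will actually be invoked, you can check with:

$ which azcopy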

You can remove the old version simply by deleting the executable (I'm fairly sure that this is sufficient):

$ sudo rm /usr/bin/azcopy

Next, copy the new version into the same location (/usr/bin is likely already in your PATH, so it can be run from anywhere):

$ sudo cp azcopy /usr/bin/

Go to the root folder, run azcopy and verify that it's v10:

$ cd /

$ azcopy



Alternatively, you can leave the old version in place and rename the new one to azcopy10. Or you can create an alias for the file and put it in your ~/.bash_aliases file, see link here for info.
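
A minimal alias sketch; the path assumes you left the binary in the extraction folder from above (adjust to where you keep it):

$ echo "alias azcopy10='$HOME/azcopy_linux_amd64_10.0.4/azcopy'" >> ~/.bash_aliases
$ source ~/.bash_aliases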

See this link for example copy commands.
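
As a rough example, an upload with v10 could look like this (the storage account, container, and SAS token are placeholders):

$ azcopy copy "/home/jakob/file.txt" "https://jakobsstorageaccount.blob.core.windows.net/jakobscontainer1/file.txt?<SAS-token>"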








Installing Blobfuse in Ubuntu to mount Azure Blob as file share

Azure Blob is roughly a factor of 3 cheaper than Azure Files, it doesn't have the current 5 TB limitation that Azure Files has (100 TB is in tech preview), and it has tiering (hot/cool/archive) to reduce storage prices even further. The problem, though, is that Blob is object storage and not block storage, so if you need to mount Blob as a file system from e.g. a Linux server, you have to put something in front of it.

Avere and Blobfuse let you do this. With Blobfuse it's important to note that it's not 100% POSIX compliant, see link here for limitations. This means that Linux permissions won't apply (chmod, chown) and symbolic links will not work either. You will be given full read/write permissions to the container (as the storage account key is used for authentication).

As a side note, you can copy data from the CLI with AzCopy v10 and Rclone, but they work with their own set of commands and do not remote-mount Blob as a folder or share.

Prerequisites:

Note that to set this up you need an SSD disk or a dedicated portion of memory in the VM to act as a cache. This example will show how to use memory as a RAM cache.

Also, you need to have a storage account created in MS Azure (that we will mount).

This MS guide has been used to set up Blobfuse. Ubuntu 18.04 is used in this setup, but the guide shows how to use other OSes as well.

Install/configure:

SSH or log in to your server

Su to root:

$ sudo su -

Configure the Microsoft package repository:

# wget https://packages.microsoft.com/config/ubuntu/18.04/packages-microsoft-prod.deb

(or if you're on Ubuntu 16.04: https://packages.microsoft.com/config/ubuntu/16.04/packages-microsoft-prod.deb)

# dpkg -i packages-microsoft-prod.deb
# apt-get update

To install Blobfuse:

# apt-get install blobfuse

Mount an 8 GB temporary RAM disk for the cache (the VM used for this has 16 GB of memory, so I'm using half for the cache):

# mkdir /mnt/ramdisk
# mount -t tmpfs -o size=8g tmpfs /mnt/ramdisk
# mkdir /mnt/ramdisk/blobfusetmp

# chown jakob /mnt/ramdisk/blobfusetmp
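
Note that a tmpfs mount does not survive a reboot. If you want the RAM disk recreated at boot, an /etc/fstab entry along these lines should do it (a sketch matching the size and path used above):

tmpfs /mnt/ramdisk tmpfs defaults,size=8g 0 0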

Go to your home folder:

# cd /home/jakob

Create the file below (it doesn't have to be in your home folder, but it can be):

# touch fuse_connection.cfg

Edit the file with nano:

# nano fuse_connection.cfg

Add the following content to the file (this info can be found in the Azure Portal -> Storage Accounts -> click your storage account -> Access keys). The container name is the name of the virtual folder that you're mounting; this container should already have been created, either in the Azure Portal or with Azure Storage Explorer:

accountName jakobsstorageaccount
accountKey wbMTSWXXXXXXXXXXXXXXXXXXusRtAA==
containerName jakobscontainer1

Save the file and exit.

Change permissions so only root can access it:
# chmod 700 fuse_connection.cfg

Make a folder where the content of the container will be mounted (-p also creates the parent folder):
# mkdir -p /mnt/blobfuse/jakobscontainer1

Mount the container:

# blobfuse /mnt/blobfuse/jakobscontainer1 --tmp-path=/mnt/ramdisk/blobfusetmp --config-file=/home/jakob/fuse_connection.cfg -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120

That's it. Now test that you can list the content of the Blob container:

# ls -alh /mnt/blobfuse/jakobscontainer1

From here you can use cp or Rsync to copy files back and forth.
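
For example (the source path here is hypothetical):

# rsync -av /data/results/ /mnt/blobfuse/jakobscontainer1/results/

To unmount the container again, use the standard FUSE unmount:

# fusermount -u /mnt/blobfuse/jakobscontainer1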


Thursday, December 13, 2018

Deploy Avere vFXT in Azure

At my current client there is an aim to use Azure Blob for storing large amounts of data and for processing this data in an HPC environment.

A requirement from the users is that they can mount the storage as NFS shares in a POSIX-compliant manner. The problem with this is that Blob is object storage and not block storage, meaning that this feature isn't available out of the box.

We have tried different things such as AzCopy v10, Blobfuse (which mounts a Blob container as a local file system), Rclone, and Data Lake Storage Gen2 (currently in tech preview), and we've also looked at Azure Files as an alternative. None of it really fulfills the above requirement.

Avere, at least, claims to be able to solve the problem.

Avere vFXT is essentially a cache layer that you can put in front of either Azure Blob or on-prem storage solutions, and it is meant for high-volume HPC environments (such as Grid Engine or Slurm clusters).

This guide will describe how to deploy Avere vFXT in Azure and connect to Azure Blob storage.

Architecture

Avere consists of one controller VM and three (minimum) cache node VMs that run in a cluster.


  • Controller: Small VM, small disk
  • Cache VMs: Minimum of 3 x 16 vCPUs, 64 GB mem, 1 TB premium ssd disk (4 x 256 GB RAID0)


The controller VM is deployed from the Azure Marketplace and the three cache VMs are deployed via a script that is run from the controller VM.

This MS guide has been used in the process.

MS recommends creating a separate subscription for the deployment; this is not required but is a nice-to-have to be able to isolate costs. You do, however, need to have ownership rights on the subscription for part of the installation.

Installation and configuration

To deploy the controller, log on to portal.azure.com and search for:

Avere vFXT for Azure Controller

Click Create and go through the deployment steps, this is pretty standard.

This will deploy the controller VM.

Next, create a new storage account from the Azure portal that we will later connect to Avere as backend storage (also called "core filer").

Log in to the Controller with ssh (the user will be the admin user specified during deployment) and run the following steps:

$ az login

This will generate a code and ask you to go to https://microsoft.com/devicelogin and input the code. When done, return to the ssh session.

Then set the subscription ID. You can find that by searching for "subscriptions" in the Azure Portal.

$ az account set --subscription YOUR_SUBSCRIPTION_ID
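
Alternatively, you can list the subscriptions your login has access to directly from the CLI:

$ az account list --output table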

Edit the avere-cluster.json file, search for "subscription id", replace it with your own subscription ID, and uncomment the line. Save the file.

$ vi /avere-cluster.json
(Click here for quick vi editor guide or use nano)

Create a role for Avere to be able to perform necessary tasks:

$ az role definition create --role-definition /avere-cluster.json
(this is where ownership of the subscription is required; if you don't have it, the command will throw an error)

Next we need to edit the cluster deploy script. Make a copy of the script first:

$ cd /
$ sudo cp create-cloudbacked-cluster create-cloudbacked-cluster-blob

The original file has 777 permissions, so give the same to the copied file:

$ sudo chmod 777 create-cloudbacked-cluster-blob

Edit the file:

$ vi create-cloudbacked-cluster-blob

Below I have pasted the part of the script that needs editing, with example values filled in:

---------------

#!/usr/bin/env bash
set -exu

# Resource groups
# At a minimum specify the resource group.  If the network resources live in a
# different group, specify the network resource group.  Likewise for the storage
# account resource group.
# Below resource group I created while creating the controller VM
RESOURCE_GROUP=CLIENT-Avere-test

# The network resource group is an existing group where we want to assign IP addresses from
NETWORK_RESOURCE_GROUP=CLIENT_Network_RG
# I did not specify the storage resource group as it is the same as the default resource group above. I added the new storage account to that RG.
#STORAGE_RESOURCE_GROUP=

# eastus, etc.  To list:
# az account list-locations --query '[].name' --output tsv
LOCATION=westeurope

# Your VNET and Subnet names.
# To find network name and subnet name go to Azure Portal -> Virtual Networks -> YOUR_NETWORK -> Subnets
NETWORK=CLIENT_network_name
SUBNET=default

# The preconfigured Azure AD role for use by the vFXT cluster nodes.  Refer to
# the vFXT documentation.
AVERE_CLUSTER_ROLE=avere-cluster

# For cloud (blob) backed storage, provide the storage account name for the data
# to live within.
STORAGE_ACCOUNT=new_storage_account_name

# The cluster name should be unique within the resource group.
CLUSTER_NAME=avere-cluster
# Administrative password for the cluster
ADMIN_PASSWORD=INSERT_PASSWORD_HERE

# Cluster sizing for VM and cache disks.
# D16 is the smaller of the two options
INSTANCE_TYPE=Standard_D16s_v3 # or Standard_E32s_v3
CACHE_SIZE=1024 # or 4096, 8192

# DEBUG="--debug"

# Do not edit below this line

--------------

Save the file and exit.

Run the script:

$  ./create-cloudbacked-cluster-blob

This takes around half an hour to run and spins up the nodes and the management web portal.

The output on screen will show you the IP address of the management server.

The script output (which is also stored in ~/vfxt.log) mentions a warning that you need to create a new encryption key. To do this:

Log in to the web portal using the IP address:

http://IP_of_management_server

User: admin
Passwd: What was specified in the deploy script above

Go to the Settings tab -> Cloud Encryption Settings

Add a new password, click Generate Key and Download File.

This will download a file. Click Choose File and upload the same file (it's a precaution), and then click Activate Key. It will take effect right away.

Make sure you save the key (or certificate) and password as it's needed to access data in a restore or recovery situation.




Next, enable Support uploads. This is just a couple of steps; follow this link to do this.

Now the cluster is ready for use and you can mount the Blob storage as an NFS share:

Log in to your client machine (Linux, in this example). It should be on the same network or at least be able to reach the Avere cluster.

There are several ways to distribute the client load between the (currently three) deployed cache nodes. This is described here. However, for testing purposes you can also mount directly against the IP address of one of the nodes. The IP addresses can be found in the web portal under Settings -> Cluster Networks.

From the client, make a new directory and mount the remote share:

$ sudo mkdir /mnt/vfxt
$ sudo mount 172.xx.xx.xx:/msazure /mnt/vfxt

And that's it for a basic test mount.
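
For real workloads you will likely want the usual NFS hardening options on the mount; a sketch (check the Avere documentation for their exact recommendations):

$ sudo mount -o hard,proto=tcp,retry=30 172.xx.xx.xx:/msazure /mnt/vfxt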

Any files you copy there will be cached for 10 minutes and then asynchronously uploaded to Blob.

Note that browsing the files in Storage Explorer will show them in an encrypted and unreadable format, so files can only be accessed/read via Avere.

Another note: to get true POSIX compliance, including proper ownership of files, a directory service is required, e.g. Active Directory (the boxes have to be domain-joined via LDAP).








Monday, December 3, 2018

Check external IP address from Linux CLI

Three commands to check the external IP address from command line interface / CLI in Linux:

# wget -qO- http://ipecho.net/plain ; echo

or

# curl ipinfo.io/ip

or

# curl icanhazip.com -4



Tuesday, October 2, 2018

Using Iperf3 for bandwidth and throughput testing on Linux

At my current client we had to test the network speed between Azure and a local site.
Initially we used Rsync to copy files back and forth, and although that gives an OK indication, it does not show the full line speed, as Rsync encrypts data during transfer (among other things).

Iperf3 is a really easy-to-use and simple tool to test the bandwidth or line speed between two machines, which can be either Windows or Linux.

Below shows how to install and run Iperf3:

The test was done on RHEL 7.5 VMs:

1) Install Iperf3 on both the "client" and the "server":

# sudo yum install iperf3

2) Ensure that TCP traffic is allowed inbound on the "server":

# sudo firewall-cmd --zone=public --add-port=5201/tcp --permanent

# sudo firewall-cmd --reload

If you want to run test with UDP, then the following commands should be run:

# sudo firewall-cmd --zone=public --add-port=5201/udp --permanent

# sudo firewall-cmd --reload

3) Start Iperf3 on the "server" and put it in listen mode:

# iperf3 -s

4) Start Iperf3 on the "client" with -c and specify the IP of the server:

# iperf3 -c 192.168.1.25
(replace above IP with IP of your server)

That's it. The test runs in around a minute and shows the result; see the screen dumps below.

When we ran the test, we could not max out the 1 Gbit line with TCP. So we changed to UDP and increased the packet size with the following command:

# iperf3 -c 192.168.1.26 --bandwidth 10G  --length 8900 --udp -R -t 180

-c specifies to run the command as a client
--bandwidth sets the target bandwidth to 10 Gbit/s (even though we only have a 1 Gbit line)
--length is the packet size
--udp runs the test over UDP instead of TCP
-R runs the test in reverse, so instead of sending data you are retrieving data. This is useful because you can test both directions without changing the setup
-t is the duration in seconds. We specified 180 seconds to let it run a bit longer.
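
For scripted or repeated runs, iperf3 can also emit machine-readable output, for example (same example server IP as above):

# iperf3 -c 192.168.1.26 -t 180 --json > iperf-result.json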

Run iperf3 --help for more options.

Below shows a standard test between two VMs in Azure. Results are shown both on the client and on the server.




Tuesday, September 25, 2018

Fixing a corrupt /etc/sudoers file in Linux VM in Azure

I was editing the /etc/sudoers file with nano on a Linux VM (RHEL 7.5) in Azure, trying to remove or disable being prompted for a password every time I sudo.

I added the following to the file

root        ALL=(ALL:ALL) ALL
myadminuser     ALL=(ALL:ALL) ALL     NOPASSWD: ALL

Apparently that does not follow the correct syntax, so immediately after, I was not able to sudo. Below is the error message:

[myadminuser@MYSERVER ~]$ sudo reboot
>>> /etc/sudoers: syntax error near line 93 <<<
sudo: parse error in /etc/sudoers near line 93
sudo: no valid sudoers sources found, quitting
sudo: unable to initialize policy plugin


Since you don't have the root password on Azure VMs, you're stuck: the regular user does not have permission to edit the sudoers file and you can't sudo to root.

You could mount the VM disk to another VM and then edit the file that way, but that is cumbersome.

Fix:

From the Azure portal, start Cloud Shell and choose PowerShell.

Run the following command to make /etc/sudoers editable by your regular user:

az vm run-command invoke --resource-group YOUR_RESOURCE_GROUP --name YOURVM --command-id RunShellScript --scripts "chmod 446 /etc/sudoers"

This gives the regular user permission to edit the file.

With nano or vi, undo the changes (I just deleted the NOPASSWD: ALL):

nano /etc/sudoers (no sudo needed since you now have write access)

After editing, run the command below to restore the default permissions on the file:

az vm run-command invoke --resource-group YOUR_RESOURCE_GROUP --name YOURVM --command-id RunShellScript --scripts "chmod 440 /etc/sudoers"

I got the fix from the following link. Note that the syntax has changed a bit.

The useful thing about this command is that you can execute any command as root on your VMs as long as you have access to the Azure portal.
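
For example, as a quick sanity check (same placeholders as above), the following should return "root":

az vm run-command invoke --resource-group YOUR_RESOURCE_GROUP --name YOURVM --command-id RunShellScript --scripts "whoami"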

How to edit /etc/sudoers:

To ensure that you don't introduce wrong syntax in the file, use this command to edit it:

visudo

This will open the file in the vi editor, and if you use wrong syntax you'll get a warning/error when you save.
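
visudo can also edit drop-in files under /etc/sudoers.d, which keeps custom rules out of the main file entirely (the file name here is just an example):

visudo -f /etc/sudoers.d/myadminuser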

See this link for a quick guide using vi editor

Update 2018.11.07: On RHEL 7.5 and with visudo, the lines below work, meaning that with the command:
# sudo su -
you're not prompted for a password:

root    ALL=(ALL)       ALL
myadminuser    ALL=(ALL)       NOPASSWD: ALL
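
To verify that the rules behave as expected, you can list your user's effective sudo privileges:

$ sudo -l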