In my previous post I covered how to allow client-to-site connectivity to an AWS VPC environment.
In this post I decided to continue exploring AWS VPC connectivity and talk about how to connect VPCs to each other. If your VPCs are in the same region, you can simply use VPC peering and be done with it. But if your VPCs are located in different regions, you'll need to explore other options.
I decided to test and document one of the simplest and most inexpensive options I could think of: full mesh connectivity between VPCs using IPsec site-to-site tunnels. The inexpensive part is taken care of by using StrongSwan 5.4.0 on CentOS 7 to implement it.
Basically, the scenario here is that I want to connect two VPCs in different regions:
us-east-1 VPC with IP addresses in 172.16.0.0/16;
us-east-2 VPC with IP addresses in 172.31.0.0/16.
It is a simple exercise to extrapolate this configuration to additional VPCs connected to these two via full mesh, so I won't get into the specifics here. Consider that homework.
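To get a sense of the size of that homework: a full mesh needs a tunnel between every pair of VPCs, which is n(n-1)/2 tunnels for n VPCs. A quick shell sketch to compute it:

```shell
# Number of site-to-site tunnels needed for a full mesh of n VPCs:
# every pair of VPCs gets one tunnel, i.e. n * (n - 1) / 2.
full_mesh_tunnels() {
    local n=$1
    echo $(( n * (n - 1) / 2 ))
}

full_mesh_tunnels 2   # the two VPCs in this post: 1 tunnel
full_mesh_tunnels 5   # five VPCs: 10 tunnels
```

The quadratic growth is the main reason full mesh only stays "simple" for a handful of VPCs.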
IP Addresses and Security Groups
First, create one Elastic IP Address for each StrongSwan instance. Optionally, create a hostname for each in Route53 if you think that will help you later on.
Then, create one security group for each of the StrongSwan instances. Leave all outbound traffic as allowed, and create the following inbound rules:
Create an SSH rule to allow you to log into the box later on;
Allow All traffic from all of the VPC IP address ranges. In our example, this means allowing all traffic from 172.16.0.0/16 and 172.31.0.0/16 on the security group. This is necessary because when an instance acts as a router, the security group can't differentiate between traffic directed to the instance's own IP address and traffic directed to one of the remote networks it routes to. Any such differentiation will unfortunately need to be implemented internally in iptables;
For each of the elastic IP addresses of the other StrongSwan instances it will need to connect to, create the following rules:
Type | Protocol | Port Range | Source |
---|---|---|---|
Custom ICMP Rule - IPv4 | Time Exceeded | All | elastic IP/32 |
Custom ICMP Rule - IPv4 | Destination Unreachable | All | elastic IP/32 |
Custom ICMP Rule - IPv4 | Echo Reply | N/A | elastic IP/32 |
Custom ICMP Rule - IPv4 | Echo Request | N/A | elastic IP/32 |
Custom ICMP Rule - IPv4 | Traceroute | N/A | elastic IP/32 |
Custom Protocol | AH (51) | All | elastic IP/32 |
Custom UDP Rule | UDP | 4500 | elastic IP/32 |
Custom UDP Rule | UDP | 500 | elastic IP/32 |
The ICMP rules above serve two purposes. Firstly, the traceroute and echo reply/request ones will make it easier for you to troubleshoot the connectivity between the StrongSwan instances. Most importantly, though, the time exceeded and destination unreachable entries are there to allow path MTU discovery to happen properly between StrongSwan instances communicating over the Internet.
Next, update all of the existing security groups in each VPC to ensure these same ICMP messages are accepted from all of the VPC IP address ranges (172.16.0.0/16 and 172.31.0.0/16 in our example). The objective here is similar: to allow troubleshooting and proper path MTU discovery on the end-to-end communications between machines in different VPCs through the VPN.
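If you prefer scripting this over clicking through the console, the UDP and AH rules can be created with the AWS CLI. The sketch below only prints the commands so you can review them before running (the security group ID and peer elastic IP are hypothetical placeholders); remove the leading `echo` to actually apply them. The ICMP-type rules are easier to add through the console or an `--ip-permissions` JSON document, so they're omitted here.

```shell
# Hypothetical placeholders - substitute your own security group ID
# and the remote StrongSwan instance's elastic IP.
SG_ID="sg-0123456789abcdef0"
PEER_IP="2.3.4.5/32"

# IKE (UDP 500), NAT-T (UDP 4500) and AH (IP protocol 51) inbound rules.
# These lines only print the commands; drop the leading "echo" to run them.
echo aws ec2 authorize-security-group-ingress --group-id "$SG_ID" --protocol udp --port 500 --cidr "$PEER_IP"
echo aws ec2 authorize-security-group-ingress --group-id "$SG_ID" --protocol udp --port 4500 --cidr "$PEER_IP"
echo aws ec2 authorize-security-group-ingress --group-id "$SG_ID" --protocol 51 --cidr "$PEER_IP"
```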
Create StrongSwan Instances and Configure Linux
This is what you need to keep in mind when creating the instances:
Pick the instance type you'll need: something in the c4 family for heavier traffic volumes, while something in the t2 family should be more than enough for sporadic management / admin traffic;
Use the latest CentOS 7 AMI to create a new instance in a public subnet of the chosen region, with the security group we just created;
Associate the elastic IP address to the instance;
Disable the source/destination check on the instance since it will act as a router.
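For reference, the source/destination check can also be disabled from the AWS CLI (the instance ID below is a hypothetical placeholder); as before, the sketch prints the command rather than running it:

```shell
# Hypothetical instance ID - substitute your StrongSwan instance's own.
INSTANCE_ID="i-0123456789abcdef0"

# A router must forward packets that are not addressed to it, so the EC2
# source/destination check has to be off. Drop the leading "echo" to run.
echo aws ec2 modify-instance-attribute --instance-id "$INSTANCE_ID" --no-source-dest-check
```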
Then, SSH into the machine (keep in mind the default username for the AMI is `centos`) so we can configure the operating system properly. Make sure you become root for the following configuration steps.
Ensure that `/etc/sysctl.conf` contains the following lines, and then force them to be loaded by running `sysctl -p /etc/sysctl.conf` or by rebooting:
net.ipv4.ip_forward = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.tcp_max_syn_backlog = 1280
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.tcp_mtu_probing = 1
As a side note, it is strongly recommended that you include `net.ipv4.tcp_mtu_probing = 1` in the `sysctl.conf` of all of your Linux EC2 instances, since they use jumbo frames by default.
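You can verify that the settings took effect by reading them back from `/proc/sys`, where each sysctl key maps to a file path (dots become slashes). For example:

```shell
# Each sysctl key maps to a file under /proc/sys, with dots replaced by
# slashes. After "sysctl -p /etc/sysctl.conf" both of these should read 1.
cat /proc/sys/net/ipv4/ip_forward
cat /proc/sys/net/ipv4/tcp_mtu_probing
```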
Let's make sure the machine is fully patched, enable EPEL, and install StrongSwan by issuing the following commands:
yum install epel-release
yum repolist
yum update
yum install strongswan
systemctl enable strongswan
In order to ensure the cryptography and logging work properly, the system needs to have proper time synchronization. Make sure NTP is installed and configured to run on system start:
yum install ntp
systemctl enable ntpd
Replace the `server` configuration entries in `/etc/ntp.conf` so that the AWS recommended NTP server pool is used:
server 0.amazon.pool.ntp.org iburst
server 1.amazon.pool.ntp.org iburst
server 2.amazon.pool.ntp.org iburst
server 3.amazon.pool.ntp.org iburst
Finally, restart the NTP service with `systemctl restart ntpd` and check that it is working properly with `ntpq -p`.
Configuring StrongSwan
We'll configure StrongSwan to use RSA keys for authentication, so the first step is to create those keys and associate them with the servers in the StrongSwan configuration.
On each StrongSwan instance, create its own RSA key. This is how you would do it on the us-east-1 StrongSwan instance:
cd /etc/strongswan/ipsec.d/private/
openssl genrsa -out us-east-1.key 4096
chmod og-r us-east-1.key
openssl rsa -in us-east-1.key -pubout > ../certs/us-east-1.pub
Once you do that, you need to edit `/etc/strongswan/ipsec.secrets` to let StrongSwan know what to do with the private key. Add a line to that file that associates the instance's own elastic IP address with the key file. Assuming the elastic IP address of the us-east-1 StrongSwan instance is 1.2.3.4, this is what that line would look like:
1.2.3.4 : RSA us-east-1.key
Then, copy each StrongSwan instance's `.pub` file to the `/etc/strongswan/ipsec.d/certs` directory of each of the other StrongSwan instances. In our example, if you were on the us-east-1 instance you would see something like this:
$ find /etc/strongswan/ipsec.d/ -name '*.key'
/etc/strongswan/ipsec.d/private/us-east-1.key
$ find /etc/strongswan/ipsec.d/ -name '*.pub'
/etc/strongswan/ipsec.d/certs/us-east-1.pub
/etc/strongswan/ipsec.d/certs/us-east-2.pub
Finally, you configure `/etc/strongswan/ipsec.conf` to tie it all together. This is what the configuration file would look like if the elastic IPs for the us-east-1 and us-east-2 instances were 1.2.3.4 and 2.3.4.5, respectively:
config setup
    # strictcrlpolicy=yes
    # uniqueids = no

conn %default
    fragmentation=force
    dpdaction=restart
    ike=aes192gcm16-aes128gcm16-aes192-prfsha256-ecp256-ecp521,aes192-sha256-modp3072
    esp=aes192gcm16-aes128gcm16-aes192-ecp256,aes192-sha256-modp3072
    keyingtries=%forever
    keyexchange=ikev2
    authby=rsasig
    forceencaps=yes
    leftid=1.2.3.4
    leftrsasigkey=us-east-1.pub
    leftsubnet=172.16.0.0/16

# Add connections here.
conn us-east-2
    right=2.3.4.5
    rightsubnet=172.31.0.0/16
    rightrsasigkey=us-east-2.pub
    auto=start
Keep in mind that `left` in StrongSwan parlance means the side of the VPN that is local to the instance you are configuring, and `right` is the remote side. So the configuration file on the us-east-2 instance would look like this:
config setup
    # strictcrlpolicy=yes
    # uniqueids = no

conn %default
    fragmentation=yes
    dpdaction=restart
    ike=aes192gcm16-aes128gcm16-aes192-prfsha256-ecp256-ecp521,aes192-sha256-modp3072
    esp=aes192gcm16-aes128gcm16-aes192-ecp256,aes192-sha256-modp3072
    keyingtries=%forever
    keyexchange=ikev2
    authby=rsasig
    forceencaps=yes
    leftid=2.3.4.5
    leftrsasigkey=us-east-2.pub
    leftsubnet=172.31.0.0/16

# Add connections here.
conn us-east-1
    right=1.2.3.4
    rightsubnet=172.16.0.0/16
    rightrsasigkey=us-east-1.pub
    auto=start
Please review the StrongSwan documentation on ipsec.conf to better understand some of the choices I've made here, and tweak the setup to meet your needs. I wouldn't change the `fragmentation` and `forceencaps` options, though, since I ran into problems when they were not set as above.
Once you've set all of this up, run `systemctl restart strongswan` and monitor the logs with `tail -f /var/log/messages | grep charon` for entries related to the IPsec tunnel negotiation and authentication.
Hopefully by now you will be able to ping the us-east-2 StrongSwan instance's internal (172.31.0.x) IP address from an SSH session on the us-east-1 StrongSwan instance.
Routing
Finally, in order to allow machines on one region to talk to machines and services on the other, we'll need to update the route tables.
What you need to do is add a new route that tells machines in one region that, in order to talk to the addresses in the other region, they must go through the StrongSwan instance.
So in our example, you should add a new route to all routing tables in us-east-1 that has a Destination of 172.31.0.0/16 and a Target that is the instance ID of the us-east-1 StrongSwan instance.
Conversely, you should add a new route to all routing tables in us-east-2 that has a Destination of 172.16.0.0/16 and a Target that is the instance ID of the us-east-2 StrongSwan instance.
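As with the security group rules, the routes can be scripted with the AWS CLI. This sketch prints the call for the us-east-1 side (the route table and instance IDs are hypothetical placeholders); remove the leading `echo` to apply it, and mirror it in us-east-2 with 172.16.0.0/16 as the destination:

```shell
# Hypothetical placeholders - substitute your route table ID and the
# local StrongSwan instance's ID.
RTB_ID="rtb-0123456789abcdef0"
VPN_INSTANCE_ID="i-0123456789abcdef0"

# Send traffic destined for the remote VPC through the StrongSwan instance.
# This only prints the command; drop the leading "echo" to run it.
echo aws ec2 create-route --route-table-id "$RTB_ID" \
    --destination-cidr-block 172.31.0.0/16 --instance-id "$VPN_INSTANCE_ID"
```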
Finally, make sure that the security groups of services that need to be accessed across the VPN will now allow the IP addresses of the remote machines in. Once you do that, you can then test the communication between regions successfully. Of course, if you enabled ICMP as recommended above, you should be able to ping any instance in us-east-2 from any instance in us-east-1 and vice-versa by now.
Availability Concerns
You could achieve some level of redundancy and distribution of load by increasing the number of VPN concentrator instances you stand up.
One idea would be to create one VPN concentrator per availability zone instead of just one per region. In this scenario, even if one availability zone (or its StrongSwan instance) becomes unavailable, the rest of the availability zones will remain connected.
This is a high-level guide of what that would entail in addition to what was discussed above:
- Create the additional StrongSwan instances as per the instructions above;
- Separate the routing tables per availability zone and assign each one to its corresponding subnets;
- Update `ipsec.conf` on all machines to have one connection for each VPN concentrator. Also update each one's `leftsubnet` and `rightsubnet` definitions so that each server is only responsible for the IP address ranges of the subnets in its availability zone.
I have not covered implementing HA on StrongSwan, though apparently that is supported as well. If you get this working let me know.
Additional Recommendations
A few security-minded tips that I would recommend you implement:
Ensure you close off SSH access to the StrongSwan instances after you're done configuring them by removing the applicable security group inbound rule. You can always allow it again temporarily if and when you need it.
Install the CloudWatch Logs agent on the machine (remember, we covered this already here). Make sure you collect at least the following files: `/var/log/messages`, `/var/log/secure`, and `/var/log/audit/audit.log`.
Harden the operating system, and make sure to install security updates as they become available.