In my previous post I covered how to allow client-to-site connectivity to an AWS VPC environment.
In this post I decided to continue exploring AWS VPC connectivity and talk about how to connect VPCs to each other. If your VPCs are in the same region, you can simply use VPC peering and be done with it. But if your VPCs are located in different regions, you'll need to explore other options.
I decided to test and document one of the simplest and most inexpensive options I could think of: full mesh connectivity between VPCs using IPsec site-to-site tunnels. The inexpensive part is taken care of by using StrongSwan 5.4.0 on CentOS 7 to implement it.
Basically, the scenario here is that I want to connect two VPCs in different regions:
us-east-1 VPC with IP addresses in 172.16.0.0/16;
us-east-2 VPC with IP addresses in 172.31.0.0/16.
It is a simple exercise to extrapolate this configuration to additional VPCs connected to these two via full mesh, so I won't get into the specifics here. Consider that homework.
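To get a sense of the size of that homework: a full mesh needs a tunnel between every pair of VPCs, which is n(n-1)/2 tunnels for n VPCs. A quick shell sketch to compute it:

```shell
# Number of site-to-site tunnels needed for a full mesh of n VPCs:
# every pair of VPCs gets one tunnel, i.e. n * (n - 1) / 2.
full_mesh_tunnels() {
    local n=$1
    echo $(( n * (n - 1) / 2 ))
}

full_mesh_tunnels 2   # the two VPCs in this post: 1 tunnel
full_mesh_tunnels 5   # five VPCs: 10 tunnels
```

The quadratic growth is the main reason full mesh only stays "simple" for a handful of VPCs.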
IP Addresses and Security Groups
First, create one Elastic IP Address for each StrongSwan instance. Optionally, create a hostname for each in Route53 if you think that will help you later on.
Then, create one security group for each of the StrongSwan instances. Leave all outbound traffic as allowed, and create the following inbound rules:
Create an SSH rule to allow you to log into the box later on;
Allow All traffic from all of the VPC IP address ranges. In our example, this means allowing all traffic from 172.16.0.0/16 and 172.31.0.0/16 on the security group. This is necessary because when an instance acts as a router, the security group can't differentiate between traffic directed to the instance's own IP address and traffic directed to one of the remote networks it routes to. Any such differentiation will unfortunately need to be implemented internally in iptables;
For each of the elastic IP addresses of the other StrongSwan instances it will need to connect to, create the following rules:
Type | Protocol | Port Range | Source |
---|---|---|---|
Custom ICMP Rule - IPv4 | Time Exceeded | All | elastic IP/32 |
Custom ICMP Rule - IPv4 | Destination Unreachable | All | elastic IP/32 |
Custom ICMP Rule - IPv4 | Echo Reply | N/A | elastic IP/32 |
Custom ICMP Rule - IPv4 | Echo Request | N/A | elastic IP/32 |
Custom ICMP Rule - IPv4 | Traceroute | N/A | elastic IP/32 |
Custom Protocol | AH (51) | All | elastic IP/32 |
Custom UDP Rule | UDP | 4500 | elastic IP/32 |
Custom UDP Rule | UDP | 500 | elastic IP/32 |
The ICMP rules above serve two purposes. Firstly, the traceroute and echo reply/request ones will make it easier for you to troubleshoot the connectivity between the StrongSwan instances. Most importantly, though, the time exceeded and destination unreachable entries are there to allow path MTU discovery to happen properly between StrongSwan instances communicating over the Internet.
Next, update all of the existing security groups in each VPC to ensure these same ICMP messages are accepted from all of the VPC IP address ranges (172.16.0.0/16 and 172.31.0.0/16 in our example). The objective here is similar: to allow troubleshooting and proper path MTU discovery on the end-to-end communications between machines in different VPCs through the VPN.
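If you prefer scripting this over clicking through the console, the UDP and AH rules can be created with the AWS CLI. The sketch below only prints the commands so you can review them before running (the security group ID and peer elastic IP are hypothetical placeholders); remove the leading `echo` to actually apply them. The ICMP-type rules are easier to add through the console or an `--ip-permissions` JSON document, so they're omitted here.

```shell
# Hypothetical placeholders - substitute your own security group ID
# and the remote StrongSwan instance's elastic IP.
SG_ID="sg-0123456789abcdef0"
PEER_IP="2.3.4.5/32"

# IKE (UDP 500), NAT-T (UDP 4500) and AH (IP protocol 51) inbound rules.
# These lines only print the commands; drop the leading "echo" to run them.
echo aws ec2 authorize-security-group-ingress --group-id "$SG_ID" --protocol udp --port 500 --cidr "$PEER_IP"
echo aws ec2 authorize-security-group-ingress --group-id "$SG_ID" --protocol udp --port 4500 --cidr "$PEER_IP"
echo aws ec2 authorize-security-group-ingress --group-id "$SG_ID" --protocol 51 --cidr "$PEER_IP"
```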
Create StrongSwan Instances and Configure Linux
This is what you need to keep in mind when creating the instances:
Pick the instance type you'll need: something in the c4 family for heavier traffic volumes, while something in the t2 family should be more than enough for sporadic management / admin traffic;
Use the latest CentOS 7 AMI to create a new instance in a public subnet of the chosen region, with the security group we just created;
Associate the elastic IP address to the instance;
Disable the source/destination check on the instance since it will act as a router.
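For reference, the source/destination check can also be disabled from the AWS CLI (the instance ID below is a hypothetical placeholder); as before, the sketch prints the command rather than running it:

```shell
# Hypothetical instance ID - substitute your StrongSwan instance's own.
INSTANCE_ID="i-0123456789abcdef0"

# A router must forward packets that are not addressed to it, so the EC2
# source/destination check has to be off. Drop the leading "echo" to run.
echo aws ec2 modify-instance-attribute --instance-id "$INSTANCE_ID" --no-source-dest-check
```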
Then, SSH into the machine (keep in mind the default username for the AMI is `centos`) so we can configure the operating system properly. Make sure you become root for the following configuration steps.
Ensure that `/etc/sysctl.conf` contains the following lines, and then force them to be loaded by running `sysctl -p /etc/sysctl.conf` or by rebooting:
net.ipv4.ip_forward = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.tcp_max_syn_backlog = 1280
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.tcp_mtu_probing = 1
As a side note, it is strongly recommended that you include `net.ipv4.tcp_mtu_probing = 1` in the `sysctl.conf` of all of your Linux EC2 instances, since they use jumbo frames by default.
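You can verify that the settings took effect by reading them back from `/proc/sys`, where each sysctl key maps to a file path (dots become slashes). For example:

```shell
# Each sysctl key maps to a file under /proc/sys, with dots replaced by
# slashes. After "sysctl -p /etc/sysctl.conf" both of these should read 1.
cat /proc/sys/net/ipv4/ip_forward
cat /proc/sys/net/ipv4/tcp_mtu_probing
```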
Let's make sure the machine is fully patched, enable EPEL, and install StrongSwan by issuing the following commands:
yum install epel-release
yum repolist
yum update
yum install strongswan
systemctl enable strongswan
In order to ensure the cryptography and logging work properly, the system needs to have proper time synchronization. Make sure NTP is installed and configured to run on system start:
yum install ntp
systemctl enable ntpd
Replace the `server` configuration entries in `/etc/ntp.conf` so that the AWS recommended NTP server pool is used:
server 0.amazon.pool.ntp.org iburst
server 1.amazon.pool.ntp.org iburst
server 2.amazon.pool.ntp.org iburst
server 3.amazon.pool.ntp.org iburst
Finally, restart the NTP service with `systemctl restart ntpd` and check that it is working properly with `ntpq -p`.
Configuring StrongSwan
We'll configure StrongSwan to use RSA keys for authentication, so the first step is to create those keys and associate them with the servers in the StrongSwan configuration.
On each StrongSwan instance, create its own RSA key. This is how you would do it on the us-east-1 StrongSwan instance:
cd /etc/strongswan/ipsec.d/private/
openssl genrsa -out us-east-1.key 4096
chmod og-r us-east-1.key
openssl rsa -in us-east-1.key -pubout > ../certs/us-east-1.pub
Once you do that, you need to edit `/etc/strongswan/ipsec.secrets` to let StrongSwan know what to do with the private key. Add a line to that file that associates the instance's own elastic IP address with the key file. Assuming the elastic IP address of the us-east-1 StrongSwan instance is 1.2.3.4, this is what that line would look like:
1.2.3.4 : RSA us-east-1.key
Then, copy each StrongSwan instance's `.pub` file to the `/etc/strongswan/ipsec.d/certs` directory of each of the other StrongSwan instances. In our example, if you were on the us-east-1 instance you would see something like this:
$ find /etc/strongswan/ipsec.d/ -name '*.key'
/etc/strongswan/ipsec.d/private/us-east-1.key
$ find /etc/strongswan/ipsec.d/ -name '*.pub'
/etc/strongswan/ipsec.d/certs/us-east-1.pub
/etc/strongswan/ipsec.d/certs/us-east-2.pub
Finally, you configure `/etc/strongswan/ipsec.conf` to tie it all together. This is what the configuration file would look like if the elastic IPs for the us-east-1 and us-east-2 instances were 1.2.3.4 and 2.3.4.5, respectively:
config setup
    # strictcrlpolicy=yes
    # uniqueids = no

conn %default
    fragmentation=force
    dpdaction=restart
    ike=aes192gcm16-aes128gcm16-aes192-prfsha256-ecp256-ecp521,aes192-sha256-modp3072
    esp=aes192gcm16-aes128gcm16-aes192-ecp256,aes192-sha256-modp3072
    keyingtries=%forever
    keyexchange=ikev2
    authby=rsasig
    forceencaps=yes
    leftid=1.2.3.4
    leftrsasigkey=us-east-1.pub
    leftsubnet=172.16.0.0/16

# Add connections here.
conn us-east-2
    right=2.3.4.5
    rightsubnet=172.31.0.0/16
    rightrsasigkey=us-east-2.pub
    auto=start
Keep in mind that `left` in StrongSwan parlance means the side of the VPN that is local to the instance you are configuring, and `right` is the remote side. So the configuration file on the us-east-2 instance would look like this:
config setup
    # strictcrlpolicy=yes
    # uniqueids = no

conn %default
    fragmentation=yes
    dpdaction=restart
    ike=aes192gcm16-aes128gcm16-aes192-prfsha256-ecp256-ecp521,aes192-sha256-modp3072
    esp=aes192gcm16-aes128gcm16-aes192-ecp256,aes192-sha256-modp3072
    keyingtries=%forever
    keyexchange=ikev2
    authby=rsasig
    forceencaps=yes
    leftid=2.3.4.5
    leftrsasigkey=us-east-2.pub
    leftsubnet=172.31.0.0/16

# Add connections here.
conn us-east-1
    right=1.2.3.4
    rightsubnet=172.16.0.0/16
    rightrsasigkey=us-east-1.pub
    auto=start
Please review the StrongSwan documentation on ipsec.conf to better understand some of the choices I've made here, and tweak the setup to meet your needs. I wouldn't change the `fragmentation` and `forceencaps` options, though, since I ran into problems when they were not set as above.
Once you've set all of this up, run `systemctl restart strongswan` and monitor the logs with `tail -f /var/log/messages | grep charon` for entries related to the IPsec tunnel negotiation and authentication.
Hopefully by now you will be able to ping the us-east-2 StrongSwan instance's internal (172.31.0.x) IP address from an SSH session on the us-east-1 StrongSwan instance.
Routing
Finally, in order to allow machines on one region to talk to machines and services on the other, we'll need to update the route tables.
What you need to do is add a new route that tells machines in one region that, in order to talk to the addresses in the other region, they must go through the StrongSwan instance.
So in our example, you should add a new route to all routing tables in us-east-1 that has a Destination of 172.31.0.0/16 and a Target that is the instance ID of the us-east-1 StrongSwan instance.
Conversely, you should add a new route to all routing tables in us-east-2 that has a Destination of 172.16.0.0/16 and a Target that is the instance ID of the us-east-2 StrongSwan instance.
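As with the security group rules, the routes can be scripted with the AWS CLI. This sketch prints the call for the us-east-1 side (the route table and instance IDs are hypothetical placeholders); remove the leading `echo` to apply it, and mirror it in us-east-2 with 172.16.0.0/16 as the destination:

```shell
# Hypothetical placeholders - substitute your route table ID and the
# local StrongSwan instance's ID.
RTB_ID="rtb-0123456789abcdef0"
VPN_INSTANCE_ID="i-0123456789abcdef0"

# Send traffic destined for the remote VPC through the StrongSwan instance.
# This only prints the command; drop the leading "echo" to run it.
echo aws ec2 create-route --route-table-id "$RTB_ID" \
    --destination-cidr-block 172.31.0.0/16 --instance-id "$VPN_INSTANCE_ID"
```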
Finally, make sure that the security groups of services that need to be accessed across the VPN will now allow the IP addresses of the remote machines in. Once you do that, you can then test the communication between regions successfully. Of course, if you enabled ICMP as recommended above, you should be able to ping any instance in us-east-2 from any instance in us-east-1 and vice-versa by now.
Availability Concerns
You could achieve some level of redundancy and distribution of load by increasing the number of VPN concentrator instances you stand up.
One idea would be to create one VPN concentrator per availability zone instead of just one per region. In this scenario, even if one availability zone (or its StrongSwan instance) becomes unavailable, the rest of the availability zones will remain connected.
This is a high-level guide of what that would entail in addition to what was discussed above:
- Create the additional StrongSwan instances as per the instructions above;
- Separate the routing tables per availability zone and assign each one to its corresponding subnets;
- Update `ipsec.conf` on all machines to have one connection for each VPN concentrator. Also update each one's `leftsubnet` and `rightsubnet` definitions so that each server is only responsible for the IP address ranges of the subnets in its availability zone.
I have not covered implementing HA on StrongSwan, though apparently that is supported as well. If you get this working let me know.
Additional Recommendations
A few security-minded tips that I would recommend you implement:
Ensure you close off SSH access to the StrongSwan instances after you're done configuring them by removing the applicable security group inbound rule. You can always allow it again temporarily if and when you need it.
Install the CloudWatch Logs agent on the machine (remember, we covered this already here). Make sure you collect at least the following files: `/var/log/messages`, `/var/log/secure`, and `/var/log/audit/audit.log`.
Harden the operating system, and make sure to install security updates as they become available.