Managing Shell Access In A Cloud Environment – Part I

by Ali Tayarani


One of the hardest things to design in Operations is a scalable authentication system, especially with a dynamic system inventory like the one we have at Tapjoy. In designing a viable authentication system, we realized we needed flexibility and functionality that both reduce the amount of time the Operations team spends on user management and increase the ability of the engineers to be self-sufficient. On top of that, we wanted to increase security.

In the initial incarnation of the system, we want to handle authentication for our servers only; however, we may decide to expand it to other LDAP-enabled engineering services like Jenkins.

In order to accomplish these goals, we have rolled out OpenLDAP.  Why OpenLDAP?

Other Options

Another common solution for this problem is NIS. The problem with NIS is that it’s easy to set up initially but difficult to maintain over time at scale. Additionally, it has fewer supporting projects that add flexibility than LDAP does.

Alternatively, we could have had Chef maintain a user list with keys. However, this solution scales poorly, and a user update would only reach a host on its next Chef run, so its timing would depend on the splay between runs. Since Chef is maintained with git, authentication information would also remain decentralized.

Centralized Authority

LDAP is a server-based model that allows the LDAP server to be the authority for all authentication requests.  This means that anyone with rights to query the LDAP server would be able to see the current state of who is configured to have access to the system and what groups they are in.
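
As a rough illustration of the kind of query this enables, the sketch below builds LDAP search filters for "does this account exist" and "which groups list this user". It assumes the common RFC 2307 schema (posixAccount and posixGroup entries with uid and memberUid attributes), which this post does not confirm we use, and the function names are hypothetical. It only builds filter strings; sending them to a server is left to whichever LDAP client is in use.

```python
# Minimal sketch: build LDAP search filters for account and group lookups.
# Assumption: an RFC 2307 style schema (posixAccount / posixGroup).
# Function names here are hypothetical and chosen for illustration only.

# Characters that must be escaped inside a filter value (RFC 4515).
_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\0": r"\00",
}


def escape_filter_value(value: str) -> str:
    """Escape a user-supplied value so it is safe to embed in a filter."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


def account_filter(username: str) -> str:
    """Filter matching the account entry for one user."""
    return f"(&(objectClass=posixAccount)(uid={escape_filter_value(username)}))"


def groups_for_user_filter(username: str) -> str:
    """Filter matching every group that lists the user as a member."""
    return f"(&(objectClass=posixGroup)(memberUid={escape_filter_value(username)}))"


if __name__ == "__main__":
    print(account_filter("alice"))
    print(groups_for_user_filter("alice"))
```

Escaping the username matters because a value containing characters such as * or ( would otherwise change the meaning of the filter.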

Group Management

We also needed a way to manage groups, which is something that we hadn’t done before. Fortunately for us, LDAP not only manages primary group membership, but also tracks secondary groups.
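
To make the primary versus secondary distinction concrete, here is a small sketch that resolves both from in-memory records. It assumes an RFC 2307 style layout, where an account's gidNumber names its primary group and a group's memberUid values name its secondary members. That layout, the sample users, and the group names are all assumptions made for illustration, not a description of our directory.

```python
# Minimal sketch: resolve primary and secondary groups for an account.
# Assumption: RFC 2307 style data, modeled in memory instead of a live
# LDAP server. All users and groups below are hypothetical.
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class PosixAccount:
    uid: str          # login name
    gid_number: int   # numeric ID of the primary group


@dataclass(frozen=True)
class PosixGroup:
    cn: str                                   # group name
    gid_number: int                           # numeric group ID
    member_uids: FrozenSet[str] = frozenset() # secondary members


def resolve_groups(
    account: PosixAccount, groups: List[PosixGroup]
) -> Tuple[Optional[str], List[str]]:
    """Return (primary group name, sorted secondary group names)."""
    # The primary group is the one whose gidNumber matches the account's.
    primary = next(
        (g.cn for g in groups if g.gid_number == account.gid_number), None
    )
    # Secondary groups are those that list the user explicitly.
    secondary = sorted(
        g.cn for g in groups if account.uid in g.member_uids and g.cn != primary
    )
    return primary, secondary


GROUPS = [
    PosixGroup("engineering", 2000),
    PosixGroup("deploy", 2001, frozenset({"alice"})),
    PosixGroup("dba", 2002, frozenset({"bob"})),
]
ALICE = PosixAccount("alice", 2000)

if __name__ == "__main__":
    print(resolve_groups(ALICE, GROUPS))
```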

Scalability

LDAP supports several scaling methodologies, which can be chosen based on ease of use and desired scale. We were able to find a setup that will allow us to scale redundantly without adding overhead to the management of the additional servers.

Reduced Operational Supervision

In our previous model, a member of the Operations team had to get involved each time an engineer needed access to a production host. That overhead meant that Operations would often serve as a proxy for engineering, carrying out commands on a server on behalf of an engineer. It also meant that engineers could not immediately assist in a critical event without Operations intervening first.

By leveraging sudoers, LDAP, and PAM, engineers can have standing access to the resources they need, and only those resources. Since LDAP supports auditing and can be used to track who has access to what, per-use access grants are no longer needed.

Today we discussed what we wanted from an authentication system and how we came to choose OpenLDAP. In the next post, we’ll cover some techniques we used to implement the ideas laid out here.

 
