Google Site Reliability Engineer Interview Questions & Answers

Below is a list of our Google interview questions, each with answer advice and an answer example.

Table of Contents

1. Direct Questions
2. Discovery Questions
3. Operational Questions
4. Technical Questions

Direct

1. What are some of the basic issues a site reliability engineer addresses in their daily activities?

How to Answer

As an experienced site reliability engineer, you should be intimately familiar with the issues related to this job. When describing these, you may want to include issues Google is currently encountering. You can discover these by researching the organization's website, industry periodicals, news blogs, and other sources of information.

Written by William Swansen on November 11th, 2021

Answer Example

"There are some fundamental issues related to site reliability engineering, or SRE, which I encounter nearly every day. These include how SRE supports the DevOps organization, service level objectives, service level indicators, error budgets, ways to reduce toil, some of the technologies and automation used by an SRE, and the concept of anti-fragility."


Discovery

2. Can you describe the three pillars of observability and describe the one you depend on the most?

How to Answer

Throughout the interview with Google, you will be asked a series of technical questions about the tools, methodologies, and processes you use in this job. You are expected to have a deep understanding of some of these and be familiar with others. Since observability is a fundamental practice used by a site reliability engineer, you should have extensive knowledge about this and be able to talk about what contributes to effective observability. As with any technical question, keep your answer brief and to the point and anticipate a follow-up question from the Google interviewer.


Answer Example

"The three pillars of observability are metrics, tracing, and logging. I use each of these a great deal in my work as a site reliability engineer. Most of my work involves measuring systems and determining how to adjust them for optimal performance. I continually strive to increase the observability within organizations' systems and processes by developing more effective measurement practices. This work involves refining the metrics, automating the tracing and logging, and educating the workforce about the importance of observability."


Operational

3. What are some of the databases you've used in your previous roles? How do you manage database query times?

How to Answer

Since site reliability engineers deal with a lot of data in their measurements and analysis, you need to be able to work effectively with database tools. Each organization has a preference for which databases and tools they use, so the SRE needs to be flexible and able to adapt to new environments. When answering this question, you should provide the interviewer with the names of the databases you've worked with. If they do not correspond to the databases used by Google, you should discuss your ability to apply your existing knowledge to rapidly learn the features of the new database.


Answer Example

"During my career, I've worked with several different databases. These include Oracle Database, MySQL, Microsoft SQL Server, and dBASE. While each of these is unique in its features, commands, and structure, they all share the same fundamentals. I can quickly transition between database tools by applying the knowledge I already have and using guides and manuals to learn the new commands and database structures. There are several ways to improve database query times. These include avoiding wildcard queries such as leading-wildcard LIKE patterns, choosing the appropriate data types, avoiding NULL in fixed-length fields, and other techniques to reduce the scope of queries."
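
One way to check whether a query's scope has actually been reduced is to inspect its query plan. The following is a minimal sketch using Python's built-in sqlite3 module; the users table, the idx_users_name index, and the sample names are hypothetical and chosen only for illustration. It compares a leading-wildcard pattern, which typically forces a scan, with a prefix pattern, which SQLite can usually answer with an index search.

```python
import sqlite3


def query_plans():
    """Return the SQLite query plans for a leading-wildcard and a prefix LIKE."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table and index, used only for this illustration.
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    # LIKE is case-insensitive by default in SQLite, so the index uses NOCASE.
    conn.execute("CREATE INDEX idx_users_name ON users (name COLLATE NOCASE)")
    conn.executemany(
        "INSERT INTO users (name) VALUES (?)",
        [("smith",), ("jones",), ("smythe",)],
    )

    def plan(sql):
        rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
        # The human-readable plan detail is the fourth column of each row.
        return " ".join(row[3] for row in rows)

    leading = plan("SELECT id FROM users WHERE name LIKE '%son'")
    prefix = plan("SELECT id FROM users WHERE name LIKE 'sm%'")
    conn.close()
    return leading, prefix


if __name__ == "__main__":
    leading_plan, prefix_plan = query_plans()
    print("leading wildcard:", leading_plan)
    print("prefix pattern:  ", prefix_plan)
```

Other database engines expose similar tooling under different names, so the exact plan text should be treated as specific to SQLite.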


Operational

4. When analyzing a software development pipeline, how do you identify ways to improve its efficiency?

How to Answer

The key skill of a competent site reliability engineer is the ability to observe and analyze systems and operations within an organization like Google. This is the essence of this job. Google hires people who can help them improve processes and procedures to save the organization both time and money. Describing how you go about doing this will be fundamental in convincing the interviewer that you are the right person for this role.


Answer Example

"I am constantly looking for opportunities to improve the processes and procedures used by my organization. When it comes to the software development pipeline, there are several ways you can improve its efficiency. One is to examine the resources required for each development project and allocate them effectively. I also look for individual development projects that are bottlenecks, which result in delays throughout the pipeline. Finally, I will always keep in mind the needs and resources of Google's operations deployment teams to ensure that they are neither over-scheduled nor under-resourced with new applications."


Operational

5. What is your strategy for staying up to date with industry trends and resources?

How to Answer

Google site reliability engineers work in an industry that is constantly changing and evolving. The rapid pace of this change, combined with the numerous sources of information and developments, makes it challenging to stay on top of recent updates. Every competent site reliability engineer will have a strategy to keep their knowledge of this profession up to date and learn about new practices, tools, and methodologies. You should be able to easily describe this to the Google interviewer.


Answer Example

"I have found that one of the biggest challenges in this job is staying on top of new developments in the industry. I recognize the importance of this and have developed a system that enables me to accomplish it. I start by allocating both work and leisure time towards learning new SRE practices. Specific activities include reviewing industry publications, reading technical blogs, attending industry conferences, and networking with my peers in my organization and others in the field. I also have developed strong relationships with manufacturers' representatives to learn about what they are working on."


Operational

6. How do you integrate the Google customer experience into your SRE strategy?

How to Answer

The most important stakeholder for any project is the organization's customers. The actions of the individual teams and projects should contribute to the organization's objectives. These goals often involve customer revenues and customer satisfaction. Therefore, you need to consider these when developing your strategies. Communicating this to the Google interviewer will demonstrate your qualifications for this position and improve your chances of being selected for the role.


Answer Example

"The Google customer experience is one of my top considerations whenever I'm planning or developing a new project. I always keep in mind how completing the project will contribute to Google's ability to drive revenues or provide an excellent customer experience. This serves as the context for structuring the project, the resources I allocate to it, and other parameters. In my experience, addressing the customer's needs effectively allows the entire organization to succeed and grow."


Operational

7. Describe to me how you balance the interests of different stakeholders here at Google.

How to Answer

This question may appear to be similar to one that the Google interviewer has already asked. It is common to be asked several questions about the same topic in an interview. One reason for this is that the interviewer has a specific interest in this area and wants to explore it in great detail. Another reason is to ensure that you are being consistent throughout the interview with Google. This should not be an issue as long as you answer the questions honestly. Keeping your answers brief and to the point will also help you to be consistent throughout the interview.


Answer Example

"A large part of my job is mediating between two or more teams within the organization. Most of my work involves coordinating the output of the DevOps team with the needs and availability of the operations team. I often find that these groups have conflicting objectives. I've learned how to mediate between them by listening to both sides, understanding their issues and challenges, determining what their individual and mutual objectives are, and negotiating compromises that will help Google attain the best outcome possible."


Operational

8. Tell me about some of the process improvements you have implemented in the past.

How to Answer

Organizations like Google hire site reliability engineers to save the company money and time by improving the systems and processes used to develop and implement software applications. During an interview with Google, you will be asked how you accomplished this in the past. When preparing for the interview, you should have some very specific examples of process improvements you've implemented in your previous roles. You can describe these using the STAR framework. Describe the Situation, talk about the Task you were trying to complete, discuss the Actions you took, and finish with the Results you attained.


Answer Example

"In my last job, it became apparent that the time to implement a software application exceeded the organization's expectations. I was assigned to analyze and reduce the start-up time for a new app. I examined the workflow and found that there was a lot of interchange between the DevOps Group and the operations team when a new app was released. I determined that this was due to a lack of proper documentation concerning the app from the DevOps team. I collaborated with both groups to determine what information was needed to expedite the implementation of a new piece of software. We then established metrics around this information. Once the new system was implemented, the start-up time for a new application was reduced by 50%."


Operational

9. How do you establish SLOs and SLIs, and are you open to making adjustments to these when warranted?

How to Answer

Service level objectives and service level indicators are the keystones of the work SREs perform. Interviewers at Google will be interested to learn how you go about establishing these and whether you are open to changes once the project is underway. They're looking for a balance between commitment to the metrics used by SREs as well as the flexibility to change when circumstances demand it. Your answer should demonstrate a systematic approach to establishing the criteria used to measure the success of a project while also showing that you can adjust projects as they move forward to optimize them.


Answer Example

"When scoping and planning a new project here at Google, the first thing I would do is understand the service level agreements project stakeholders are looking for. This helps me establish the service level objectives which will result in the SLA being met. Once I have SLOs set up, it is relatively easy to build a list of service level indicators that will tell the Google project stakeholders how the project is progressing and whether the SLOs are being met. I've found that a top-down approach works best. I'm also open to adjusting the SLOs and SLIs if it is determined that they are either too liberal or too strict and will not result in the predetermined SLA being met."


Operational

10. Walk me through the process of determining if a Google development team should work on new features or pay down technical debt.

How to Answer

Site reliability engineers need to ensure that DevOps works towards improvements to the applications, resulting in Google achieving its business objectives. You're constantly asked to choose between adding new features or solidifying and improving the ones already developed. One way these conflicts can be resolved is through objective analysis and defined metrics. However, the most talented SREs can also use subjective judgment when choosing between developing new features or paying down technical debt. Interviewers at Google are looking for candidates who demonstrate skills in both of these methodologies.


Answer Example

"When asked to choose between developing new features or paying down technical debt, my first approach is to look at the metrics and see which activities will provide the greater return on investment. This objective analysis is easy to complete when the right metrics are available. However, my experience has taught me that I can also look at this issue subjectively to determine which effort would generate the best results, lead to future developments, and keep the DevOps team engaged. I try to balance the objective and subjective analysis to determine the right course of action."


Operational

11. Can you describe the concept of observability? How would you improve Google's systems observability?

How to Answer

Observability is a key concept used by site reliability engineers to improve the operations of the organization. It involves measuring different aspects of the organization's operations and making recommendations to improve these. Observability requires specific metrics, the ability to note and measure them, and using the data to increase the organization's efficiency. It is critical for a site reliability engineer to improve the observability of an organization's operation. The Google interviewer wants to understand how you would do this.


Answer Example

"Observability involves defining, collecting, and analyzing the metrics that an organization needs to quantify and improve its operations. The key to this process is selecting the right measurements that will provide the information Google needs to analyze and optimize each of its processes. Improving an organization's systems observability therefore involves selecting the correct metrics, creating systems to collect and analyze the data, and employing the results to improve the processes. This requires commitment from everyone within Google to collect the information and apply the results."


Operational

12. How would you describe cloud computing to someone here at Google who doesn't have a technical background?

How to Answer

When an interviewer at Google asks this question, they are less interested in your understanding of cloud computing and more interested in your communication skills. As a site reliability engineer, you will need to communicate with key stakeholders across Google. Many of these stakeholders will not have a technical background, so you will need to use non-technical, easy-to-understand language. When answering this question, you should avoid complex phrases, jargon, or other terminology that the interviewer may not know.


Answer Example

"Cloud computing is very similar to traditional onsite computing, which you may be familiar with. The difference is that the computing resources, including hardware and software applications, reside at a different location. These may be owned by the organization or be provided as services by a third party such as Amazon, Microsoft, or Google. Computing assets dedicated to Google are known as a private cloud. Resources shared by several organizations are called a public cloud. Organizations may utilize both of these along with their traditional in-house technology infrastructure. You can also purchase computing resources as services, such as software as a service, infrastructure as a service, or simple processing as a service."


Operational

13. Can you explain how Service Level Objectives, or SLOs, are used in the work of a site reliability engineer?

How to Answer

Service-level objectives, or SLOs, are one of the fundamental tenets of the site reliability engineering profession. Therefore, you should have a great deal of knowledge about these and be able to discuss them at length. The interviewer at Google knows this and is looking for a direct answer from you. Your response should provide an overview of SLOs and a brief description of how they are used. The interviewer will ask you a follow-up question if they need more information.


Answer Example

"The service-level objective, or SLO, is a metric agreed on by the service provider and their client concerning what objective the project will attain. This is part of the service level agreement, also known as an SLA. Common criteria used to establish the SLO include response time, throughput, frequency, and other service delivery metrics. I insist on very specific SLOs when initially scoping and planning a project."


Operational

14. What are some of the common data structures you work with in this role?

How to Answer

As a site reliability engineer, you are expected to have in-depth knowledge about databases and data structures. The Google interviewer will ask you a technical question like this to explore your knowledge and determine if you're qualified for this role. Since you work with data structures daily in this job, you should be able to easily answer this question.


Answer Example

"I work with a wide variety of data structures in my role. These can be grouped into four main categories: linear, tree, graph, and hash structures. Examples include arrays, lists, heaps, decision trees, and hash trees."
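
As a small illustration of two of these categories, the sketch below combines a hash structure (a Python dict) with a tree-based structure (a binary heap from the standard heapq module). The alert names and severity numbers are hypothetical, and the convention that a lower number means a more urgent alert is an assumption made for the example.

```python
import heapq


def triage(alerts):
    """Order alert names from most to least urgent.

    `alerts` is a dict (hash structure) mapping alert name to severity,
    where a lower number is assumed to mean more urgent.
    """
    # Build a binary heap (tree structure) keyed on severity.
    heap = [(severity, name) for name, severity in alerts.items()]
    heapq.heapify(heap)
    ordered = []
    while heap:
        _severity, name = heapq.heappop(heap)  # always pops the smallest severity
        ordered.append(name)
    return ordered


if __name__ == "__main__":
    print(triage({"disk-full": 2, "api-5xx": 1, "cert-expiry": 3}))
```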


Operational

15. What are some of the steps you can take to reduce toil in a process?

How to Answer

One of the major responsibilities of a site reliability engineer is to reduce toil in processes. Toil is defined as the amount of manual work or effort required by individuals on a specific project or process. By reducing toil, site reliability engineers increase the efficiency of a project while preserving human resources to address issues or processes that cannot be automated or standardized. Your answer to the Google interviewer's question should include the main ways you accomplish this.


Answer Example

"You can reduce toil within a process by creating internal or external automation or by minimizing the amount of maintenance and other interventions required by the process. By making the process more automated and requiring fewer interventions, the amount of toil necessary for that process is lowered."


Operational

16. What steps have you taken to improve collaboration between operations and IT teams?

How to Answer

Site reliability engineers collaborate with teams from across the organization. Your key contacts are in DevOps or software development teams and the operations group. To be effective in this role, you need to work with organizations within Google that have conflicting objectives. Your ability to negotiate, persuade, and compromise is necessary to move the organization's objectives forward. The Google interviewer will ask you several questions to determine if you can work effectively with people outside your organization.


Answer Example

"I recognize that once hired by Google I will be working with individuals and teams from outside of my organization over whom I have no direct authority. I have developed and refined my communication skills so that I can collaborate with these individuals to achieve common objectives. The skill most useful in this is active listening. I will take time to hear what the other Google stakeholders are saying, learn their concerns, and understand what they're trying to accomplish. This provides me with the information I need to promote my ideas, compromise when necessary, and negotiate agreements that move the Google stakeholders in the same direction."


Technical

17. What is Transmission Control Protocol, or TCP, and can you list some of the TCP connection states?

How to Answer

The job of a site reliability engineer spans the functions of Google's DevOps and IT infrastructure groups. One of the elements both of these teams need to address is the network topology. Knowing about TCP and its connection states will demonstrate your qualifications for this role and may even set you apart from other candidates. Since this is a technical question, keep your answer brief and to the point.


Answer Example

"Transmission Control Protocol, or TCP, is part of the Internet protocol suite, which is often referred to as TCP/IP. TCP controls the transmission of data across the network, ensuring that data is delivered reliably and in order between the network nodes. Common TCP connection states include: LISTEN, when the server is listening for incoming connection requests; SYN-SENT, after the client has sent a connection request and is waiting for a response; SYN-RECEIVED, when the server has replied to the request and is waiting for the final ACK; and ESTABLISHED, which indicates that the three-way handshake has completed and data can be exchanged."
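
The states named above can be walked through on a single machine. This is a minimal sketch using Python's standard socket module on the loopback interface; the function name and the test message are illustrative only. The operating system performs the handshake itself, so the comments indicate roughly where each state occurs rather than observing it directly.

```python
import socket


def handshake_demo(message: bytes = b"ping") -> bytes:
    """Open a loopback TCP connection and send one message across it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0 asks the OS for a free port
        server.listen(1)               # server socket is now in LISTEN
        host, port = server.getsockname()

        # connect() sends a SYN (client passes through SYN-SENT); the kernel
        # answers with SYN-ACK (server side passes through SYN-RECEIVED).
        with socket.create_connection((host, port), timeout=5) as client:
            # accept() hands back a connection that is already ESTABLISHED.
            conn, _addr = server.accept()
            with conn:
                client.sendall(message)
                return conn.recv(1024)


if __name__ == "__main__":
    print(handshake_demo())
```

On Linux, a tool such as ss can be used alongside a longer-lived version of this script to see the state names reported by the kernel.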


Technical

18. Can you describe the differences between DevOps and Site Reliability Engineering?

How to Answer

Site reliability engineering involves managing the relationship between the DevOps and IT infrastructure departments within Google. It incorporates best practices from these disciplines, resulting in reliable software systems that can scale when needed. The Google interviewer will ask you a question about the difference between DevOps and site reliability engineering to understand how you perceive this. They're hoping you'll talk about the benefits you bring to this role.


Answer Example

"Site reliability engineering focuses on merging the best practices of the DevOps group and the needs of Google's IT organization. DevOps focuses on creating software applications, whereas the IT infrastructure group is responsible for implementing the applications. The Google site reliability engineer addresses issues, including reducing organizational silos created by these two teams and leveraging technology and automation to improve operations. We accomplish this by measuring everything and identifying opportunities for process improvement."


Technical

19. Tell me about the differences between process and thread in the context of site reliability engineering.

How to Answer

As with any profession, site reliability engineering has its own terminology, phrases, and jargon. This question is an example of how common terms may mean something specific in the context of a particular job. When asked by the Google hiring manager to compare two terms used within your profession, you should first define each of them, and then discuss the differences or similarities. Be prepared for a follow-up question.


Answer Example

"In the context of site reliability engineering, a process is defined as a running instance of a program with its own memory space. In this same context, a thread is a unit of execution within a process. A process contains one or more threads. Threads are lighter weight and take less time to create and switch between than entire processes. The final difference is that a process does not share memory with other processes by default. However, threads within the same process do share memory."
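
The memory-sharing difference can be demonstrated in a few lines. The sketch below is POSIX-only because it relies on os.fork, and the function name is hypothetical. A thread appends to a list and the change is visible to the caller, while a forked child process appends to its own private copy, so the parent never sees that change.

```python
import os
import threading


def thread_vs_process():
    """Show that threads share memory with their process and child processes do not."""
    shared = []

    # A thread runs inside the same process, so it mutates the same list object.
    worker = threading.Thread(target=shared.append, args=("thread",))
    worker.start()
    worker.join()

    # fork() creates a child process with its own copy of the parent's memory.
    pid = os.fork()
    if pid == 0:
        shared.append("child process")  # changes only the child's copy
        os._exit(0)                     # leave the child without running cleanup
    os.waitpid(pid, 0)                  # wait for the child to finish

    return shared


if __name__ == "__main__":
    print(thread_vs_process())
```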


Technical

20. What is an error budget, and how is it used?

How to Answer

You should immediately recognize that this is a technical question since it asks about a term used in this profession. This particular term is unique and probably only used in the context of a site reliability engineer's work. This may create a challenge when answering this question. You need to describe the unique concepts using non-technical, easy-to-understand language. Your answer will demonstrate your communication skills to the Google interviewer.


Answer Example

"An error budget is an allowance built into a service level agreement to account for downtime and failures which impact the system's uptime. This provides the technical support organization with a buffer to accommodate unplanned but anticipated outages. Site reliability engineers here at Google prepare error budget policies that define the tradeoffs between the reliability of a project and the risks Google is willing to take to save money or time."
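
The arithmetic behind an error budget is simple enough to sketch. The example below assumes an availability target expressed as a fraction (for example 0.999) and a 30-day window; both values and the function names are illustrative assumptions rather than figures from any particular agreement.

```python
def error_budget_minutes(target: float, window_days: int = 30) -> float:
    """Minutes of downtime allowed in the window for a given availability target."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - target)


def budget_remaining_minutes(target: float, downtime_minutes: float,
                             window_days: int = 30) -> float:
    """Minutes of budget left after the downtime already incurred."""
    return error_budget_minutes(target, window_days) - downtime_minutes


if __name__ == "__main__":
    # A 99.9% target over 30 days allows roughly 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    # After a 20-minute outage, roughly 23.2 minutes remain.
    print(round(budget_remaining_minutes(0.999, 20), 1))
```

A negative remaining budget is the usual trigger for an error budget policy, such as pausing risky releases until reliability recovers.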


Technical

21. How would you define a service level indicator?

How to Answer

Site reliability engineers deal with many issues related to service levels. The Google interviewer will expect you to be knowledgeable about each of these and be able to both define the term and discuss its use. As with any technical question, keep your answer brief and to the point. You should also be prepared for follow-up questions that the interviewer will use to explore the topic more deeply.


Answer Example

"A service-level indicator, also known as an SLI, is used to measure the level of service provided by the Google support team to their customer. SLIs are used to measure compliance with the SLOs, which define how well the team is meeting the SLA. Common SLIs include throughput, availability, latency, and error rates."
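
To make the relationship between an SLI and an SLO concrete, here is a minimal sketch of an availability SLI computed as the ratio of successful requests to total requests and then compared with a target. The request counts, the 0.999 target, and the function names are assumptions made for the example.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Availability SLI: the fraction of requests that succeeded."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic is treated as fully available
    return good_requests / total_requests


def meets_slo(sli: float, slo_target: float) -> bool:
    """True when the measured indicator is at or above the objective."""
    return sli >= slo_target


if __name__ == "__main__":
    sli = availability_sli(good_requests=999_500, total_requests=1_000_000)
    print(sli, meets_slo(sli, slo_target=0.999))
```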


Technical

22. What is a Linux signal, and what are some common ones you work with?

How to Answer

As a prospective Google site reliability engineer, you are expected to have a working knowledge of some of the more common operating systems. One of these is Linux. While you don't need to be proficient in writing Linux code, you should be familiar with some key commands. If you're asked about a command that you're not familiar with, readily admit this, and then describe how you would go about obtaining the information.


Answer Example

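"A Linux signal is an asynchronous notification that the kernel delivers to a process to report an event or request an action, such as terminating or reloading its configuration. Some of the more common Linux signals used in my role as a site reliability engineer include SIGKILL, SIGHUP, SIGALRM, SIGQUIT, SIGFPE, SIGINT, and SIGTERM."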
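
A process can install its own handler for most of these signals; SIGKILL is the notable exception because it cannot be caught. The sketch below uses Python's standard signal module on a POSIX system, must run in the main thread, and uses hypothetical function names. It installs a SIGHUP handler, raises the signal against the current process, and then restores the previous handler.

```python
import signal


def catch_sighup():
    """Install a SIGHUP handler, raise the signal, and report what was received."""
    received = []

    def handler(signum, frame):
        # Record the symbolic name of the signal that arrived.
        received.append(signal.Signals(signum).name)

    previous = signal.signal(signal.SIGHUP, handler)  # install the handler
    try:
        signal.raise_signal(signal.SIGHUP)            # deliver SIGHUP to this process
    finally:
        signal.signal(signal.SIGHUP, previous)        # restore the earlier handler
    return received


if __name__ == "__main__":
    print(catch_sighup())
```

Long-running services often use a SIGHUP handler like this to reload configuration without restarting, although that convention depends on the service.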


Technical

23. What are some of the common Linux kill commands?

How to Answer

As a site reliability engineer, you should be familiar with all the major operating systems, including Windows, macOS, and Linux, a popular Unix-like operating system. If you use any of these operating systems frequently, you should also be familiar with some key commands. This will help you interpret the code created by the Google DevOps team and understand how it is used in production.


Answer Example

"There are several commands you can use to kill or stop Linux processes. The most common ones include kill, killall, pkill, and xkill. kill sends a signal, SIGTERM by default, to a process identified by its process ID. killall, as the name implies, will stop or kill all the processes with a specific name. pkill is similar to killall. However, pkill matches processes by pattern, so a partial name is enough. xkill is a special command which allows users to stop a graphical program by clicking on the window in which it is running."
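
Under the hood, these commands work by sending a signal to the target process. The sketch below shows the same mechanism from Python on a POSIX system using the standard subprocess and signal modules; the function name and the sleeping child process are stand-ins chosen for the example. On POSIX, a child that was ended by a signal reports a negative return code equal to the signal number.

```python
import signal
import subprocess
import sys


def terminate_child(sig=signal.SIGTERM):
    """Start a sleeping child process, send it a signal, and return its exit status."""
    # The child is just a Python interpreter that sleeps, as a stand-in workload.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    child.send_signal(sig)   # equivalent in effect to `kill -<sig> <pid>`
    child.wait(timeout=10)   # reap the child so it does not linger as a zombie
    return child.returncode


if __name__ == "__main__":
    print(terminate_child(signal.SIGTERM))
    print(terminate_child(signal.SIGKILL))
```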


Technical

24. What are the fundamental stages of DevOps, and what tools do you use for each of these?

How to Answer

Even though you don't work directly for DevOps in your role as a site reliability engineer, you interface with them daily and therefore need to be familiar with their process and the tools they use. The Google interviewer will ask you this question to confirm your knowledge and determine if your perception of the process and tools is similar to the ones they use in their organization. You can align your answer with the processes of the interviewing organization by carefully researching their operations. Sources for this include Google's website, the job description, and both current and former Google employees.


Answer Example

"The typical stages for a DevOps project include planning, programming, verifying, packaging, and configuring. Some of the tools an SRE uses include Pivotal Tracker and other task management tools during the planning stage, GitHub and other source control tools when programming, CI/CD tools like Jenkins to verify, and tools such as Kubernetes and Terraform for packaging and configuration. Based on my research, your organization here at Google uses similar tools."


Technical

25. What is a Docker container, and how do you secure these?

How to Answer

You will likely be asked about a topic you are not familiar with during an interview with Google. Even though docker containers are commonly used in information technology infrastructures, you may not have come across them in your career. If that is the case for this question or any other topic, the best strategy is to admit that you have not used the technology or are unfamiliar with it. Then describe to the Google interviewer how you would locate the information or learn about the topic. This will help you establish your credibility with the interviewer and demonstrate your willingness to learn new technologies once hired by Google.


Answer Example

"While I have never used Docker containers, I have heard of them. I believe Docker is a platform as a service, or PaaS, that uses containers with virtualized operating systems, software libraries, and other files to deliver software services. I'm not familiar with how you would secure these containers, but I'm sure I can quickly learn this by accessing some of the typical information resources I use in my work. These include Wikipedia, technical blogs, and information provided by technology vendors. One of my favorite resources is GitHub."


Technical

26. Please discuss hard links and soft links and provide an example of each command.

How to Answer

As the interview progresses, the interviewer at Google will continue to ask you technical questions. The difficulty of these will vary depending on where you are in the interview, your responses to previous questions, and the nature of the interviewer. You're more likely to be asked a more challenging question towards the end of the interview as the interviewer gains confidence in your capabilities. Continue to respond to these questions directly and succinctly, and anticipate follow-up questions.

Written by William Swansen on November 11th, 2021

Answer Example

"Hard links and soft links are both ways of giving a file an additional name in a Unix-like file system. A hard link is not a copy of the original file. It is another directory entry that points to the same inode, so both names refer to the same data, and the data remains until the last hard link is removed. A hard link is more restricted than a soft link: it cannot cross file system boundaries, directories normally can't be hard linked, and the inode number is the same as the original file. A soft link, or symbolic link, is a separate file that stores the path to its target. It can cross file systems and link directories, but it breaks if the target is moved or deleted. The inode for a soft link is different from that of the original file. Commands for both of these are similar: $ ln original.file hardlink.file and $ ln -s original.file softlink.file."

Written by William Swansen on November 11th, 2021
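
The behavior described in the answer above can be checked with a short script. This is a minimal sketch, assuming a Unix-like system where the working directory supports both link types. The function name demo_links and the file names are hypothetical and chosen only for illustration.

```python
import os
import tempfile


def demo_links(workdir):
    """Create a hard link and a soft link to one file and compare them."""
    original = os.path.join(workdir, "original.file")
    hard = os.path.join(workdir, "hardlink.file")
    soft = os.path.join(workdir, "softlink.file")

    with open(original, "w") as f:
        f.write("hello")

    os.link(original, hard)     # equivalent to: ln original.file hardlink.file
    os.symlink(original, soft)  # equivalent to: ln -s original.file softlink.file

    result = {
        # A hard link shares the inode of the original file.
        "hard_same_inode": os.stat(original).st_ino == os.stat(hard).st_ino,
        # lstat inspects the link itself, which has its own inode.
        "soft_same_inode": os.stat(original).st_ino == os.lstat(soft).st_ino,
        # Two directory entries now point at the same inode.
        "link_count": os.stat(original).st_nlink,
    }

    # Remove the original name: the data survives through the hard link,
    # while the soft link is left pointing at a path that no longer exists.
    os.remove(original)
    with open(hard) as f:
        result["hard_content_after_delete"] = f.read()
    result["soft_is_dangling"] = not os.path.exists(soft)
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for key, value in demo_links(tmp).items():
            print(key, value)
```

Deleting the original name is the quickest way to show the difference in an interview: the hard link still reads the data, and the soft link no longer resolves.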

Technical

27. Can you discuss the difference between SNAT and DNAT?

How to Answer

Often during an interview with Google, the interviewer will ask you to compare two terms used in this profession. This is considered a technical question and requires you to be aware of the technology to respond correctly. When answering technical questions, keep your response brief and to the point. The Google interviewer will ask you a follow-up question if they need additional information or want to explore the topic in more detail.

Written by William Swansen on November 11th, 2021

Answer Example

"Both SNAT and DNAT rewrite the IP addresses in packet headers as traffic passes through a router or firewall. SNAT, also known as Source Network Address Translation, changes the source address of outgoing packets, which allows traffic from a private network to access the internet through a shared public address. DNAT, which stands for Destination Network Address Translation, changes the destination IP address of incoming packets so that traffic sent to a public address reaches a host on the private network. In both cases, the device performing the translation reverses the change on the reply packets. Routers located between the endpoints perform these changes. The difference is that SNAT is associated with outbound communications to the internet, whereas DNAT involves inbound communications to private networks. Multiple nodes accessing an internet resource can share the same SNAT address. However, DNAT usually needs a specific rule for each internal service being exposed."

Written by William Swansen on November 11th, 2021
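
To make the distinction in the answer above concrete, here is a toy model of the two translations. It is a sketch for illustration only, not how a router implements NAT: real devices also rewrite ports and track connection state. The Packet class, the snat and dnat helpers, and the example addresses (taken from private and documentation ranges) are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified packet: only source and destination IPs."""
    src: str
    dst: str


def snat(packet, public_ip):
    """Source NAT: rewrite the source address of an outbound packet."""
    return replace(packet, src=public_ip)


def dnat(packet, internal_ip):
    """Destination NAT: rewrite the destination address of an inbound packet."""
    return replace(packet, dst=internal_ip)


if __name__ == "__main__":
    # Outbound: a private host reaches an internet server via the public IP.
    outbound = Packet(src="10.0.0.5", dst="198.51.100.7")
    print(snat(outbound, "203.0.113.10"))

    # Inbound: a client targets the public IP and is forwarded to a private host.
    inbound = Packet(src="198.51.100.7", dst="203.0.113.10")
    print(dnat(inbound, "10.0.0.5"))
```

Only one field changes in each case, which is the whole difference: SNAT touches the source of outbound traffic and DNAT touches the destination of inbound traffic.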

Technical

28. What is Dynamic Host Configuration Protocol (DHCP), and what is it used for?

How to Answer

Dynamic Host Configuration Protocol, also known as DHCP, is a network protocol used to automatically assign IP addresses and other network configuration to devices on a network. As a site reliability engineer at Google, you may not work directly with DHCP, but you should know what it is and be able to define it for the interviewer. The more familiar you are with the technology you will be engaging with, the better qualified you will be for this role and the more likely you will be to be hired by Google.

Written by William Swansen on November 11th, 2021

Answer Example

"Dynamic Host Configuration Protocol is a network management protocol used on local networks and other IP networks to configure devices automatically. Each device needs an Internet Protocol, or IP, address to be reachable. A DHCP server dynamically assigns an IP address and other network configuration parameters, such as the subnet mask, default gateway, and DNS servers, to each device on the network, usually as a lease that expires unless it is renewed. This allows them to communicate with each other as well as with other devices, nodes, or users on the network without manual configuration. Finding a website by name is the job of DNS, not DHCP. You could think of DHCP as the service that hands each device its address when it joins the network."

Written by William Swansen on November 11th, 2021
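
The dynamic assignment described in the answer above can be pictured as a pool of addresses handed out per device. The following is a toy sketch under stated assumptions: it models only the bookkeeping, not the DHCP message exchange or lease timers, and the LeasePool class, its methods, and the MAC and IP values are hypothetical.

```python
import ipaddress


class LeasePool:
    """Hypothetical address pool that assigns one IP per client MAC address."""

    def __init__(self, first, last):
        start = int(ipaddress.IPv4Address(first))
        end = int(ipaddress.IPv4Address(last))
        self._free = [ipaddress.IPv4Address(n) for n in range(start, end + 1)]
        self._leases = {}  # MAC address -> assigned IP

    def request(self, mac):
        """Return the client's existing lease, or assign the next free address."""
        if mac in self._leases:
            return self._leases[mac]
        if not self._free:
            raise RuntimeError("address pool exhausted")
        ip = self._free.pop(0)
        self._leases[mac] = ip
        return ip

    def release(self, mac):
        """Return a client's address to the pool, if it holds one."""
        ip = self._leases.pop(mac, None)
        if ip is not None:
            self._free.append(ip)


if __name__ == "__main__":
    pool = LeasePool("192.168.1.100", "192.168.1.101")
    print(pool.request("aa:bb:cc:00:00:01"))
    print(pool.request("aa:bb:cc:00:00:02"))
```

A client that asks again keeps the same address, and a released address goes back to the end of the pool for reuse.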

Technical

29. Can you define the term 'inode'?

How to Answer

When interviewing for a site reliability job at Google, you may be asked a broad range of technical questions. These can involve operating systems, programming languages, networking protocols, and other items you may encounter in this role. While it is impossible to be familiar with all of these, you should try to be up to date on the ones you typically use and the technologies Google currently employs. Reviewing these technologies before the interview will enable you to answer the interviewer's questions quickly and proficiently.

Written by William Swansen on November 11th, 2021

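Answer Example

"The term inode refers to a data structure used in Unix file systems. An inode contains metadata about the file it references. Some of the information an inode provides includes the size of the file, its owner, the mode, which covers the file type and permissions, and time stamps including atime, ctime, and mtime. The inode does not store the file's name, which is kept in the directory entry that points to the inode."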

Written by William Swansen on November 11th, 2021
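
The fields named in the answer above can be read from a running system. This is a minimal sketch, assuming a Unix-like system. The describe_inode helper and its dictionary keys are hypothetical, while os.stat and stat.filemode are part of the Python standard library.

```python
import os
import stat
import tempfile


def describe_inode(path):
    """Return the inode metadata for a path as a plain dictionary."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,                  # inode number
        "size": st.st_size,                  # size in bytes
        "owner_uid": st.st_uid,              # numeric owner ID
        "mode": stat.filemode(st.st_mode),   # file type and permissions
        "links": st.st_nlink,                # number of hard links
        "atime": st.st_atime,                # last access
        "mtime": st.st_mtime,                # last content modification
        "ctime": st.st_ctime,                # last inode change (on Unix)
    }


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"hello")
        tmp.flush()
        for key, value in describe_inode(tmp.name).items():
            print(key, value)
```

Note that the file name is passed in by the caller and never appears in the result, which matches the point that the name lives in the directory entry and not in the inode.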

Technical

30. In your opinion, what are some of the key functions performed by an ideal DevOps team?

How to Answer

You may assume that the key function provided by the DevOps team is to develop applications and other types of software. By asking this question, the Google interviewer is trying to understand some of the other functions you deem critical from your perspective as a site reliability engineer. These other functions will help you manage the interaction between the DevOps team, the IT infrastructure group, and the rest of the organization. The interviewer will use your answer to determine if you can function within Google's environment effectively.

Written by William Swansen on November 11th, 2021

Answer Example

"The most obvious function performed by the DevOps team is to create applications and other software used in the operation of Google's IT infrastructure. However, there are several other actions they need to perform to be effective. These include constant communication with the information technology organization and collaboration with the functional departments of Google to understand their needs and develop applications that will help them attain their business objectives. A DevOps team must also act as a consultant to the organization, recommending changes, upgrades, and alternative solutions to the company's technology challenges."

Written by William Swansen on November 11th, 2021