Top 25 scenario-based Docker interview questions and answers

Are you a DevOps professional in Kolkata preparing for your next big career move? Or perhaps you’re an interviewer in the bustling IT hubs of Salt Lake, New Town, or Rajarhat seeking to assess true Docker expertise? In today’s competitive tech landscape, especially in Kolkata’s growing IT sector, mastering Docker and Docker Swarm has become non-negotiable for DevOps roles. With the city’s IT industry expanding rapidly and companies from banking to healthcare adopting containerization, the demand for skilled Docker professionals has never been higher.

This comprehensive guide presents 25 scenario-based Docker interview questions specifically tailored for candidates with 5+ years of experience. Whether you’re interviewing at Kolkata-based tech giants, thriving startups, or multinational corporations with development centers in the city, these questions reflect real-world challenges you’ll encounter in production environments. From troubleshooting Swarm cluster issues to optimizing container performance for Kolkata’s unique infrastructure constraints, we’ve crafted questions that test not just theoretical knowledge but practical problem-solving abilities.

What sets this guide apart is its focus on scenario-based questioning – the gold standard in modern technical interviews. Instead of simple definition-based questions, we dive into complex situations that mirror actual production challenges faced by Kolkata’s IT companies. Each question is paired with a detailed, paragraph-wise answer that demonstrates the depth of understanding expected from senior DevOps professionals.

As Kolkata continues to establish itself as a significant IT hub in Eastern India, with special economic zones and tech parks driving innovation, the need for containerization expertise grows proportionally. Companies are moving beyond basic Docker usage to implementing sophisticated container orchestration solutions, making Swarm skills particularly valuable. This guide will help you prepare for the exact type of questions Kolkata’s top tech employers are asking today.

Core Docker & Swarm Scenarios

1. Scenario: You have a production Docker application that suddenly starts consuming 100% CPU on the host. How would you diagnose and troubleshoot this issue in a live Swarm cluster without causing downtime?

Answer:
First, I would use docker stats on the affected node to identify the specific container consuming CPU. Then, I’d exec into the container (docker exec -it <container> sh) and use top or htop inside to see which process is causing the spike. Simultaneously, I’d check the node’s resource usage via docker node ps and docker node inspect to see if the service is overloaded. To avoid downtime, I would scale the service (docker service scale <service>=<replicas+1>) to distribute the load, then gradually remove the problematic replica after gathering logs (docker logs --tail 100 <container>). If it’s a known bug, I’d roll back the service to a previous image using docker service rollback. For deeper profiling, I might use docker exec <container> perf top or export container metrics to Prometheus/Grafana for historical analysis.


2. Scenario: Your Swarm manager node goes down unexpectedly. How do you recover the cluster without losing existing services?

Answer:
Swarm uses Raft consensus, so if one manager goes down, the remaining managers (if you have 3 or 5) maintain quorum. First, I’d check the surviving managers with docker node ls to confirm leader status. To recover the failed manager, I’d either restart the Docker daemon on that node and rejoin it with docker swarm join --token <manager-token>, or if the node is permanently lost, demote it (docker node demote <node>) and remove it (docker node rm). If it was the sole manager, I’d restore a backup of /var/lib/docker/swarm onto a node and run docker swarm init --force-new-cluster there; that flag rebuilds a single-manager cluster from the existing Raft data (workers don’t carry Swarm state, which is why regular backups matter). All service definitions are stored in the Raft log, so they persist after recovery. I’d then add new managers for high availability.


3. Scenario: A developer reports that their containerized app runs locally but fails in Swarm with “connection refused” errors between services. What would you check?

Answer:
I’d first verify that the services are on the same overlay network (docker network ls and docker network inspect). In Swarm, inter-service communication requires a user-defined overlay network (not the default bridge). Next, I’d check service discovery: Swarm uses DNS round-robin for service names; I’d exec into a container and run nslookup <service_name> to see if endpoints resolve. If the app uses hardcoded IPs or ports instead of service names, that would fail. I’d also verify the published ports are correct (docker service inspect). Firewall rules on the nodes might block overlay data traffic (VXLAN on UDP 4789) or the gossip ports that carry service discovery (TCP and UDP 7946). Lastly, I’d check if the service is running on multiple nodes but the other nodes have resource constraints preventing container placement.


4. Scenario: You need to securely pass secrets (like API keys) to a service in Swarm, but the secrets must be rotated weekly without restarting all containers.

Answer:
I’d use Docker Swarm’s built-in secrets management. Secrets are mounted as files in /run/secrets/ inside containers. To rotate without full restart, I’d create a new secret (docker secret create new_key /path/to/key). Then, I’d update the service to use the new secret: docker service update --secret-rm old_key --secret-add source=new_key,target=api_key <service>. This triggers a rolling update (zero-downtime) where tasks are replaced with new ones that have the new secret. The old secret remains until no service uses it, then I’d remove it. For applications that need to reload secrets without restart, I’d design the app to watch the secret file for changes (e.g., using inotify) or send a SIGHUP signal.
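The last point — an app watching its secret file for changes — can be sketched with a simple polling loop. This is a hypothetical, simplified example (a temp file stands in for /run/secrets/api_key, and the rotation mimics Docker’s atomic file swap); a real app might use inotify instead of polling:

```python
import os
import tempfile
import threading
import time

def read_secret(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()

def watch_secret(path: str, on_change, interval: float = 0.05, max_checks: int = 40):
    """Poll the secret file and call on_change with the new text when it changes."""
    last = read_secret(path)
    for _ in range(max_checks):
        time.sleep(interval)
        current = read_secret(path)
        if current != last:
            last = current
            on_change(current.decode())

# Demo: a temp file stands in for /run/secrets/api_key.
secret_path = os.path.join(tempfile.mkdtemp(), "api_key")
with open(secret_path, "w") as f:
    f.write("old-api-key")

changes = []

def rotate():
    # Mimic a secret rotation: write the new value, then swap it in atomically.
    time.sleep(0.2)
    tmp = secret_path + ".tmp"
    with open(tmp, "w") as f:
        f.write("new-api-key")
    os.replace(tmp, secret_path)

t = threading.Thread(target=rotate)
t.start()
watch_secret(secret_path, changes.append)
t.join()
print(changes)  # ['new-api-key']
```

In production, the same watch_secret loop would run in a background thread against /run/secrets/api_key and re-initialize the API client on each change.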


5. Scenario: During a deployment, a service update in Swarm is stuck in “ROLLBACK” state. How do you resolve this?

Answer:
A rollback usually happens when the new image fails health checks or crashes. I’d first check docker service ps <service> --no-trunc to see the error messages. Common causes: missing environment variables, incorrect image tag, or resource constraints. I’d inspect the failed task’s logs (docker service logs <service>, or docker logs against the container on its node). If I need to override the rollback, I could force an update with a corrected configuration: docker service update --image correct:tag --force <service>. To prevent automatic rollbacks, I’d adjust the --update-failure-action flag during updates. If the rollback is due to slow image pulls, I’d ensure all nodes have the image pre-pulled or use a private registry with low latency.


Docker Performance & Optimization

6. Scenario: You notice that container startup times in Swarm have increased significantly over time. What are the potential causes and solutions?

Answer:
Slow container startups can be due to several factors:

  1. Image size: Large images take longer to pull. Solution: Use multi-stage builds, slim base images, and layer caching.

  2. Registry latency: If nodes pull from a remote registry, use a local mirror or cache (e.g., Docker Registry, Nexus).

  3. Docker daemon performance: High disk I/O on /var/lib/docker. Solution: Use SSD, overlay2 storage driver, and prune unused images/volumes regularly.

  4. Networks: Many unused overlay networks can slow down network creation. Remove unused networks.

  5. Node resource exhaustion: Check CPU/memory on each node (docker stats, top); docker system df covers disk usage.

  6. Too many volumes: Volume creation can be slow. Use volume drivers optimized for your storage backend.
    I’d profile using docker events to see where time is spent during container creation.


7. Scenario: A Swarm service with 10 replicas is experiencing uneven load distribution; some containers get most requests while others are idle.

Answer:
Swarm’s built-in load balancing uses IPVS in routing mesh mode for published ports. Uneven load could mean:

  1. DNS caching: Some clients cache the DNS resolution of the service name. Swarm’s DNS round-robin may be bypassed.

  2. Session persistence: If the app uses sticky sessions and Swarm’s load balancer isn’t configured for it, requests might not be evenly distributed.

  3. Ingress mode vs. host mode: For uneven traffic, I’d check if the service uses --mode=global (one per node) vs --mode=replicated.
    Solution: Use an external load balancer (HAProxy, nginx) in front of Swarm, or configure Swarm’s routing mesh with --endpoint-mode dnsrr (DNS round-robin) and let an external LB handle distribution. Also, check network connectivity between nodes; if overlay network is flaky, traffic might not route properly.


8. Scenario: Your Docker hosts are running out of disk space frequently. What cleanup strategies would you implement?

Answer:
I’d set up a monitoring alert for disk usage (using docker system df). Cleanup steps:

  1. Automated pruning: Run docker system prune -a -f on a cron job during off-hours, but be cautious as it removes all unused images, containers, networks, and build cache.

  2. Log rotation: Configure Docker daemon with json-file log driver and rotation limits (max-size, max-file).

  3. Volume management: Remove unused volumes with docker volume prune.

  4. Image garbage collection: Implement a policy to remove old tags from the registry (e.g., using Registry GC).

  5. Storage driver: Use overlay2 (if not already) and ensure thin provisioning.
    For production, I’d allocate a separate partition for /var/lib/docker and monitor growth trends. Also, use CI/CD to keep image sizes minimal.


9. Scenario: You need to optimize a Dockerfile for a Java application to reduce build time and image size.

Answer:
For a Java app:

  1. Multi-stage build: Use a Maven base image for building and a JRE-only base for runtime.

  2. Layer caching: Order Dockerfile steps from least to most frequent changes. Copy pom.xml first, run mvn dependency:go-offline to cache dependencies before copying source code.

  3. Use slim images: e.g., openjdk:11-jre-slim.

  4. Remove unnecessary files: Clean Maven target directory after build, remove apt cache.

  5. Use .dockerignore to avoid copying unwanted files.
    Example Dockerfile:

dockerfile
FROM maven:3.8-openjdk-11 AS build
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn package -DskipTests

FROM openjdk:11-jre-slim
COPY --from=build /app/target/*.jar app.jar
ENTRYPOINT ["java","-jar","app.jar"]

10. Scenario: In Swarm, a service with 50 replicas needs a configuration update. How do you perform a zero-downtime update with health checks?

Answer:
I’d use docker service update with these parameters:

bash
docker service update \
  --image new:tag \
  --update-parallelism 5 \
  --update-delay 10s \
  --update-order start-first \
  --update-failure-action pause \
  --health-cmd "curl -f http://localhost:8080/health" \
  --health-interval 5s \
  --health-retries 3 \
  <service>

This updates 5 replicas at a time, waits 10 seconds between batches, starts new tasks before stopping old ones (start-first), and pauses if health checks fail. I’d verify with docker service ps <service> to watch the rolling update. If issues arise, I can rollback with docker service rollback <service>.



Docker Networking & Security

11. Scenario: Two services in the same overlay network cannot ping each other by container name, but can by IP. Diagnose the issue.

Answer:
DNS resolution failure in overlay networks can be due to:

  1. Swarm’s internal DNS resolver issues: Check if the tasks are on the same network (docker service inspect).

  2. Container’s resolv.conf: Exec into a container and check /etc/resolv.conf—it should point to 127.0.0.11 (Docker’s DNS). If overridden, DNS won’t work.

  3. Network MTU issues: Overlay networks have a default MTU; if the physical network has a lower MTU, packets may fragment and cause issues. Adjust with docker network create --opt com.docker.network.driver.mtu=1450.

  4. Firewall rules blocking Swarm’s control-plane ports between nodes (TCP/UDP 7946 for gossip, UDP 4789 for VXLAN); the embedded DNS at 127.0.0.11 relies on this gossip data to know about other tasks.
    I’d also check if there are multiple networks attached to the service, which can cause routing confusion.


12. Scenario: You need to isolate a Swarm service so it can only communicate with a specific backend service, not others.

Answer:
I’d use Docker’s network segmentation:

  1. Create a separate overlay network: docker network create --driver overlay --subnet 10.1.0.0/24 secure-net.

  2. Attach only the two services to this network: docker service update --network-add secure-net <service1>, same for service2.

  3. Remove them from other networks if they’re attached to multiple.

  4. Additionally, use Swarm’s network-level access control by not publishing ports for the backend service; only allow internal communication. For finer control, detach services from broader networks with docker service update --network-rm. If needed, I’d also implement network policies via third-party tools (Calico, Weave) integrated with Swarm.
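The segmentation above can be expressed declaratively in a stack file — a sketch with hypothetical service and image names, where `internal: true` additionally blocks outbound traffic from the secure network:

```yaml
version: "3.8"
services:
  frontend:
    image: myorg/frontend:latest     # hypothetical image
    networks:
      - public-net
  api:
    image: myorg/api:latest          # hypothetical image; bridges both tiers
    networks:
      - public-net
      - secure-net
  db:
    image: postgres:15
    networks:
      - secure-net                   # reachable only from services on secure-net
networks:
  public-net:
    driver: overlay
  secure-net:
    driver: overlay
    internal: true                   # no external/outbound access from this network
```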


13. Scenario: Your Swarm cluster spans multiple AWS regions. How would you design the overlay network for cross-region communication?

Answer:
Docker’s native overlay network is not designed for high-latency cross-region links. Instead, I would:

  1. Avoid a single overlay network across regions due to latency and reliability issues.

  2. Use a service mesh like Linkerd or Istio to handle cross-region communication with retries and timeouts.

  3. Set up separate Swarm clusters per region and use an external discovery service (Consul, etcd) for service registration.

  4. Connect services via API gateways in each region, using regional DNS-based routing.

  5. If I must use Docker networking, I’d create overlay networks per region and use VPN or AWS VPC peering with proper routing tables, but I’d expect higher latency.


14. Scenario: A security scan reveals that your Docker images run as root. How do you remediate this in production without breaking the app?

Answer:
Running as root increases risk. Steps:

  1. Modify Dockerfile: Add USER <non-root-user>, create a user and group, and set appropriate permissions. Example:

dockerfile
FROM alpine
RUN addgroup -g 1000 app && adduser -u 1000 -G app -D app
USER app
  2. Test thoroughly: Some apps need root for privileged ports (<1024) or specific capabilities. Adjust by binding to ports >1024 or adding capabilities (--cap-add) in docker run.

  3. In Swarm: Use --user flag in service definition.

  4. Use a read-only root filesystem: docker run --read-only. Mount tmpfs for writable areas.

  5. Apply security profiles: Use --security-opt no-new-privileges and drop unnecessary capabilities.
    I’d also use tools like docker-slim or clair to scan images for vulnerabilities.


15. Scenario: You suspect a container has been compromised. How would you investigate using Docker commands?

Answer:

  1. Isolate the container: Immediately stop it (docker stop) or disconnect it from networks.

  2. Forensic snapshot: Commit the container to an image (docker commit <container> forensic-image) for later analysis.

  3. Check processes: docker top <container> to see running processes.

  4. Inspect logs: docker logs --timestamps <container> for unusual activity.

  5. Examine changes: docker diff <container> to see modified/added files.

  6. Inspect network connections: Use nsenter to enter the container’s network namespace and run netstat -tulpn.

  7. Check other containers: See if the compromise spread via shared volumes or networks.

  8. Review Docker daemon logs: Look for unusual API calls.
    Post-investigation, I’d patch the vulnerability, rotate secrets, and consider using Docker Bench Security to harden the host.


Storage & Volumes

16. Scenario: Your stateful service (like a database) in Swarm needs persistent storage. How would you design it for high availability?

Answer:
For stateful services in Swarm:

  1. Use global mode with constraints: docker service create --mode=global --constraint node.labels.storage==ssd ....

  2. Mount host volumes with specific paths: --mount type=bind,source=/data/db,destination=/var/lib/mysql. However, this ties the container to a specific node.

  3. Better approach: Use a distributed storage driver (e.g., rexray for AWS EBS, portworx, or glusterfs). Create a volume with docker volume create --driver rexray --opt size=100.

  4. Replicate data at app level: For databases like PostgreSQL, use streaming replication across multiple containers, each with its own persistent volume.

  5. Backup strategy: Regular snapshots of volumes.
    I’d avoid Swarm’s replication for stateful services; instead, use orchestration at the application level (e.g., Patroni for PostgreSQL).


17. Scenario: A container logs excessively, filling up the host’s disk. How would you limit and manage logs in production?

Answer:

  1. Docker log driver configuration: In /etc/docker/daemon.json, set:

json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

Or use journald or syslog driver to offload logs.
2. For Swarm services: Use --log-driver and --log-opt in service creation.
3. Centralized logging: Send logs to ELK stack, Fluentd, or Splunk using the appropriate log driver (e.g., gelf, splunk).
4. Application-level log rotation: Configure the app to log at INFO level only, and use logrotate inside the container if necessary.
5. Monitor log growth: Use docker system df to check log usage.
6. Emergency cleanup: prefer truncating oversized logs over deleting them (the daemon keeps the file handle open, so deletion doesn’t free space until the container restarts), e.g. truncate -s 0 /var/lib/docker/containers/*/*-json.log.


18. Scenario: You need to share a configuration file across multiple services in Swarm, but it must be dynamically updatable without rebuilding images.

Answer:
I’d use Docker Configs for static configuration files. However, configs are immutable at runtime. For dynamic updates:

  1. Use an external config service like etcd, Consul, or Spring Cloud Config, and have services poll for changes.

  2. Alternatively, use a sidecar container (like consul-template) that watches the config and updates a shared volume.

  3. If using Docker Configs, update the config (docker config create new_config file) and then update the service to use the new config (docker service update --config-rm old --config-add source=new_config,target=/app/config.yaml <service>). This triggers a rolling restart.

  4. For frequent changes, consider environment variables (for small configs) or a dedicated configuration management tool.
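Because Docker Configs are immutable, the rotation in point 3 is usually done by versioning the config name in the stack file — a sketch with hypothetical names, where bumping `app_config_v2` to `_v3` on the next `docker stack deploy` triggers a rolling restart:

```yaml
version: "3.8"
services:
  app:
    image: myorg/app:latest          # hypothetical image
    configs:
      - source: app_config_v2        # bump this name on each rotation
        target: /app/config.yaml
configs:
  app_config_v2:
    file: ./config.v2.yaml           # the new config contents on disk
```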


Docker CI/CD & Deployment

19. Scenario: Your CI/CD pipeline builds a Docker image and deploys to Swarm. How would you implement blue-green deployment in Swarm?

Answer:
Swarm doesn’t have built-in blue-green, but we can simulate it:

  1. Two services: Create myapp_blue and myapp_green with the same overlay network.

  2. Use a router (like Traefik or HAProxy) as the entry point, configured to route traffic to one service.

  3. Deploy new version to the idle service (e.g., green).

  4. Test the green service internally via a canary endpoint.

  5. Switch traffic: Update the router to point to green.

  6. Rollback: If issues, revert router to blue.
    Alternatively, use Docker stacks: Deploy a new stack with updated image and switch the external load balancer’s target. I’d also use docker service scale to gradually shift traffic by adjusting replicas.


20. Scenario: You have a Docker Swarm cluster with mixed OS nodes (Linux and Windows). What challenges would you face, and how would you manage?

Answer:
Challenges:

  1. Image compatibility: Windows containers cannot run on Linux nodes and vice versa. Use node labels (node.labels.os=windows) and constraints in services.

  2. Network differences: Windows and Linux nodes use different network drivers; overlay support on Windows depends on the Windows Server version, and some setups require transparent or l2bridge drivers on Windows nodes.

  3. Volume drivers: Different storage backends for persistent volumes.

  4. Orchestration differences: Some Docker features may not be available on Windows.
    Solution: Use separate stacks or services for each OS, and label nodes appropriately. Use --constraint 'node.platform.os == windows' in service definitions. Ensure the Swarm manager runs on Linux for stability.


21. Scenario: A developer wants to debug a service running in Swarm by attaching a shell. How would you do it without exposing the container to the public?

Answer:

  1. Find the node where the container is running: docker service ps <service>.

  2. SSH into that node (internal network only).

  3. Get the container ID: docker ps | grep <service>.

  4. Exec into the container: docker exec -it <container_id> /bin/sh.
    If SSH isn’t allowed, I’d create a secure debugging service:

  • Expose an SSH sidecar container in the same network (on a non-standard port), reachable only internally.

  • Use Docker’s built-in docker service logs -f for log viewing.

  • For interactive debugging, consider docker run --rm -it --network container:<target_container> nicolaka/netshoot for network troubleshooting.


22. Scenario: You need to enforce resource limits (CPU/memory) for all services in Swarm to prevent a single service from overwhelming a node.

Answer:

  1. Daemon-level defaults: there is no /etc/docker/daemon.json setting that applies default limits to Swarm services, so limits must be declared per service.

  2. Enforce via Swarm service creation: Always specify --limit-cpu and --limit-memory.

  3. Use a policy engine like Portainer or custom scripts to validate docker-compose.yml files before deployment.

  4. Monitor resource usage with docker stats and set alerts.

  5. Use node resource reservations: docker service create --reserve-memory 512m.
    For existing services, update them with limits. Also, use Swarm’s --mode=global for system services to ensure one per node.
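In a stack file — the usual way to enforce point 2 consistently — limits and reservations live under deploy.resources. A sketch with a hypothetical service:

```yaml
version: "3.8"
services:
  api:
    image: myorg/api:latest          # hypothetical image
    deploy:
      resources:
        limits:
          cpus: "0.50"               # equivalent to --limit-cpu 0.5
          memory: 256M               # equivalent to --limit-memory 256m
        reservations:
          memory: 128M               # equivalent to --reserve-memory 128m
```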


23. Scenario: During a node failure, Swarm reschedules containers, but the rescheduled containers fail due to missing volumes. How do you ensure volumes are available across nodes?

Answer:
Local host volumes (type=bind) are not available across nodes. Solutions:

  1. Use a distributed volume driver (e.g., rexray, portworx) that replicates storage across nodes.

  2. Use NFS/GlusterFS volumes mounted on all nodes, and use type=volume with NFS driver.

  3. Design for statelessness: Store data in external services (S3, managed database).

  4. If using host volumes, use constraints to ensure containers are scheduled only on nodes with the required data (not HA).
    In Swarm, define a volume with a global driver:

bash
docker volume create --driver local --opt type=nfs --opt o=addr=nfsserver,rw --opt device=:/path nfs_vol
docker service create --mount type=volume,source=nfs_vol,destination=/data ...

24. Scenario: You need to upgrade Docker Engine on all Swarm nodes from 20.x to 24.x with zero downtime.

Answer:
Upgrade procedure for Swarm:

  1. Drain manager nodes one by one: Start with a non-leader manager. docker node update --availability drain <node>.

  2. Upgrade Docker on that node, restart Docker daemon.

  3. Set the node back to active: docker node update --availability active <node>.

  4. Repeat for other managers, ensuring leader is upgraded last.

  5. For workers: Drain each worker, upgrade, reactivate.

  6. Monitor services during upgrade: Swarm will reschedule tasks to other nodes.

  7. Test Swarm functionality after each node upgrade (docker node ls, docker service ls).
    Important: Check for breaking changes between versions (e.g., deprecated flags, storage driver changes). Have a rollback plan (snapshot nodes).


25. Scenario: Your Swarm cluster needs to integrate with an external monitoring system (Prometheus). How would you expose Docker metrics?

Answer:

  1. Enable Docker daemon metrics: Configure /etc/docker/daemon.json with:

json
{
  "metrics-addr": "0.0.0.0:9323",
  "experimental": true
}

Note that this exposes a separate endpoint on each node, so Prometheus must scrape every node individually.
2. Better: Use cAdvisor as a global service in Swarm to collect container metrics from all nodes and expose Prometheus metrics.
3. Deploy Prometheus as a service in Swarm, scraping each node’s metrics endpoint or cAdvisor.
4. For Swarm service metrics, use the Docker Stats API via a custom exporter.
5. Set up node exporters for host-level metrics.
6. Use Docker labels to add custom metrics (e.g., --label com.docker.prometheus.scrape=true).
7. Secure the metrics endpoint with firewall rules or TLS.
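Putting points 1–3 together, a Prometheus scrape config might look like this sketch (node names are hypothetical; `tasks.cadvisor` assumes a global cAdvisor service named `cadvisor` on a network shared with Prometheus):

```yaml
# prometheus.yml fragment
scrape_configs:
  - job_name: docker-engine          # per-node daemon metrics from point 1
    static_configs:
      - targets: ["node1:9323", "node2:9323", "node3:9323"]
  - job_name: cadvisor               # container metrics via the global service
    dns_sd_configs:
      - names: ["tasks.cadvisor"]    # Swarm DNS returns one A record per task
        type: A
        port: 8080
```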

Conclusion

Mastering Docker and Swarm is no longer just a nice-to-have skill—it’s essential for any serious DevOps professional in Kolkata’s competitive IT market. The scenario-based questions covered in this guide represent the level of expertise that Kolkata-based companies like ITC Infotech, TCS Kolkata, Capgemini, Cognizant, and numerous startups now expect from senior DevOps candidates. These questions go beyond basic certification knowledge to test your ability to solve real production problems, which is exactly what makes the difference between an average candidate and a standout professional.

Remember, Kolkata’s IT sector values practical, hands-on experience combined with theoretical knowledge. The city’s unique infrastructure challenges—from network considerations to hybrid cloud environments—require adaptable problem-solving skills. As you prepare for your interviews, focus on understanding the “why” behind each solution, not just the “what.” Practice explaining your thought process clearly, as communication skills are particularly valued in Kolkata’s collaborative work culture.

For those seeking Docker opportunities in Kolkata, we recommend:

  1. Hands-on practice with multi-node Swarm clusters

  2. Understanding specific infrastructure considerations

  3. Staying updated with the latest Docker features and best practices

  4. Networking with Kolkata’s active DevOps community through local meetups and tech events

Whether you’re preparing for an interview at a traditional IT company in Salt Lake Sector V or a modern tech startup in New Town, these Docker questions will help you demonstrate the practical expertise that employers value most. Remember that in Kolkata’s growing tech ecosystem, professionals who can bridge traditional infrastructure with modern containerization technologies are in especially high demand.

Best of luck with your Docker interviews in Kolkata! May your container orchestration skills be as robust as the city’s rich cultural heritage, and may your career journey in the City of Joy be as rewarding as a perfectly optimized Swarm cluster running in production.

