Top 25 scenario-based Docker interview questions and answers
Are you a DevOps professional in Kolkata preparing for your next big career move? Or perhaps you’re an interviewer in the bustling IT hubs of Salt Lake, New Town, or Rajarhat seeking to assess true Docker expertise? In today’s competitive tech landscape, especially in Kolkata’s growing IT sector, mastering Docker and Docker Swarm has become non-negotiable for DevOps roles. With the city’s IT industry expanding rapidly and companies from banking to healthcare adopting containerization, the demand for skilled Docker professionals has never been higher.
This comprehensive guide presents 25 scenario-based Docker interview questions specifically tailored for candidates with 5+ years of experience. Whether you’re interviewing at Kolkata-based tech giants, thriving startups, or multinational corporations with development centers in the city, these questions reflect real-world challenges you’ll encounter in production environments. From troubleshooting Swarm cluster issues to optimizing container performance for Kolkata’s unique infrastructure constraints, we’ve crafted questions that test not just theoretical knowledge but practical problem-solving abilities.
What sets this guide apart is its focus on scenario-based questioning – the gold standard in modern technical interviews. Instead of simple definition-based questions, we dive into complex situations that mirror actual production challenges faced by Kolkata’s IT companies. Each question is paired with a detailed, paragraph-wise answer that demonstrates the depth of understanding expected from senior DevOps professionals.
As Kolkata continues to establish itself as a significant IT hub in Eastern India, with special economic zones and tech parks driving innovation, the need for containerization expertise grows proportionally. Companies are moving beyond basic Docker usage to implementing sophisticated container orchestration solutions, making Swarm skills particularly valuable. This guide will help you prepare for the exact type of questions Kolkata’s top tech employers are asking today.
Core Docker & Swarm Scenarios
1. Scenario: You have a production Docker application that suddenly starts consuming 100% CPU on the host. How would you diagnose and troubleshoot this issue in a live Swarm cluster without causing downtime?
Answer:
First, I would use docker stats on the affected node to identify the specific container consuming CPU. Then, I’d exec into the container (docker exec -it <container> sh) and use top or htop inside to see which process is causing the spike. Simultaneously, I’d check the node’s resource usage via docker node ps and docker node inspect to see if the service is overloaded. To avoid downtime, I would scale the service (docker service scale <service>=<replicas+1>) to distribute the load, then gradually remove the problematic replica after gathering logs (docker logs --tail 100 <container>). If it’s a known bug, I’d roll back the service to a previous image using docker service rollback. For deeper profiling, I might use docker exec <container> perf top or export container metrics to Prometheus/Grafana for historical analysis.
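A condensed triage sequence for this scenario, assuming a hypothetical service named web and placeholder container IDs:

docker stats --no-stream                  # identify the hot container on this node
docker exec -it <container> top           # find the offending process inside it
docker service ps web                     # check task placement and state across nodes
docker service scale web=6                # add a replica to absorb load temporarily
docker logs --tail 100 <container>        # capture evidence before removing the bad task
docker service rollback web               # roll back if a bad release is the root cause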
2. Scenario: Your Swarm manager node goes down unexpectedly. How do you recover the cluster without losing existing services?
Answer:
Swarm uses Raft consensus, so if one manager goes down, the remaining managers (if you have 3 or 5) maintain quorum. First, I’d check the remaining managers with docker node ls to confirm leader status. To recover the failed manager, I’d either restart the Docker daemon on that node and rejoin it with docker swarm join --token <manager-token>, or if the node is permanently lost, demote it (docker node demote <node>) and remove it (docker node rm). If it was the sole manager, I’d restore a backup of /var/lib/docker/swarm onto a healthy node and run docker swarm init --force-new-cluster there, which rebuilds the cluster from that Raft data. All service definitions live in the Swarm’s Raft log, so they persist after recovery. I’d then add new managers for high availability.
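A sketch of the recovery commands, with placeholder node names, addresses, and tokens:

docker node ls                                               # on a surviving manager: confirm quorum and leader
docker swarm join --token <manager-token> <leader-ip>:2377   # rejoin a recovered manager
docker node demote <node> && docker node rm <node>           # if the node is permanently lost
# Sole-manager loss: restore /var/lib/docker/swarm from backup on a node, then:
docker swarm init --force-new-cluster --advertise-addr <node-ip>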
3. Scenario: A developer reports that their containerized app runs locally but fails in Swarm with “connection refused” errors between services. What would you check?
Answer:
I’d first verify that the services are on the same overlay network (docker network ls and docker network inspect). In Swarm, inter-service communication requires a user-defined overlay network (not the default bridge). Next, I’d check service discovery: Swarm uses DNS round-robin for service names; I’d exec into a container and run nslookup <service_name> to see if endpoints resolve. If the app uses hardcoded IPs or ports instead of service names, that would fail. I’d also verify the published ports are correct (docker service inspect). Firewall rules on the nodes might block overlay traffic (VXLAN, UDP 4789), node-to-node gossip (TCP/UDP 7946), or cluster management traffic (TCP 2377). Lastly, I’d check if the service is running on multiple nodes but the other nodes have resource constraints preventing container placement.
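A quick diagnostic pass, assuming a hypothetical overlay network named app-net and a service named backend:

docker network inspect app-net                        # are both services attached to it?
docker exec -it <container> nslookup backend          # the service name should resolve to a VIP
docker exec -it <container> nslookup tasks.backend    # tasks.<name> lists individual task IPs
docker service inspect --format '{{json .Endpoint.Ports}}' backend   # verify published ports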
4. Scenario: You need to securely pass secrets (like API keys) to a service in Swarm, but the secrets must be rotated weekly without restarting all containers.
Answer:
I’d use Docker Swarm’s built-in secrets management. Secrets are mounted as files in /run/secrets/ inside containers. To rotate without full restart, I’d create a new secret (docker secret create new_key /path/to/key). Then, I’d update the service to use the new secret: docker service update --secret-rm old_key --secret-add source=new_key,target=api_key <service>. This triggers a rolling update (zero-downtime) where tasks are replaced with new ones that have the new secret. The old secret remains until no service uses it, then I’d remove it. For applications that need to reload secrets without restart, I’d design the app to watch the secret file for changes (e.g., using inotify) or send a SIGHUP signal.
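The rotation as a sketch, with hypothetical secret names; Swarm secrets are immutable, so rotation is create-then-swap:

docker secret create api_key_v2 ./new_key.txt    # new secret under a new name
docker service update \
  --secret-rm api_key_v1 \
  --secret-add source=api_key_v2,target=api_key \
  myservice                                      # rolling update; path stays /run/secrets/api_key
docker secret rm api_key_v1                      # only after no service references it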
5. Scenario: During a deployment, a service update in Swarm is stuck in “ROLLBACK” state. How do you resolve this?
Answer:
A rollback usually happens when the new image fails health checks or crashes. I’d first check docker service ps <service> --no-trunc to see the error messages. Common causes: missing environment variables, an incorrect image tag, or resource constraints. I’d inspect the failed task’s logs (docker service logs <task_id>). If I need to override the rollback, I could force an update with a corrected configuration: docker service update --image correct:tag --force <service>. To prevent automatic rollbacks, I’d adjust the --update-failure-action flag during updates. If the rollback is due to slow image pulls, I’d ensure all nodes have the image pre-pulled or use a private registry with low latency.
Docker Performance & Optimization
6. Scenario: You notice that container startup times in Swarm have increased significantly over time. What are the potential causes and solutions?
Answer:
Slow container startups can be due to several factors:
- Image size: Large images take longer to pull. Solution: Use multi-stage builds, slim base images, and layer caching.
- Registry latency: If nodes pull from a remote registry, use a local mirror or cache (e.g., Docker Registry, Nexus).
- Docker daemon performance: High disk I/O on /var/lib/docker. Solution: Use SSD storage, the overlay2 storage driver, and prune unused images/volumes regularly.
- Networks: Many unused overlay networks can slow down network creation. Remove unused networks.
- Node resource exhaustion: Check CPU/memory on the nodes, and check disk usage with docker system df.
- Too many volumes: Volume creation can be slow. Use volume drivers optimized for your storage backend.
I’d profile using docker events to see where time is spent during container creation.
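A small sketch of the profiling and cleanup steps above; the time window and filters are illustrative:

docker events --filter type=container --since 10m   # see where time goes during creation
docker image prune -a -f                            # destructive: removes all unused images
docker network prune -f                             # drop unused overlay networks
docker system df -v                                 # verify what consumes /var/lib/docker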
7. Scenario: A Swarm service with 10 replicas is experiencing uneven load distribution; some containers get most requests while others are idle.
Answer:
Swarm’s built-in load balancing uses IPVS in routing mesh mode for published ports. Uneven load could mean:
- DNS caching: Some clients cache the DNS resolution of the service name, bypassing Swarm’s DNS round-robin.
- Session persistence: If the app uses sticky sessions and Swarm’s load balancer isn’t configured for it, requests might not be evenly distributed.
- Ingress mode vs. host mode: I’d check whether the service runs with --mode=global (one task per node) or --mode=replicated.
Solution: Use an external load balancer (HAProxy, nginx) in front of Swarm, or create the service with --endpoint-mode dnsrr (DNS round-robin) and let the external LB handle distribution. Also, check network connectivity between nodes; if the overlay network is flaky, traffic might not route properly.
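A hedged example of switching to DNS round-robin. Note that dnsrr cannot publish ports through the ingress mesh, so the port is published in host mode for an external load balancer to target; the service name and image are placeholders:

docker service create --name web \
  --endpoint-mode dnsrr \
  --publish published=8080,target=8080,mode=host \
  --replicas 10 \
  myimage:latest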
8. Scenario: Your Docker hosts are running out of disk space frequently. What cleanup strategies would you implement?
Answer:
I’d set up a monitoring alert for disk usage (using docker system df). Cleanup steps:
- Automated pruning: Run docker system prune -a -f on a cron job during off-hours, but be cautious as it removes all unused images, containers, networks, and build cache.
- Log rotation: Configure the Docker daemon’s json-file log driver with rotation limits (max-size, max-file).
- Volume management: Remove unused volumes with docker volume prune.
- Image garbage collection: Implement a policy to remove old tags from the registry (e.g., using Registry GC).
- Storage driver: Use overlay2 (if not already) and ensure thin provisioning.
For production, I’d allocate a separate partition for /var/lib/docker and monitor growth trends. Also, use CI/CD to keep image sizes minimal.
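One possible automation for the pruning step, assuming a cron-capable host; the path, schedule, and 7-day retention window are all illustrative:

# /etc/cron.d/docker-prune (hypothetical): nightly cleanup of objects older than a week
30 2 * * * root docker system prune -af --filter "until=168h" >> /var/log/docker-prune.log 2>&1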
9. Scenario: You need to optimize a Dockerfile for a Java application to reduce build time and image size.
Answer:
For a Java app:
- Multi-stage build: Use a Maven base image for building and a JRE-only base for runtime.
- Layer caching: Order Dockerfile steps from least to most frequently changed. Copy pom.xml first and run mvn dependency:go-offline to cache dependencies before copying the source code.
- Use slim images: e.g., openjdk:11-jre-slim.
- Remove unnecessary files: Clean the Maven target directory after the build and remove the apt cache.
- Use a .dockerignore file to avoid copying unwanted files into the build context.
Example Dockerfile:

FROM maven:3.8-openjdk-11 AS build
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn package -DskipTests

FROM openjdk:11-jre-slim
COPY --from=build /app/target/*.jar app.jar
ENTRYPOINT ["java","-jar","app.jar"]
10. Scenario: In Swarm, a service with 50 replicas needs a configuration update. How do you perform a zero-downtime update with health checks?
Answer:
I’d use docker service update with these parameters:
docker service update \
  --image new:tag \
  --update-parallelism 5 \
  --update-delay 10s \
  --update-order start-first \
  --update-failure-action pause \
  --health-cmd "curl -f http://localhost:8080/health" \
  --health-interval 5s \
  --health-retries 3 \
  <service>
This updates 5 replicas at a time, waits 10 seconds between batches, starts new tasks before stopping old ones (start-first), and pauses if health checks fail. I’d verify with docker service ps <service> to watch the rolling update. If issues arise, I can rollback with docker service rollback <service>.

Docker Networking & Security
11. Scenario: Two services in the same overlay network cannot ping each other by container name, but can by IP. Diagnose the issue.
Answer:
DNS resolution failure in overlay networks can be due to:
- Swarm’s internal DNS resolver issues: Check whether the tasks are attached to the same network (docker service inspect).
- The container’s resolv.conf: Exec into a container and check /etc/resolv.conf; it should point to 127.0.0.11 (Docker’s embedded DNS). If overridden, name resolution won’t work.
- Network MTU issues: Overlay networks have a default MTU; if the physical network has a lower MTU, packets may fragment and cause issues. Adjust with docker network create --opt com.docker.network.driver.mtu=1450.
- Firewall rules blocking Swarm’s gossip traffic between nodes (TCP/UDP 7946), which service discovery relies on.
I’d also check if there are multiple networks attached to the service, which can cause routing confusion.
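A sketch of the MTU and resolver checks; the network name and MTU value are illustrative:

docker network create -d overlay \
  --opt com.docker.network.driver.mtu=1450 app-net    # match your underlay's effective MTU
docker exec -it <container> cat /etc/resolv.conf      # expect nameserver 127.0.0.11
docker exec -it <container> nslookup <service_name>   # test the embedded DNS directly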
12. Scenario: You need to isolate a Swarm service so it can only communicate with a specific backend service, not others.
Answer:
I’d use Docker’s network segmentation:
- Create a separate overlay network: docker network create --driver overlay --subnet 10.1.0.0/24 secure-net.
- Attach only the two services to this network: docker service update --network-add secure-net <service1>, and the same for service2.
- Remove them from other networks (docker service update --network-rm <network>) if they’re attached to multiple.
- Additionally, don’t publish ports for the backend service, so it can only be reached over the internal overlay network. For finer-grained control, I’d implement network policies via third-party tools (Calico, Weave) integrated with Swarm.
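A minimal isolation sketch with hypothetical service and network names; the --internal flag is an optional extra that also blocks traffic leaving the overlay:

docker network create -d overlay --internal --subnet 10.1.0.0/24 secure-net
docker service update --network-add secure-net frontend
docker service update --network-add secure-net backend
docker service update --network-rm app-net backend    # detach from networks it no longer needs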
13. Scenario: Your Swarm cluster spans multiple AWS regions. How would you design the overlay network for cross-region communication?
Answer:
Docker’s native overlay network is not designed for high-latency cross-region links. Instead, I would:
- Avoid a single overlay network across regions due to latency and reliability issues.
- Use a service mesh like Linkerd or Istio to handle cross-region communication with retries and timeouts.
- Set up separate Swarm clusters per region and use an external discovery service (Consul, etcd) for service registration.
- Connect services via API gateways in each region, using regional DNS-based routing.
- If I must use Docker networking, I’d create overlay networks per region and use a VPN or AWS VPC peering with proper routing tables, but I’d expect higher latency.
14. Scenario: A security scan reveals that your Docker images run as root. How do you remediate this in production without breaking the app?
Answer:
Running as root increases risk. Steps:
- Modify the Dockerfile: Create a user and group, set appropriate permissions, and add USER <non-root-user>. Example:

FROM alpine
RUN addgroup -g 1000 app && adduser -u 1000 -G app -D app
USER app

- Test thoroughly: Some apps need root for privileged ports (<1024) or specific capabilities. Adjust by binding to ports above 1024 or adding capabilities (--cap-add) in docker run.
- In Swarm: Use the --user flag in the service definition.
- Use a read-only root filesystem: docker run --read-only, mounting tmpfs for writable areas.
- Apply security profiles: Use --security-opt no-new-privileges and drop unnecessary capabilities.
I’d also use tools like docker-slim to shrink images and Clair to scan them for vulnerabilities.
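A hedged Swarm-side example combining several of the hardening flags above; it assumes a Docker version where services support --cap-drop (roughly 20.10+), and the service name and image are illustrative:

docker service create --name web \
  --user 1000:1000 \
  --read-only \
  --mount type=tmpfs,destination=/tmp \
  --cap-drop ALL \
  myimage:latest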
15. Scenario: You suspect a container has been compromised. How would you investigate using Docker commands?
Answer:
- Isolate the container: Immediately stop it (docker stop) or disconnect it from its networks (docker network disconnect).
- Forensic snapshot: Commit the container to an image (docker commit <container> forensic-image) for later analysis.
- Check processes: docker top <container> to see running processes.
- Inspect logs: docker logs --timestamps <container> for unusual activity.
- Examine changes: docker diff <container> to see modified/added files.
- Inspect network connections: Use nsenter to enter the container’s network namespace and run netstat -tulpn.
- Check other containers: See if the compromise spread via shared volumes or networks.
- Review Docker daemon logs: Look for unusual API calls.
Post-investigation, I’d patch the vulnerability, rotate secrets, and consider using Docker Bench for Security to harden the host.
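The investigation condensed into commands, with placeholders for the container and network; the nsenter line assumes netstat is installed on the host:

docker network disconnect <network> <container>    # cut it off without killing processes
docker commit <container> forensic-image           # snapshot filesystem state for analysis
docker top <container>                             # processes as seen from the host
docker diff <container>                            # files added/changed/deleted
docker logs --timestamps <container> > incident.log
nsenter --target "$(docker inspect -f '{{.State.Pid}}' <container>)" --net netstat -tulpn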
Storage & Volumes
16. Scenario: Your stateful service (like a database) in Swarm needs persistent storage. How would you design it for high availability?
Answer:
For stateful services in Swarm:
- Use global mode with constraints: docker service create --mode=global --constraint node.labels.storage==ssd ....
- Mount host volumes with specific paths: --mount type=bind,source=/data/db,destination=/var/lib/mysql. However, this ties the container to a specific node.
- Better approach: Use a distributed storage driver (e.g., rexray for AWS EBS, portworx, or glusterfs). Create a volume with docker volume create --driver rexray --opt size=100.
- Replicate data at the application level: For databases like PostgreSQL, use streaming replication across multiple containers, each with its own persistent volume.
- Backup strategy: Take regular snapshots of volumes.
I’d avoid relying on Swarm’s replica scheduling alone for stateful services; instead, use orchestration at the application level (e.g., Patroni for PostgreSQL).
17. Scenario: A container logs excessively, filling up the host’s disk. How would you limit and manage logs in production?
Answer:
1. Docker log driver configuration: In /etc/docker/daemon.json, set:

{
  "log-driver": "json-file",
  "log-opts": { "max-size": "10m", "max-file": "3" }
}

Or use the journald or syslog driver to offload logs.
2. For Swarm services: Use --log-driver and --log-opt in service creation.
3. Centralized logging: Send logs to an ELK stack, Fluentd, or Splunk using the appropriate log driver (e.g., gelf, splunk).
4. Application-level log rotation: Configure the app to log at INFO level only, and use logrotate inside the container if necessary.
5. Monitor log growth: Use docker system df to check disk usage.
6. Emergency cleanup: docker run --rm -v /var/lib/docker:/var/lib/docker alpine find /var/lib/docker/containers -name "*.log" -size +100M -delete. Note that truncating a live log file is safer than deleting it, since the daemon holds the file handle open.
18. Scenario: You need to share a configuration file across multiple services in Swarm, but it must be dynamically updatable without rebuilding images.
Answer:
I’d use Docker Configs for static configuration files. However, configs are immutable at runtime. For dynamic updates:
- Use an external config service like etcd, Consul, or Spring Cloud Config, and have services poll for changes.
- Alternatively, use a sidecar container (like consul-template) that watches the config source and updates a shared volume.
- If using Docker Configs, create a new config (docker config create new_config file) and then update the service to use it (docker service update --config-rm old --config-add source=new_config,target=/app/config.yaml <service>). This triggers a rolling restart.
- For frequent changes, consider environment variables (for small configs) or a dedicated configuration management tool.
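The config swap from the third option as a sketch, with hypothetical config names and target path:

docker config create app_config_v2 ./config.yaml
docker service update \
  --config-rm app_config_v1 \
  --config-add source=app_config_v2,target=/app/config.yaml \
  myservice
docker config rm app_config_v1    # once no service references it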
Docker CI/CD & Deployment
19. Scenario: Your CI/CD pipeline builds a Docker image and deploys to Swarm. How would you implement blue-green deployment in Swarm?
Answer:
Swarm doesn’t have built-in blue-green deployments, but we can simulate them:
- Two services: Create myapp_blue and myapp_green on the same overlay network.
- Use a router (like Traefik or HAProxy) as the entry point, configured to route traffic to one service.
- Deploy the new version to the idle service (e.g., green).
- Test the green service internally via a canary endpoint.
- Switch traffic: Update the router to point to green.
- Rollback: If issues arise, revert the router to blue.
Alternatively, use Docker stacks: deploy a new stack with the updated image and switch the external load balancer’s target. I’d also use docker service scale to shift traffic gradually by adjusting replicas.
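A bare-bones version of the cut-over, assuming hypothetical service names, an attachable overlay network app-net, and a /health endpoint; the actual traffic switch happens in the Traefik/HAProxy configuration:

docker service create --name myapp_green --network app-net myimage:v2
docker run --rm --network app-net curlimages/curl \
  -f http://myapp_green:8080/health               # smoke-test green internally
# repoint the router to myapp_green, then retire blue
docker service rm myapp_blue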
20. Scenario: You have a Docker Swarm cluster with mixed OS nodes (Linux and Windows). What challenges would you face, and how would you manage?
Answer:
Challenges:
- Image compatibility: Windows containers cannot run on Linux nodes and vice versa. Use node labels (node.labels.os=windows) and constraints in services.
- Network isolation: Windows and Linux use different network drivers; Windows nodes rely on drivers such as transparent or l2bridge, so overlay behavior differs across platforms.
- Volume drivers: Different storage backends for persistent volumes.
- Orchestration differences: Some Docker features may not be available on Windows.
Solution: Use separate stacks or services for each OS, and label nodes appropriately. Use --constraint 'node.platform.os == windows' in service definitions. Ensure the Swarm managers run on Linux for stability.
21. Scenario: A developer wants to debug a service running in Swarm by attaching a shell. How would you do it without exposing the container to the public?
Answer:
- Find the node where the container is running: docker service ps <service>.
- SSH into that node (internal network only).
- Get the container ID: docker ps | grep <service>.
- Exec into the container: docker exec -it <container_id> /bin/sh.
If SSH isn’t allowed, I’d create a secure debugging path:
- Run an SSH sidecar container on the same network (on a non-standard port), accessible only internally.
- Use Docker’s built-in docker service logs -f for log viewing.
- For interactive network troubleshooting, consider docker run --rm -it --network container:<target_container> nicolaka/netshoot.
22. Scenario: You need to enforce resource limits (CPU/memory) for all services in Swarm to prevent a single service from overwhelming a node.
Answer:
-
Set default resource limits in the Docker daemon configuration (
/etc/docker/daemon.json): not directly available for Swarm services. -
Enforce via Swarm service creation: Always specify
--limit-cpuand--limit-memory. -
Use a policy engine like Portainer or custom scripts to validate
docker-compose.ymlfiles before deployment. -
Monitor resource usage with
docker statsand set alerts. -
Use node resource reservations:
docker service create --reserve-memory 512m.
For existing services, update them with limits. Also, use Swarm’s--mode=globalfor system services to ensure one per node.
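An example with illustrative values; limits cap what a task may consume, while reservations guide the scheduler’s placement decisions:

docker service create --name api \
  --limit-cpu 0.50 --limit-memory 512M \
  --reserve-cpu 0.25 --reserve-memory 256M \
  myimage:latest
docker service update --limit-memory 512M api   # retrofit limits onto an existing service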
23. Scenario: During a node failure, Swarm reschedules containers, but the rescheduled containers fail due to missing volumes. How do you ensure volumes are available across nodes?
Answer:
Local host volumes (type=bind) are not available across nodes. Solutions:
-
Use a distributed volume driver (e.g.,
rexray,portworx) that replicates storage across nodes. -
Use NFS/GlusterFS volumes mounted on all nodes, and use
type=volumewith NFS driver. -
Design for statelessness: Store data in external services (S3, managed database).
-
If using host volumes, use constraints to ensure containers are scheduled only on nodes with the required data (not HA).
In Swarm, define a volume with a global driver:
docker volume create --driver local --opt type=nfs --opt o=addr=nfsserver,rw --opt device=:/path nfs_vol docker service create --mount type=volume,source=nfs_vol,destination=/data ...
24. Scenario: You need to upgrade Docker Engine on all Swarm nodes from 20.x to 24.x with zero downtime.
Answer:
Upgrade procedure for Swarm:
- Drain manager nodes one by one, starting with a non-leader manager: docker node update --availability drain <node>.
- Upgrade Docker on that node and restart the Docker daemon.
- Set the node back to active: docker node update --availability active <node>.
- Repeat for the other managers, upgrading the leader last.
- For workers: Drain each worker, upgrade, and reactivate.
- Monitor services during the upgrade: Swarm will reschedule tasks to other nodes.
- Test Swarm functionality after each node upgrade (docker node ls, docker service ls).
Important: Check for breaking changes between versions (e.g., deprecated flags, storage driver changes). Have a rollback plan (snapshot nodes).
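The per-node loop, sketched for an Ubuntu host; the package commands will differ on other distros, and node names are placeholders:

docker node update --availability drain node-3
# on node-3 itself:
sudo apt-get update && sudo apt-get install -y docker-ce docker-ce-cli containerd.io
sudo systemctl restart docker
# back on a manager:
docker node update --availability active node-3
docker node ls && docker service ls    # verify health before moving to the next node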
25. Scenario: Your Swarm cluster needs to integrate with an external monitoring system (Prometheus). How would you expose Docker metrics?
Answer:
1. Enable Docker daemon metrics: Configure /etc/docker/daemon.json with:

{
  "metrics-addr": "0.0.0.0:9323",
  "experimental": true
}

This exposes metrics on each node separately.
2. Better: Run cAdvisor as a global service in Swarm to collect container metrics from all nodes and expose them in Prometheus format (see the sketch after this list).
3. Deploy Prometheus as a service in Swarm, scraping each node’s metrics endpoint or cAdvisor.
4. For Swarm service metrics, use the Docker stats API via a custom exporter.
5. Set up node exporters for host-level metrics.
6. Use Docker labels to mark services for scraping (e.g., --label com.docker.prometheus.scrape=true).
7. Secure the metrics endpoint with firewall rules or TLS.
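A sketch of the cAdvisor global service from step 2; the image tag and mounts follow cAdvisor’s commonly documented setup and should be verified for your version. Host-mode publishing lets Prometheus scrape each node directly at <node-ip>:8080:

docker service create --name cadvisor --mode global \
  --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock,readonly \
  --mount type=bind,source=/,target=/rootfs,readonly \
  --mount type=bind,source=/sys,target=/sys,readonly \
  --mount type=bind,source=/var/lib/docker,target=/var/lib/docker,readonly \
  --publish published=8080,target=8080,mode=host \
  gcr.io/cadvisor/cadvisor:latest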
Conclusion
Mastering Docker and Swarm is no longer just a nice-to-have skill—it’s essential for any serious DevOps professional in Kolkata’s competitive IT market. The scenario-based questions covered in this guide represent the level of expertise that Kolkata-based companies like ITC Infotech, TCS Kolkata, Capgemini, Cognizant, and numerous startups now expect from senior DevOps candidates. These questions go beyond basic certification knowledge to test your ability to solve real production problems, which is exactly what makes the difference between an average candidate and a standout professional.
Remember, Kolkata’s IT sector values practical, hands-on experience combined with theoretical knowledge. The city’s unique infrastructure challenges—from network considerations to hybrid cloud environments—require adaptable problem-solving skills. As you prepare for your interviews, focus on understanding the “why” behind each solution, not just the “what.” Practice explaining your thought process clearly, as communication skills are particularly valued in Kolkata’s collaborative work culture.
For those seeking Docker opportunities in Kolkata, we recommend:
- Hands-on practice with multi-node Swarm clusters
- Understanding specific infrastructure considerations
- Staying updated with the latest Docker features and best practices
- Networking with Kolkata’s active DevOps community through local meetups and tech events
Whether you’re preparing for an interview at a traditional IT company in Salt Lake Sector V or a modern tech startup in New Town, these Docker questions will help you demonstrate the practical expertise that employers value most. Remember that in Kolkata’s growing tech ecosystem, professionals who can bridge traditional infrastructure with modern containerization technologies are in especially high demand.
Best of luck with your Docker interviews in Kolkata! May your container orchestration skills be as robust as the city’s rich cultural heritage, and may your career journey in the City of Joy be as rewarding as a perfectly optimized Swarm cluster running in production.
Want to know more about the DevOps Master Class in Kolkata? Click here

Cybersecurity Architect | Cloud-Native Defense | AI/ML Security | DevSecOps
With over 23 years of experience in cybersecurity, I specialize in building resilient, zero-trust digital ecosystems across multi-cloud (AWS, Azure, GCP) and Kubernetes (EKS, AKS, GKE) environments. My journey began in network security—firewalls, IDS/IPS—and expanded into Linux/Windows hardening, IAM, and DevSecOps automation using Terraform, GitLab CI/CD, and policy-as-code tools like OPA and Checkov.
Today, my focus is on securing AI/ML adoption through MLSecOps, protecting models from adversarial attacks with tools like Robust Intelligence and Microsoft Counterfit. I integrate AISecOps for threat detection (Darktrace, Microsoft Security Copilot) and automate incident response with forensics-driven workflows (Elastic SIEM, TheHive).
Whether it’s hardening cloud-native stacks, embedding security into CI/CD pipelines, or safeguarding AI systems, I bridge the gap between security and innovation—ensuring defense scales with speed.
Let’s connect and discuss the future of secure, intelligent infrastructure.