Data Analytics Interview Questions & Answers

1. What is the difference between OLTP and OLAP?

OLTP (Online Transaction Processing)
- Optimized for high-volume, real-time transactions (INSERT/UPDATE/DELETE)
- ACID compliance ensures data integrity
- Examples: banking transactions, retail POS systems
OLAP (Online Analytical Processing)
- Built for complex SELECT queries over large volumes of historical data
- Supports multidimensional analysis (cubes, roll-up, drill-down)
- Use cases: BI reports, data mining

2. How would you clean a dataset with 10% missing values?

Diagnose the missingness: check whether the pattern is MCAR, MAR, or MNAR
Imputation strategies:
- Numerical: mean/median, or model-based (k-NN)
- Categorical: mode, or an “Unknown” category
Row/column removal: drop rows or columns that are more than 50% missing or carry little information
Validate the impact: confirm that the cleaning introduces no bias and does not distort the distribution
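
The imputation strategies above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a recommended pipeline: the `impute` function, the column names, and the sample rows are all made up for the example, and missing values are assumed to be represented as `None`.

```python
from statistics import median, mode

def impute(rows, numeric_cols, categorical_cols, use_unknown=True):
    """Return a copy of rows with None values filled column by column."""
    filled = [dict(r) for r in rows]  # copy so the input is not mutated
    for col in numeric_cols:
        observed = [r[col] for r in rows if r[col] is not None]
        fill = median(observed)  # median is less sensitive to outliers than mean
        for r in filled:
            if r[col] is None:
                r[col] = fill
    for col in categorical_cols:
        observed = [r[col] for r in rows if r[col] is not None]
        # Either flag missingness explicitly or use the most frequent value
        fill = "Unknown" if use_unknown else mode(observed)
        for r in filled:
            if r[col] is None:
                r[col] = fill
    return filled

# Hypothetical sample data
rows = [
    {"age": 25, "city": "Pune"},
    {"age": None, "city": "Delhi"},
    {"age": 40, "city": None},
    {"age": 31, "city": "Pune"},
]
cleaned = impute(rows, ["age"], ["city"])
```

Comparing summary statistics of `rows` and `cleaned` afterwards is one simple way to approach the validation step.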

3. How would you design an ETL pipeline for real-time analytics?

Extract: ingest data from message queues such as Kafka/Kinesis, or from real-time APIs
Transform: filter, enrich, and aggregate with Apache Flink or Spark Structured Streaming
Load: write to a real-time store such as Redis/Druid, or to a data warehouse such as BigQuery/Snowflake
Considerations: schema evolution, exactly-once processing, backpressure, low-latency SLAs
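
To make the three stages concrete, here is a toy in-memory sketch in Python. It does not use Kafka, Flink, or Redis: a generator stands in for the message queue, a dict stands in for the real-time store, and the event fields (`ts`, `product`, `amount`) are assumptions made for the example. A real stream processor would also emit each window as it closes rather than after the whole input is consumed.

```python
from collections import defaultdict

def extract(events):
    """Stand-in for a Kafka/Kinesis consumer: yields events one at a time."""
    for event in events:
        yield event

def transform(stream, window_seconds=60):
    """Filter invalid events and sum amounts per (window start, product)."""
    totals = defaultdict(float)
    for event in stream:
        if event.get("amount") is None or event["amount"] < 0:
            continue  # filter step: drop malformed events
        # Tumbling window: round the timestamp down to the window start
        window_start = event["ts"] - event["ts"] % window_seconds
        totals[(window_start, event["product"])] += event["amount"]
    return totals

def load(totals, store):
    """Stand-in for a write to Redis/Druid: here just a dict update."""
    store.update(totals)
    return store

# Hypothetical events with timestamps in seconds
events = [
    {"ts": 5, "product": "A", "amount": 10.0},
    {"ts": 42, "product": "A", "amount": 5.0},
    {"ts": 61, "product": "A", "amount": 7.0},
    {"ts": 70, "product": "B", "amount": -1.0},  # invalid, filtered out
]
store = load(transform(extract(events)), {})
```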

4. How would you ensure data quality in a project?

Standards & guidelines: clear data collection protocols
Validation rules: type, range, and mandatory-field checks at ingestion
Cleaning: deduplication, consistency corrections
Audits & monitoring: health dashboards, scheduled checks
Timely updates: automated incremental refreshes, stale-data handling
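
The ingestion-time validation rules (type, range, mandatory) could look roughly like the sketch below. The `validate` function and the rule format are invented for illustration; in practice a schema or data-quality library would usually play this role.

```python
def validate(record, rules):
    """Return a list of human-readable rule violations for one record."""
    errors = []
    for field, rule in rules.items():
        value = record.get(field)
        if value is None:
            if rule.get("required", False):
                errors.append(f"{field}: missing mandatory value")
            continue
        if not isinstance(value, rule["type"]):
            errors.append(f"{field}: expected {rule['type'].__name__}")
            continue
        lo, hi = rule.get("range", (None, None))
        if lo is not None and value < lo:
            errors.append(f"{field}: below minimum {lo}")
        if hi is not None and value > hi:
            errors.append(f"{field}: above maximum {hi}")
    return errors

# Hypothetical rules for an order record
rules = {
    "order_id": {"type": int, "required": True},
    "quantity": {"type": int, "required": True, "range": (1, 1000)},
    "note": {"type": str},
}
good = validate({"order_id": 1, "quantity": 3}, rules)
bad = validate({"quantity": 0, "note": 7}, rules)
```

Here `good` is empty, while `bad` reports the missing `order_id`, the out-of-range `quantity`, and the wrongly typed `note`.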

5. What is the significance of the p-value in hypothesis testing?

The p-value is the probability of obtaining data at least as extreme as what was observed, assuming the null hypothesis is true
Small p-value (below the chosen significance level α, commonly 0.05): reject H₀ → statistically significant
Large p-value: not enough evidence to reject H₀ (this does not prove that H₀ is true)
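
As a worked illustration, the sketch below computes a two-sided p-value for a one-sample z-test using only the `math` module. The numbers (sample mean 52, hypothesized mean 50, known σ = 10, n = 100) are made up, and the z-test is chosen only because it needs no external library.

```python
import math

def one_sample_z(sample_mean, mu0, sigma, n):
    """z statistic for H0: population mean equals mu0, with known sigma."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical experiment
z = one_sample_z(sample_mean=52.0, mu0=50.0, sigma=10.0, n=100)
p = two_sided_p_value(z)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these inputs z is 2.0 and the p-value is a little under 0.05, so H₀ would be rejected at α = 0.05 but not at α = 0.01.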

6. What is the difference between normalization and standardization?

Normalization (min–max): rescales data to [0,1] or [–1,1]
Standardization (z-score): centers data at mean = 0 and scales it to σ = 1
Use cases: normalization when features in different units or scales need a common bounded range; standardization for algorithms such as SVM and logistic regression
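
Both rescalings are short enough to write out directly. The sketch below uses the population standard deviation and assumes the input is not constant (otherwise both functions would divide by zero); the sample heights are invented.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score_scale(values):
    """Center at 0 with unit standard deviation: (x - mean) / sigma."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

heights_cm = [150, 160, 170, 180, 190]  # hypothetical data
normalized = min_max_scale(heights_cm)
standardized = z_score_scale(heights_cm)
```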

7. How would you optimize a SQL query on large datasets?

Indexing: on columns used in JOIN/WHERE/ORDER BY
Selective projections: SELECT only the columns you need
Filter early: reduce rows with WHERE (or a filtered subquery/CTE) before they reach the JOIN
Efficient joins: prefer INNER JOIN over OUTER JOIN when the result allows it; avoid cross joins
Check the execution plan: identify bottlenecks and missing statistics
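
The effect of an index on the execution plan can be seen with SQLite from Python's standard library. This is a small sketch with a made-up `orders` table; the exact plan wording differs between SQLite versions and other databases have their own `EXPLAIN` output, so treat the text of the plan as illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

def plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT amount FROM orders WHERE customer_id = 42"
before = plan(query)  # full table scan: no index on customer_id yet
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query)   # the planner can now search via the new index
print(before)
print(after)
```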

8. How would you handle skewed data distributions?

Transformations: log, sqrt, Box–Cox
Winsorization/clipping: cap outliers
Resampling: SMOTE or down-sampling, for imbalanced classes in classification
Model choice: tree-based methods (less sensitive to skew)
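
A quick way to see what a log transform does is to compare skewness before and after. The sketch below uses a simple population skewness (mean of cubed z-scores) and invented right-skewed income figures; `log1p` is used so that zero values would not break the transform.

```python
import math
from statistics import mean, pstdev

def skewness(values):
    """Population skewness: mean of cubed z-scores."""
    mu, sigma = mean(values), pstdev(values)
    return mean([((v - mu) / sigma) ** 3 for v in values])

incomes = [20, 22, 25, 27, 30, 35, 40, 60, 120, 400]  # hypothetical, right-skewed
logged = [math.log1p(v) for v in incomes]

raw_skew = skewness(incomes)
log_skew = skewness(logged)
print(f"raw skew = {raw_skew:.2f}, log skew = {log_skew:.2f}")
```

On this sample the transformed values are still right-skewed, but noticeably less so than the raw ones.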

9. What are Type I and Type II errors? Give examples.

Type I (False Positive): rejecting a true H₀
- Example: reporting a healthy person as disease-positive
Type II (False Negative): retaining a false H₀
- Example: missing the disease in a sick patient
Balance the two through α (significance level) and power (1–β)
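
Using the disease-test example, the two error rates can be counted directly from labeled outcomes. The data below is invented, and H₀ is taken to be "the person is healthy".

```python
def error_rates(actual, predicted):
    """Return (type_i_rate, type_ii_rate) for boolean labels.

    actual[i] is True when the person really has the disease (H0 is false);
    predicted[i] is True when the test reports a positive result.
    """
    false_pos = sum(1 for a, p in zip(actual, predicted) if not a and p)
    false_neg = sum(1 for a, p in zip(actual, predicted) if a and not p)
    negatives = sum(1 for a in actual if not a)
    positives = sum(1 for a in actual if a)
    return false_pos / negatives, false_neg / positives

# Hypothetical outcomes for ten people
actual    = [False, False, False, False, True, True,  True, True, True, False]
predicted = [False, True,  False, False, True, False, True, True, True, False]

type_i, type_ii = error_rates(actual, predicted)
power = 1 - type_ii
```

One healthy person out of five is flagged (Type I rate 0.2) and one sick person out of five is missed (Type II rate 0.2), which gives a power of 0.8.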

10. What is the difference between LEFT JOIN and FULL OUTER JOIN?

LEFT JOIN: all rows from the left table plus matching rows from the right; NULL in the right-side columns where there is no match
FULL OUTER JOIN: all rows from both tables; NULL on whichever side has no match
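
The difference shows up clearly on two tiny tables. The sketch below uses SQLite through Python; the tables and names are made up. Because older SQLite versions do not support FULL OUTER JOIN, it is emulated here as a LEFT JOIN plus the unmatched right-side rows, which is an assumption of this example rather than how other databases require it to be written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO orders VALUES (1, 50.0), (3, 75.0);
""")

# LEFT JOIN: every customer, with NULL where no order matches
left_join = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id
""").fetchall()

# Emulated FULL OUTER JOIN: the LEFT JOIN rows plus orders with no customer
full_outer = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    UNION ALL
    SELECT NULL, o.amount
    FROM orders o
    WHERE NOT EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id)
""").fetchall()
```

The order for the unknown customer 3 appears only in `full_outer`, with NULL (`None` in Python) in the name column.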

11. How would you design a dashboard to track product performance?

Overview metrics: sales, revenue, conversion rate
Visuals:
- trend → line chart
- comparison → bar chart
- composition → pie/stacked bar
Filters: time, region, product
Benchmarks: compare against targets or the previous period
Alerts: threshold-based या anomaly detection

12. How would you decide between an RDBMS and NoSQL?

RDBMS: structured schema, strong ACID guarantees, complex joins; best for transactional workloads
NoSQL: flexible schema, horizontal scalability, high write throughput; suited to semi-structured or evolving data

13. What is data normalization in databases?

Splitting tables to remove redundancy while maintaining referential integrity:
- 1NF: atomic values
- 2NF: remove partial dependencies
- 3NF: remove transitive dependencies
Benefits: fewer update anomalies, optimized storage, easier maintenance

14. How would you detect and handle outliers?

Detection:
- Box plot, scatter plot
- IQR rule (below Q1 - 1.5×IQR or above Q3 + 1.5×IQR), z-score (|z| > 3)
Handling:
- Erroneous → remove
- Cap/impute (median)
- Transform (log)
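
The IQR rule can be sketched with `statistics.quantiles` from the standard library. Quartile definitions vary between tools, so the exact fences may differ slightly from what pandas or a SQL engine would report; the sample data is invented.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

data = [10, 12, 12, 13, 14, 15, 15, 16, 18, 95]  # hypothetical measurements
outliers = iqr_outliers(data)
```

Here only the value 95 falls outside the fences; whether to remove, cap, or transform it depends on whether it is an error or a genuine extreme.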

15. Describe your approach to A/B testing.

Objective: define the metric (e.g. conversion rate)
Design: randomly split users into control and variant
Sample size & duration: calculate the minimum N and the test length
Run the test: collect data
Statistical analysis: t-test, χ²
Interpretation & rollout
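
For a conversion-rate metric, the statistical analysis step might be sketched as a two-proportion z-test. This test is swapped in here because it needs only the `math` module; on a 2×2 table it matches the χ² test without continuity correction. The visitor and conversion counts are invented.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two_sided_p) for H0: both groups share one conversion rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical results: control converts 200/2000, variant 260/2000
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With these counts the difference is significant at α = 0.01. The rollout decision should still weigh effect size and the pre-planned sample size, not the p-value alone.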

16. Batch processing vs stream processing:

Batch: processes accumulated data at intervals (daily/weekly); high throughput, high latency
Stream: processes data in real time as it arrives; low latency, event-driven

17. How would you optimize SQL joins?

Proper indexes on join keys
Apply WHERE/ON conditions early
Join smaller tables first (where the optimizer does not reorder joins itself)
Include only the tables you need
Review the execution plan and adjust between hash joins and nested loops

18. How would you design a star schema in a data warehouse?

Business process: identify it (e.g. sales)
Fact table: measures (sales_amount, quantity)
Dimension tables: date, product, customer, location
Relationships: foreign keys in the fact table → primary keys of the dimensions
Optimize: denormalization, partitioning

19. How would you compute the 90th percentile of sales in SQL?

SELECT PERCENTILE_CONT(0.9) 
  WITHIN GROUP (ORDER BY sales) AS pct_90_sales
FROM sales_table;

If PERCENTILE_CONT is unsupported, use NTILE(10) and take the minimum of the top bucket as an approximation.
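
To see what PERCENTILE_CONT computes, the interpolation can be reproduced in plain Python. This is a sketch of the usual definition (fractional row position 1 + p × (N - 1), linearly interpolated between neighbouring rows); the function name and sample values are made up.

```python
def percentile_cont(values, fraction):
    """Linear-interpolated percentile, mirroring SQL PERCENTILE_CONT."""
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)  # zero-based fractional row
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight

# Hypothetical sales figures
sales = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100]
pct_90_sales = percentile_cont(sales, 0.9)
median_of_four = percentile_cont([1, 2, 3, 4], 0.5)
```

The second call lands between two rows and returns the interpolated value 2.5, which is the behaviour that distinguishes PERCENTILE_CONT from PERCENTILE_DISC.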

20. Star schema vs Snowflake schema:

Star: fact center में, denormalized dimensions; simple joins, fast queries
Snowflake: dimensions normalized into sub-tables; complex joins, storage optimized

21. What is the role of indexing in a database?

Speeds up lookups (B-tree structures)
Reduces full-table scans
Types: clustered (rows stored in index order), non-clustered (separate structure)
Trade-off: faster reads vs slower writes plus extra storage

22. How would you calculate churn rate in SQL?

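-- Customers who were active at the start of the period
WITH start_cte AS (
  SELECT COUNT(*) AS total_start
  FROM customers
  WHERE join_date <= '2025-01-01'
    AND (churn_date IS NULL OR churn_date >= '2025-01-01')
),
-- Of those, customers who churned during the period
churn_cte AS (
  SELECT COUNT(*) AS churned
  FROM customers
  WHERE join_date <= '2025-01-01'
    AND churn_date BETWEEN '2025-01-01' AND '2025-03-31'
)
SELECT
  -- ::FLOAT is PostgreSQL cast syntax; NULLIF avoids division by zero
  (churned::FLOAT / NULLIF(total_start, 0)) * 100 AS churn_rate_pct
FROM start_cte, churn_cte;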

Adjust the dates to match the period.
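
A runnable sketch of a churn calculation against an in-memory SQLite table is shown below. The rows are invented, dates are stored as ISO strings, and the query counts as the starting base only customers who had joined and not yet churned when the period began. The column names follow the query above; everything else is an assumption of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, join_date TEXT, churn_date TEXT);
    INSERT INTO customers VALUES
        (1, '2024-06-01', NULL),
        (2, '2024-07-15', '2025-02-10'),
        (3, '2024-09-30', '2024-12-20'),  -- churned before the period
        (4, '2024-11-05', '2025-03-01'),
        (5, '2025-02-01', '2025-03-15'),  -- joined after the period start
        (6, '2024-01-10', NULL);
""")

churn_rate_pct = conn.execute("""
    WITH start_cte AS (
        SELECT COUNT(*) AS total_start FROM customers
        WHERE join_date <= :period_start
          AND (churn_date IS NULL OR churn_date >= :period_start)
    ),
    churn_cte AS (
        SELECT COUNT(*) AS churned FROM customers
        WHERE join_date <= :period_start
          AND churn_date BETWEEN :period_start AND :period_end
    )
    SELECT 100.0 * churned / NULLIF(total_start, 0)
    FROM start_cte, churn_cte
""", {"period_start": "2025-01-01", "period_end": "2025-03-31"}).fetchone()[0]
```

Four customers are active at the start of the period and two of them churn during it, so the rate comes out at 50%.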

23. Python vs SQL: how do you choose for a task?

SQL: set-based operations, filtering, aggregations, joins
Python: complex analytics, ML, custom transforms, unstructured data (pandas, scikit-learn)

24. Supervised vs Unsupervised learning:

Supervised: labeled data → predict output (regression, classification)
Unsupervised: unlabeled data → patterns/grouping (clustering, PCA)

25. How would you prioritize tasks in a data analytics project?

Define objectives (business questions)
Assess impact (ROI, value)
Check dependencies (data sources, tools)
Allocate resources (skills, time)
Set timeline & milestones (sprints)
Review & adapt through feedback loops

26. How do you choose a chart for visualizing a dataset?

Data type: categorical vs numerical
Goal: comparison, distribution, trend, composition, relationship
Chart suggestions:
- Comparison → bar/column
- Distribution → histogram/box plot
- Trend → line chart
- Composition → pie/stacked bar
- Relationship → scatter/bubble

27. What is the difference between UNION and UNION ALL?

UNION: removes duplicates (needs an extra sort or hash step)
UNION ALL: keeps duplicates; faster
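
A minimal demonstration with SQLite from Python, using two made-up tables that share one email address:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE online_buyers (email TEXT);
    CREATE TABLE store_buyers (email TEXT);
    INSERT INTO online_buyers VALUES ('a@example.com'), ('b@example.com');
    INSERT INTO store_buyers VALUES ('b@example.com'), ('c@example.com');
""")

# UNION deduplicates the combined result
union_rows = conn.execute(
    "SELECT email FROM online_buyers UNION SELECT email FROM store_buyers"
).fetchall()

# UNION ALL simply concatenates both result sets
union_all_rows = conn.execute(
    "SELECT email FROM online_buyers UNION ALL SELECT email FROM store_buyers"
).fetchall()

print(len(union_rows), len(union_all_rows))
```

UNION returns three rows and UNION ALL returns four, because the shared address is kept twice.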

28. How do you ensure a data pipeline scales?

Distributed processing: Spark, Kafka Streams
Horizontal scaling: add new nodes
Modular design: micro-services/steps
Auto-scaling: cloud features
Optimized storage: partitioned, columnar
Monitoring & load balancing: detect bottlenecks

29. How do you handle correlated variables in predictive modeling?

Identify: correlation matrix, VIF
Remove/combine redundant features
Regularization: Lasso (L1), Ridge (L2)
Dimensionality reduction: PCA
Use domain knowledge to choose which features to keep
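
The identification step can be sketched without any external library. The example below computes Pearson correlation by hand and uses the two-feature special case of VIF, 1 / (1 - r²); with more features, VIF would instead come from regressing each feature on all the others. The housing features are invented.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def vif_two_features(xs, ys):
    """VIF when one feature is regressed on a single other feature."""
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r * r)

# Hypothetical housing features
area_sqft = [500, 750, 1000, 1250, 1500, 1750]
rooms     = [1, 2, 2, 3, 4, 4]
age_years = [12, 3, 25, 8, 17, 5]

r_area_rooms = pearson_r(area_sqft, rooms)
r_area_age = pearson_r(area_sqft, age_years)
vif_area_rooms = vif_two_features(area_sqft, rooms)
```

Area and room count are almost collinear here, so one of them is a candidate to drop or combine, while area and age are nearly uncorrelated.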

30. RANK() vs DENSE_RANK() in SQL:

RANK(): leaves gaps after ties (1,1,3)
DENSE_RANK(): no gaps (1,1,2)
Both assign ranks based on the ORDER BY clause
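
The two functions can be compared side by side in SQLite from Python. This sketch assumes an SQLite build with window-function support (version 3.25 or later); the `scores` table and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (player TEXT, points INTEGER);
    INSERT INTO scores VALUES ('Asha', 90), ('Ravi', 90), ('Meena', 80);
""")

rows = conn.execute("""
    SELECT player,
           RANK()       OVER (ORDER BY points DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY points DESC) AS dense_rnk
    FROM scores
    ORDER BY points DESC, player
""").fetchall()

for player, rnk, dense_rnk in rows:
    print(player, rnk, dense_rnk)
```

The two tied players both get rank 1; the third player is ranked 3 by RANK() and 2 by DENSE_RANK().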

Source: “Data Analytics Interview Questions” PDF by BossCoder Academy

Posted by Pawan Keshari