Master Your Data Analyst and Engineer Skills Assessment Quiz

Assess Your Analytical and Engineering Proficiency Today

Difficulty: Moderate
Questions: 20

Ready to test your data analysis and engineering skills? This Data Analyst and Engineer Skills Assessment Quiz includes 20 multiple-choice questions that measure core skills in SQL, ETL pipelines, and data modeling. It's perfect for aspiring analysts, engineers, or anyone looking to refine their technical toolkit. Freely modify the questions and settings in our editor to target specific skill gaps. For more challenges, try the Data Analyst Technical Assessment Quiz, explore the GCP Professional Data Engineer Practice Quiz, or browse all quizzes today.

Which SQL join returns only rows with matching values in both tables?
Inner Join
Left Join
Full Outer Join
Right Join
An inner join returns only the records that have matching values in both tables. It excludes rows from both tables where the join condition is not met.
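To see the difference, here's a minimal sketch using Python's built-in sqlite3 module; the customers and orders tables are invented for illustration:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (10, 1, 99.0);  -- only Ada has an order
    """)

    # INNER JOIN: only rows with a match in both tables (Ada).
    inner = conn.execute(
        "SELECT c.name, o.total FROM customers c "
        "JOIN orders o ON o.customer_id = c.id"
    ).fetchall()

    # LEFT JOIN: all customers, NULL where no order exists (Grace).
    left = conn.execute(
        "SELECT c.name, o.total FROM customers c "
        "LEFT JOIN orders o ON o.customer_id = c.id"
    ).fetchall()

    print(inner)  # [('Ada', 99.0)]
    print(left)   # [('Ada', 99.0), ('Grace', None)]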
In a star schema, what is the central table that stores quantitative data called?
Fact Table
Lookup Table
Dimension Table
Bridge Table
In a star schema, the fact table is the central table that stores measurable, quantitative data for analysis. Dimension tables surround the fact table and store descriptive attributes.
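As a rough sketch of the layout (all table and column names invented for illustration), a tiny star schema might look like this in Python's sqlite3:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension tables hold descriptive attributes.
        CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);

        -- The fact table sits at the centre and stores the measures,
        -- keyed by foreign keys into the surrounding dimensions.
        CREATE TABLE fact_sales (
            date_key    INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            quantity    INTEGER,
            revenue     REAL
        );
    """)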
Which SQL clause is used to filter records after aggregation functions are applied?
GROUP BY
HAVING
ORDER BY
WHERE
The HAVING clause is used to filter groups after aggregation functions like SUM or COUNT have been applied. WHERE cannot filter aggregated results because it acts before grouping.
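A small runnable illustration with Python's sqlite3 (the sales table is made up): WHERE trims rows first, then HAVING filters the aggregated groups:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sales (region TEXT, amount REAL);
        INSERT INTO sales VALUES
            ('north', 100), ('north', 250), ('south', 40), ('south', 30);
    """)

    # WHERE filters individual rows before grouping; HAVING filters
    # whole groups after SUM() has been computed.
    rows = conn.execute("""
        SELECT region, SUM(amount) AS total
        FROM sales
        WHERE amount > 0          -- row-level filter
        GROUP BY region
        HAVING SUM(amount) > 200  -- group-level filter
    """).fetchall()

    print(rows)  # [('north', 350.0)]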
In the ETL process, which step is responsible for cleaning and standardizing data?
Load
Extract
Transform
Archive
The Transform step in ETL handles data cleaning, standardization, and enrichment. This ensures the data is in the correct format before loading into the destination system.
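As a toy illustration of a transform step (the field names and formats are hypothetical), plain Python can standardize casing and normalize dates before the load:

    from datetime import datetime

    raw = [
        {"email": " ADA@Example.COM ", "signup": "03/01/2024"},
        {"email": "grace@example.com", "signup": "2024-01-05"},
    ]

    def transform(record):
        # Standardize whitespace/casing and normalize dates to ISO 8601.
        email = record["email"].strip().lower()
        for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
            try:
                signup = datetime.strptime(record["signup"], fmt).date().isoformat()
                break
            except ValueError:
                continue
        else:
            signup = None  # flag unparseable dates for review
        return {"email": email, "signup": signup}

    clean = [transform(r) for r in raw]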
Which language is primarily used for querying relational databases?
Python
SQL
Java
R
SQL (Structured Query Language) is the standard language for querying and manipulating relational databases. It provides commands for data retrieval, updating, and management.
Which tool is optimized for in-memory batch processing within the Hadoop ecosystem?
Apache Hive
Apache Spark
Hadoop MapReduce
Apache Pig
Apache Spark is designed for in-memory computation, which makes batch processing much faster than traditional disk-based MapReduce. It integrates well with the Hadoop ecosystem.
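For flavour, a minimal batch-job sketch, assuming the pyspark package is installed (the data and app name are invented):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("batch_demo").getOrCreate()

    df = spark.createDataFrame(
        [("north", 100), ("north", 250), ("south", 40)],
        ["region", "amount"],
    )
    df.cache()  # keep the dataset in memory across repeated actions
    df.groupBy("region").sum("amount").show()
    spark.stop()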
What is the key purpose of incremental loading in an ETL pipeline?
To archive historical data permanently
To transform all data from scratch
To process only new or changed data
To extract data from all sources repeatedly
Incremental loading processes only the new or changed data since the last load, reducing processing time and resource consumption. This approach improves pipeline efficiency.
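One common way to implement this is a watermark: remember the highest updated_at seen so far and pull only newer rows. A minimal sqlite3 sketch with invented table and column names:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source_orders (id INTEGER, updated_at TEXT);
        INSERT INTO source_orders VALUES
            (1, '2024-01-01'), (2, '2024-01-02'), (3, '2024-01-03');
    """)

    watermark = "2024-01-01"  # persisted from the previous run

    # Only rows changed since the last load are extracted.
    new_rows = conn.execute(
        "SELECT id, updated_at FROM source_orders WHERE updated_at > ?",
        (watermark,),
    ).fetchall()

    if new_rows:
        watermark = max(r[1] for r in new_rows)  # persist for the next run
    print(new_rows)  # [(2, '2024-01-02'), (3, '2024-01-03')]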
Which index type is most effective for speeding up range queries on sorted columns?
GiST Index
Hash Index
Bitmap Index
B-tree Index
B-tree indexes maintain sorted order and allow efficient range queries by traversing the tree structure. Hash indexes are efficient for equality checks but not for ranges.
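SQLite's ordinary indexes are B-trees, so Python's sqlite3 can illustrate the point; EXPLAIN QUERY PLAN should show the range predicate being served by the index (names invented):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT);
        CREATE INDEX idx_events_created ON events(created_at);
    """)

    plan = conn.execute("""
        EXPLAIN QUERY PLAN
        SELECT id FROM events
        WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31'
    """).fetchall()
    print(plan)  # the detail column should mention idx_events_created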
Which Slowly Changing Dimension type maintains full history by adding new rows for changes?
Type 2
Type 0
Type 3
Type 1
Slowly Changing Dimension (SCD) Type 2 retains the history by creating a new row for each change, preserving previous values. This allows analysts to query historical data accurately.
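A minimal sketch of the Type 2 pattern with effective-date columns (the dim_customer table is invented): expire the current row, then insert the new version:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_customer (
            customer_id INTEGER, city TEXT,
            valid_from TEXT, valid_to TEXT, is_current INTEGER
        );
        INSERT INTO dim_customer VALUES (1, 'London', '2023-01-01', '9999-12-31', 1);
    """)

    # Customer 1 moves to Paris on 2024-06-01: expire the old row...
    conn.execute("""
        UPDATE dim_customer
        SET valid_to = '2024-06-01', is_current = 0
        WHERE customer_id = 1 AND is_current = 1
    """)
    # ...and add a new current row, so both versions stay queryable.
    conn.execute("""
        INSERT INTO dim_customer VALUES (1, 'Paris', '2024-06-01', '9999-12-31', 1)
    """)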
Which SQL function returns the number of rows in a result set?
AVG()
ROW_NUMBER()
COUNT()
SUM()
COUNT() is the SQL aggregate function that returns the number of rows in a result set. SUM() and AVG() perform numeric aggregations, and ROW_NUMBER() enumerates rows.
In pipeline monitoring, which metric measures records processed per unit time?
Uptime
Latency
Error Rate
Throughput
Throughput measures the number of records processed per unit time in a data pipeline. Latency measures the delay for individual records rather than volumes.
In a snowflake schema, how are dimension tables structured?
Highly denormalized
Normalized into multiple related tables
Merged with fact tables
Stored as flat files
A snowflake schema normalizes dimension tables into multiple related tables to reduce redundancy. This differs from a star schema, which uses denormalized dimensions.
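A tiny sqlite3 sketch of the idea (names invented): the product dimension's category attributes are split into their own keyed table rather than kept inline:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- In a star schema, category and its attributes would live
        -- directly on dim_product. A snowflake normalizes them out:
        CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,
            name         TEXT,
            category_key INTEGER REFERENCES dim_category(category_key)
        );
    """)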
Which Python library is most commonly used for data manipulation?
SciPy
Pandas
Matplotlib
NumPy
Pandas provides data structures like DataFrame and Series for efficient data manipulation and analysis. NumPy focuses on numerical arrays, while Matplotlib is for plotting.
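A short example of typical Pandas manipulation, assuming pandas is installed (the data is invented): filter, group, and aggregate in one readable chain:

    import pandas as pd

    df = pd.DataFrame({
        "region": ["north", "north", "south"],
        "amount": [100, 250, 40],
    })

    summary = (
        df[df["amount"] > 50]                # filter rows
        .groupby("region", as_index=False)   # group
        .agg(total=("amount", "sum"))        # aggregate per group
    )
    print(summary)  # one row: region 'north' with total 350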
What does ACID stand for in relational database transactions?
Atomicity, Consistency, Isolation, Durability
Accuracy, Concurrency, Integrity, Durability
Availability, Consistency, Isolation, Distribution
Atomicity, Concurrency, Isolation, Distribution
ACID stands for Atomicity, Consistency, Isolation, and Durability, which are properties that guarantee reliable transaction processing in relational databases.
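Atomicity is easy to demonstrate with Python's sqlite3: wrap related statements in one transaction, and a failure rolls everything back (the accounts table and amounts are invented):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100.0), ("bob", 0.0)])
    conn.commit()

    try:
        with conn:  # one atomic block: commit on success, roll back on error
            conn.execute("UPDATE accounts SET balance = balance - 500 WHERE name = 'alice'")
            (bal,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
            if bal < 0:
                raise ValueError("insufficient funds")  # abort mid-transfer
            conn.execute("UPDATE accounts SET balance = balance + 500 WHERE name = 'bob'")
    except ValueError:
        pass

    # Atomicity: the debit was rolled back along with everything else.
    print(conn.execute("SELECT * FROM accounts ORDER BY name").fetchall())
    # [('alice', 100.0), ('bob', 0.0)]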
Which open-source tool is commonly used for orchestrating data workflows with directed acyclic graphs?
Apache Airflow
Kafka Connect
Talend
Apache NiFi
Apache Airflow uses directed acyclic graphs (DAGs) to define and schedule complex workflows. It is widely adopted for ETL orchestration and pipeline management.
Which SQL query pattern retrieves the second highest salary from an employee table?
SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 2;
SELECT MIN(salary) FROM employees WHERE salary > (SELECT MIN(salary) FROM employees);
SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees);
SELECT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1;
The pattern SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees) finds the highest salary below the overall maximum, which is the second highest value even when the top salary is duplicated. Of the alternatives, the MIN-based query returns the second lowest salary, LIMIT 2 returns two rows rather than a single value, and LIMIT 1 OFFSET 1 repeats the top value whenever it appears more than once.
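A quick sqlite3 check of both patterns on invented data with a duplicated top salary makes the difference concrete:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employees (name TEXT, salary INTEGER);
        INSERT INTO employees VALUES
            ('a', 90000), ('b', 90000), ('c', 70000), ('d', 50000);
    """)

    # Subquery pattern: robust even when the top salary is duplicated.
    print(conn.execute("""
        SELECT MAX(salary) FROM employees
        WHERE salary < (SELECT MAX(salary) FROM employees)
    """).fetchone())  # (70000,)

    # OFFSET pattern without DISTINCT: returns the duplicated top value.
    print(conn.execute("""
        SELECT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1
    """).fetchone())  # (90000,)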
Which normal form removes transitive dependencies in a relational schema?
Second Normal Form
Third Normal Form
First Normal Form
Boyce-Codd Normal Form
Third Normal Form (3NF) eliminates transitive dependencies by ensuring that non-key attributes depend only on the primary key. This reduces redundancy and update anomalies.
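A small sqlite3 sketch of the normalized design, with the offending transitive dependency described in comments (all names invented):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Before 3NF, dept_name and dept_location sat on employees, so
        -- employee_id -> dept_name -> dept_location (a transitive chain).
        -- After 3NF, non-key attributes depend only on their table's key:
        CREATE TABLE departments (
            dept_id       INTEGER PRIMARY KEY,
            dept_name     TEXT,
            dept_location TEXT
        );
        CREATE TABLE employees (
            employee_id INTEGER PRIMARY KEY,
            name        TEXT,
            dept_id     INTEGER REFERENCES departments(dept_id)
        );
    """)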
In Apache Airflow, which construct defines dependencies among tasks?
Operators
Directed Acyclic Graph (DAG)
Sensors
Tasks
A DAG (Directed Acyclic Graph) in Airflow defines task dependencies and execution order. Operators define single tasks, but the DAG organizes how they relate.
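A minimal sketch, assuming a recent Apache Airflow 2.x install; the DAG and task names are made up:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.empty import EmptyOperator

    with DAG(dag_id="etl_example", start_date=datetime(2024, 1, 1),
             schedule=None) as dag:
        extract   = EmptyOperator(task_id="extract")
        transform = EmptyOperator(task_id="transform")
        load      = EmptyOperator(task_id="load")

        # The DAG is the graph; >> declares the edges (dependencies)
        # that determine execution order between tasks.
        extract >> transform >> load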
Which file format is most optimized for read-heavy analytical workloads in big data environments?
Avro
JSON
CSV
Parquet
Parquet is a columnar storage format that offers efficient compression and encoding schemes, making it ideal for read-heavy analytical queries. It reduces I/O by only reading required columns.
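A short sketch assuming pandas plus a Parquet engine such as pyarrow is available (file and column names invented):

    import pandas as pd

    df = pd.DataFrame({
        "user_id": range(5),
        "country": ["gb", "fr", "gb", "de", "fr"],
        "revenue": [10.0, 20.0, 5.0, 7.5, 12.0],
    })
    df.to_parquet("events.parquet")

    # Columnar layout: read back only the columns the analysis needs,
    # skipping the I/O for everything else in the file.
    slim = pd.read_parquet("events.parquet", columns=["country", "revenue"])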
When optimizing a SQL query, which approach can significantly reduce disk I/O?
Dropping foreign keys
Using more joins
Using covering indexes
Increasing RAM
Covering indexes include all the columns needed by a query, allowing the database to fetch data directly from the index without accessing the base table. This reduces disk I/O and speeds up query execution.
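Here's a sqlite3 sketch (names invented): because the index holds every column the query touches, the plan reports a covering index scan:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                             status TEXT, total REAL);
        -- The index carries every column the query below touches.
        CREATE INDEX idx_orders_cover ON orders(customer_id, status, total);
    """)

    plan = conn.execute("""
        EXPLAIN QUERY PLAN
        SELECT status, total FROM orders WHERE customer_id = 42
    """).fetchall()
    print(plan)  # detail should read like 'USING COVERING INDEX idx_orders_cover'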

Learning Outcomes

  1. Analyse real-world datasets to extract actionable insights.
  2. Evaluate data engineering workflows and pipeline efficiencies.
  3. Identify optimal tools and technologies for data transformation.
  4. Apply robust SQL queries for data retrieval and manipulation.
  5. Demonstrate best practices in database schema design.
  6. Master foundational concepts in data modeling and ETL processes.

Cheat Sheet

  1. Understand Your Data Schema - Think of your schema as a treasure map: tables are islands, columns are landmarks, and relationships are hidden paths. By mastering this map, you'll avoid getting lost in your queries and find your data treasures faster.
  2. Use Descriptive Naming Conventions - Swap out cryptic codes for clear names like customer_id instead of cust_id. Friendly names make your queries easy to read and help teammates (and future you) understand what's going on at a glance.
  3. Write Clear and Concise Queries - Break big, scary queries into bite-sized subqueries or CTEs - like slicing a pizza into perfect, shareable pieces. You'll improve readability, simplify debugging, and make those complex analyses feel like a walk in the park.
  4. Optimize Your Queries with Indexes - Indexes are like speed boosters for your database: add them to columns you filter on often, and watch your queries fly. Just don't go overboard - too many indexes can slow down your data writes and turn your booster into a brake.
  5. Avoid Using SELECT * - Pulling in every column is like packing your entire closet for a weekend trip - you end up lugging around what you don't need. By specifying only the fields you want, your queries run leaner and meaner.
  6. Use JOINs Wisely - Choosing the right JOIN (INNER, LEFT, RIGHT) is like picking the perfect dance partner for your data tables. Match them correctly and you'll glide through combined datasets; mismatch them and watch performance and results stumble.
  7. Leverage Window Functions - Window functions let you calculate running totals, ranks, and moving averages without breaking a sweat. They're your secret sauce for analyses that need a "close-up" on groups of rows, all while keeping things compact and efficient.
  8. Handle NULL Values Appropriately - NULLs are like mysterious blank pages in your report - they can hide crucial details or throw off your math. Use functions like COALESCE to fill in defaults and keep your results accurate and meaningful.
  9. Use Transactions for Data Integrity - Think of transactions as the safety net that keeps your database from falling into chaos. By grouping related operations into one atomic block, you ensure either everything succeeds or nothing changes - no half-finished updates allowed!
  10. Document and Comment Your Code - Good comments are like sticky notes for your brain: they explain tricky logic and remind you why you did something that way. Well-documented queries save time, reduce headaches, and make you a team superhero.