Tuesday, February 25, 2014

MSBI Interview Questions and Answers


SSIS/SSAS/SSRS/SQL Server/Data Warehouse Interview Questions and Answers

 SSIS

1. What is a package?
a) A package is a discrete executable unit of work composed of a collection of control flow and other objects, including data sources, transformations, process sequence and rules, error and event handling, and data destinations.

2. What is a workflow in SSIS?
a) A workflow is a set of instructions on how to execute tasks.
(In SSIS the workflow is defined in the control flow of a package, where tasks and containers are connected by precedence constraints.)

3. What is the difference between control flow items and data flow items?
a) The control flow is the highest-level control process. It lets you manage the run-time activities of data flows and other processes within a package.
When you want to extract, transform, and load data within a package, you add an SSIS Data Flow task to the package control flow.

4. What are the main components of SSIS (project architecture)?
a) The SSIS architecture has 4 main components:
1. SSIS service
2. SSIS runtime engine and runtime executables
3. SSIS data flow engine and data flow components
4. SSIS clients

5. What are the different components in an SSIS package?
1. Control flow
2. Data flow
3. Event handler
4. Package explorer

Containers: provide structure and scope to your package.
Types of containers:
i. Task host container: the task host container services a single task.
ii. Sequence container: it can handle the flow of a subset of a package and can help you divide a package into smaller, more manageable processes.
Uses:
1. Grouping tasks so that you can disable a part of the package that is no longer needed.
2. Narrowing the scope of a variable to a container.
3. Managing the properties of multiple tasks in one step by setting the properties of the container.
iii. For Loop container: evaluates an expression and repeats its workflow until the expression evaluates to false.
iv. Foreach Loop container: repeats the control flow for each member of a specified enumerator.

Tasks: provide the functionality of your package. A task is an individual unit of work.

Event handler: responds to raised events in your package.

Precedence constraints: provide an ordinal relationship between the various items in your package.

6. How to deploy the package?
a) To deploy the package, first configure some project properties.
1. Go to the Project menu -> Properties -> Deployment Utility and set CreateDeploymentUtility to "True".
2. Specify the DeploymentOutputPath as "bin\Deployment".
3. Build the project; the deployment utility is created in that folder.

7. What is a connection manager?
a) It is a bridge between a package object and physical data. It provides a logical representation of a connection at design time; the properties of the connection manager describe the physical connection that Integration Services creates when the package runs.

8. Which utilities execute (run) a package?
a) In BIDS, a package can be executed in debug mode by using the Debug menu or toolbar, or from Solution Explorer.
In production, the package can be executed from the command line or from a Microsoft Windows utility, or it can be scheduled for automated execution by using SQL Server Agent.
i). Go to the Debug menu and select Start Debugging.
ii). Press the F5 key.
iii). Right-click the package and choose Execute Package.
iv). Use the command prompt utilities:


a). DTExecUI
1. Open a command prompt (Start -> Run), type dtexecui, and press Enter.
2. The Execute Package Utility dialog box opens.
3. Click Execute to run the package.
Wait until the package has executed successfully.


b). DTExec utility
1. Open the command prompt window.
2. Type dtexec followed by the /DTS, /SQL, or /File option and the package path, including the package name.
3. If the package encryption level is EncryptSensitiveWithPassword or EncryptAllWithPassword, use the /Decrypt option to provide the password.
If no password is included, dtexec will prompt you for the password.
4. Optionally, provide additional command-line options.
5. Press Enter.
6. Optionally, view logging and reporting information before closing the command prompt window.
v). Using SQL Server Management Studio to execute a package
1. In SSMS, right-click a package, and then click Run Package.
The Execute Package Utility opens.
2. Execute the package as described previously.
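
The dtexec steps above can be sketched as a small script that assembles the command line. This is only an illustration: the helper name build_dtexec_args, the package path, and the password are made up for the example, and only the /File, /SQL, /DTS, and /Decrypt options mentioned above are used.

```python
def build_dtexec_args(package_path, source="file", password=None):
    """Return the argument list for a dtexec call (hypothetical helper)."""
    # Map the package location to the matching dtexec option.
    source_flags = {"file": "/File", "sql": "/SQL", "dts": "/DTS"}
    if source not in source_flags:
        raise ValueError("source must be 'file', 'sql' or 'dts'")
    args = ["dtexec", source_flags[source], package_path]
    if password is not None:
        # Needed when the package is encrypted with a password.
        args += ["/Decrypt", password]
    return args


# Example path and password are placeholders, not real values.
example_args = build_dtexec_args(r"C:\packages\LoadSales.dtsx", password="secret")
print(" ".join(example_args))
```

On a machine where dtexec is installed, the list could be passed to subprocess.run to start the package.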

9. How can you design an SCD in SSIS?
a) Definition: a slowly changing dimension (SCD) describes how to capture changes to dimension data over a period of time. In SSIS it can be built with the Slowly Changing Dimension transformation.
It is related to change data capture, which identifies the rows that changed in the source.
Type 1: keeps only the most recent values in the target. It does not maintain history.
Type 2: keeps the full history in the target database. For every update in the source, a new record is inserted in the target.
Type 3: keeps the current and the previous information in the target.
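
To make the three SCD types concrete, here is a minimal sketch in Python that applies the same change (a customer moving to a new city) in each style. It is not how SSIS implements the transformation; the column names (customer_id, city, previous_city, start_date, end_date, is_current) are assumptions chosen for the example.

```python
def apply_type1(dim, key, new_city):
    """Type 1: overwrite the value, no history is kept."""
    for row in dim:
        if row["customer_id"] == key:
            row["city"] = new_city


def apply_type2(dim, key, new_city, change_date):
    """Type 2: expire the current row and insert a new one."""
    for row in dim:
        if row["customer_id"] == key and row["is_current"]:
            row["is_current"] = False
            row["end_date"] = change_date
    dim.append({"customer_id": key, "city": new_city,
                "start_date": change_date, "end_date": None,
                "is_current": True})


def apply_type3(dim, key, new_city):
    """Type 3: keep the current and the previous value in one row."""
    for row in dim:
        if row["customer_id"] == key:
            row["previous_city"] = row["city"]
            row["city"] = new_city


dim1 = [{"customer_id": 1, "city": "Pune"}]
apply_type1(dim1, 1, "Delhi")

dim2 = [{"customer_id": 1, "city": "Pune", "start_date": "2013-01-01",
         "end_date": None, "is_current": True}]
apply_type2(dim2, 1, "Delhi", "2014-02-25")

dim3 = [{"customer_id": 1, "city": "Pune", "previous_city": None}]
apply_type3(dim3, 1, "Delhi")
```

After the calls, dim1 holds one row with the new city, dim2 holds two rows (the expired one and the current one), and dim3 holds one row with both cities.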

10. How can you handle errors with the help of logging in SSIS?
a) Create an OnError event handler and add to it an Execute SQL task that logs the error.

11. What is a log file and how do you send the log file to the manager?
a) Logging is especially useful when the package has been deployed to the production environment and you cannot use BIDS and VSA to debug the package.
SSIS enables you to implement logging code through the Dts.Log method.
When the Dts.Log method is called in the script, the SSIS engine routes the message to the log providers that are configured in the containing package.

12. What is an environment variable in SSIS?
a) An environment variable configuration sets a package property equal to the value in an environment variable.
Environment variable configurations are useful for configuring properties that depend on the computer that is executing the package.

13. What about multiple configurations?
a) A package can use several configuration types: XML configuration file, environment variable, registry entry, parent package variable, and SQL Server table, each applied as a direct or an indirect configuration.

14. How to provide security to packages?
a) In two ways
1. Package encryption
2. Password protection.

15. With regard to error handling in a transformation, which option gives better performance: fail component, redirect row, or ignore failure?
a) Redirect row provides better performance for error handling.

16. What is a staging area?
a) It is a temporary data storage location where various data transformation activities take place.

The staging area is the kitchen of the data warehouse.

17. What is a task?
a) An individual unit of work.
Types:
1. ActiveX Script task
2. Analysis Services Execute DDL task *
3. Analysis Services Processing task *
4. Bulk Insert task *
5. Data Flow task *
6. Data Mining Query task
7. Execute DTS 2000 Package task
8. Execute Package task *
9. Execute Process task
10. Execute SQL task *
11. File System task *
12. FTP task
13. Message Queue task
14. Script task *
15. Send Mail task *
16. Web Service task
17. WMI Data Reader task
18. WMI Event Watcher task
19. XML task

18. Event handler and logging?

a) An event handler is the mechanism that responds to an event raised in a specific scenario. For example, if there is a failure in a data load, it can send a notification by email or write an entry to an error table.

Logging can be done based on events. In SSIS there are 12 events that can be logged at the task or package level. You can enable partial logging for one task and much more detailed logging for others, for example billing tasks.
Examples:

OnError; OnPostValidate; OnProgress; OnWarning

SSIS provides different types of log providers:

SQL Server Profiler
Text file
SQL Server
Windows Event Log
XML file

19. Import and Export Wizard?
a) It is the easiest method to move data between sources such as Oracle, DB2, and SQL Server.
Right-click the database name -> Tasks -> Import Data or Export Data
Select the source
Select the destination
Copy tables or write a query
Execute
Finish

20. What is Solution Explorer?
a) Once you create a project with a specific name, Solution Explorer is the window where you add data sources, data source views, packages, and miscellaneous files; it organizes the different files under one structure.

21. Precedence constraints?
a) Constraints that link executables, containers, and tasks within the package control flow, and that specify the conditions that determine the sequence in which executables run and whether they run at all.

22. Data pipeline?
a) The memory-based, multithreaded, buffered transformation process through which data flows in an SSIS Data Flow task during package execution.

23. Transformations?
a) A transformation is an object that generates, modifies, or passes data.
1. AGGREGATE: applies an aggregate function to grouped records and produces new output records from the aggregated results.
2. AUDIT: adds the value of a system variable, such as the machine name or the execution instance GUID, to a new output column.
3. CHARACTER MAP: makes string data changes, such as changing data from lower case to upper case.
4. CONDITIONAL SPLIT: separates input rows into separate output data pipelines based on the Boolean expressions configured for each output.
5. COPY COLUMN: adds a copy of a column to the transformation output; we can later transform the copy, keeping the original for auditing purposes.
6. DATA CONVERSION: converts a column's data type to another data type.
7. DATA MINING QUERY: performs a data mining query against Analysis Services.
8. DERIVED COLUMN: creates a new derived column calculated from an expression.
9. EXPORT COLUMN: allows you to export a column from the data flow to a file.
10. FUZZY GROUPING: performs data cleansing by finding rows that are likely duplicates.
11. FUZZY LOOKUP: matches and standardizes data based on fuzzy logic.
Ex: transform the name jon to john
12. IMPORT COLUMN: reads data from a file and adds it to the data flow.
13. LOOKUP: performs a lookup of data to be used later in a transformation.
Ex: a transformation to look up a city based on a zip code.
1. Getting a related value from a table using a key column value
2. Updating a slowly changing dimension table
3. Checking whether records already exist in the table
14. MERGE: merges two sorted data sets into a single data set in a single data flow.
15. MERGE JOIN: merges two data sets into a single data set using a join.
16. MULTICAST: sends a copy of the data to an additional path in the workflow.
17. ROW COUNT: stores the row count from the data flow in a variable.
18. ROW SAMPLING: captures a sample of data from the data flow by using a row count of the total rows in the data flow.
19. UNION ALL: merges multiple data sets into a single data set.
20. PIVOT: converts rows into columns.
21. UNPIVOT: converts columns into rows.

24. Batch?
a) A batch is defined as a group of sessions. There are 2 types:
1. Parallel batch processing
2. Sequential batch processing

To execute an SSIS package we use the "Execute Package Utility".

To deploy an SSIS package we use the "Package Deployment Utility".


SSRS:--

1. What are the main components of reporting services?
a) Report designer, report server, report manager, report user.

2. Where can you publish the report?
a) Reports are published from Report Designer to the report server.

3. What is needed to create a matrix report?
a) Page, column, row, details

4. Which is used for generating reports: an RDBMS or a cube?
a) It depends on the data; a report can be developed from different sources such as database tables, cubes, or text files.


5. What is an .rdl file?
a) RDL stands for Report Definition Language. Every report is saved with the .rdl extension.






SSAS:-

1. What are fixed measures and calculated measures?
a) A fixed measure is stored in the fact table; its values are usually calculated during the ETL load, for example in SSIS.
A calculated measure is defined in SSAS; while creating the cube, we can define the calculated measure in the OLAP cube as an expression.

2. What are measures?
a) Measures are numeric data based on columns in a fact table.

3. What are cubes?
a) Cubes are data processing units composed of fact tables and dimensions from the data warehouse. They provide multidimensional analysis.

4. What are virtual cubes?
a) These are combinations of one or more real cubes and require no disk space to store them. They store only the definition and not the data.

DATA WAREHOUSE CONCEPTS:-
1. Difference between OLTP and OLAP?

OLTP | OLAP
1. Transactional processing | 1. Query processing
2. Time sensitive | 2. History oriented
3. Operators' and clerks' view | 3. Managers', CEOs', and PMs' view
4. Organized by transaction (order, input, inventory) | 4. Organized by subject (product, customer)
5. Relatively small database | 5. Large database
6. Volatile data | 6. Non-volatile data
7. Stores all data | 7. Stores relevant data
8. Not flexible | 8. Flexible




2. Difference between star schema and snowflake schema?

Star schema | Snowflake schema
1. Centrally located fact table surrounded by denormalized dimension tables | 1. Centrally located fact table surrounded by normalized dimension tables
2. All dimensions are linked directly to the fact table | 2. Dimensions are linked to each other, or have a 1-N relationship with other tables
3. The design is easy to understand | 3. The design is difficult to understand
4. Data is easy to retrieve by running the query against the fact and dimension tables | 4. Data is more difficult to retrieve
5. Query performance is better because fewer joins are involved | 5. More joins are involved




What are fact tables?
a) A fact table is a table that contains summarized numerical data (facts) and historical data.
The fact table has a foreign key-primary key relationship with each dimension table. The fact table maintains the information in 3rd normal form.

3. Types of facts?
a)
1. Additive: facts that can be added along all the dimensions.
- Discrete numerical measures.
- Ex: retail sales in $
2. Semi-additive: a snapshot taken at a point in time.
- Measure of intensity.
- Not additive along the time dimension.
- Ex: account balance, inventory balance
3. Non-additive: numerical measures that can't be added across any dimension.
- Intensity measures across all dimensions.
- Ex: room temperatures, averages
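
A small sketch of why the distinction matters, using made-up numbers: sales can be summed over any dimension, while account balances can be summed across accounts for one date but not across dates. The helper names balance_on and closing_balance are hypothetical.

```python
# Additive fact: monthly sales can simply be summed over time.
sales = [("2014-01", 100), ("2014-02", 150)]
total_sales = sum(amount for _, amount in sales)

# Semi-additive fact: month-end balances per account (account, date, amount).
balances = [
    ("acct1", "2014-01-31", 100),
    ("acct2", "2014-01-31", 50),
    ("acct1", "2014-02-28", 120),
    ("acct2", "2014-02-28", 70),
]


def balance_on(rows, date):
    """Summing across accounts for a single date is meaningful."""
    return sum(amount for _, d, amount in rows if d == date)


def closing_balance(rows):
    """Across time, take the latest snapshot instead of summing."""
    last_date = max(d for _, d, _ in rows)
    return balance_on(rows, last_date)


# Summing every row mixes two snapshots and overstates the balance.
naive_sum = sum(amount for _, _, amount in balances)
```

Here closing_balance(balances) gives the February total, while naive_sum adds January and February together, which is not a balance anyone actually holds.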


4. Data warehouse?
a) A data warehouse is a collection of data marts representing historical data from different operational data sources (OLTP).
The data from these OLTP sources is structured and optimized for querying and data analysis in a data warehouse.

5. Data mart?
a) A data mart is a subset of a data warehouse that can provide data for reporting and analysis on a section, unit or a department like sales dept, hr dept.

6. What is OLAP?
a) OLAP stands for online analytical processing. It uses database tables (fact and dimension tables) to enable multidimensional viewing, analysis, and querying of large amounts of data.

7. What is OLTP?
a) OLTP stands for online transactional processing. Apart from data warehouse databases, most other databases are OLTP databases.
OLTP databases use a normalized schema structure.
They are designed for recording the daily operations and transactions of a business.

8. What are dimensions?
a) Dimensions are categories by which summarized data can be viewed. For example, a profit summary fact table can be viewed by a time dimension.


9. What are conformed dimensions?
a) Dimensions that are reusable and fixed in nature. Examples: customer, time, and geography dimensions.

10. Staging area?
a) It is a temporary data storage location where various data transformation activities take place.

11. Fact grain (granularity)?
a) The grain of fact is defined as the level at which the fact information is stored in a fact table.

12. What is a factless fact table?
a) A fact table that does not contain facts is called a factless fact table.
Generally, when two data marts need to be combined, one data mart has a factless fact table and the other has the common fact table.

13. What are measures?
a) Measures are numeric data based on columns in a fact table.

14. What are cubes?
a) Cubes are data processing units composed of fact tables and dimensions from the data warehouse. They provide multidimensional analysis.

15. What are virtual cubes?
a) These are combinations of one or more real cubes and require no disk space to store them. They store only the definition and not the data.

16. SCDs?
a)
Type I (current data)
Type II (full historical information and current data)
Type III (current data and recent data)

SQL-SERVER-2005:-

1. Surrogate key?
a) It is an artificial or synthetic key that is used as a substitute for a natural key.
It is just a unique identifier or number for each row that can be used as the primary key of the table.
(It is a sequence-generated key that is assigned to be the primary key of the table.)
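
As an illustration of the idea, the sketch below hands out sequence-generated surrogate keys for natural keys. The class name SurrogateKeyMap and the sample customer codes are assumptions for the example; in SQL Server the same job is normally done by an IDENTITY column.

```python
import itertools


class SurrogateKeyMap:
    """Assigns a sequence-generated surrogate key to each natural key."""

    def __init__(self):
        self._next_key = itertools.count(1)
        self._keys = {}

    def key_for(self, natural_key):
        # Reuse the existing surrogate key, or generate the next number.
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._next_key)
        return self._keys[natural_key]


keys = SurrogateKeyMap()
first = keys.key_for("CUST-A17")
second = keys.key_for("CUST-B02")
again = keys.key_for("CUST-A17")
```

The surrogate key carries no business meaning, so the natural key can change without touching the rows that reference it.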

2. Primary key?
a) It is used to uniquely identify every row of the table.
Unique + not null
3. Foreign key?
a) It is a column or combination of columns containing values that are found in the primary key of some table.
It may be null and need not be unique.

4. Composite key?
a) It is a primary key consisting of more than one column.

4. Indexes?
a) An index is an access strategy, that is, a way to sort and search records in the table.
Indexes are essential to improve the speed with which records can be located and retrieved from a table.
Types:
Clustered index (only one can be created on a table)
Nonclustered index (up to 249 can be created on a table)
Unique index
Composite index
(Oracle additionally offers simple, reverse key, bitmap, and function-based indexes.)

5. View?
a) A view is a virtual table defined by a query. It is used for data security and to reduce redundant data.

6. Cluster?
a) 1-many access path.
Clusters are used to store data from different tables in the same physical data blocks.

7. Sequences?
a) A sequence generates a series of unique numbers, typically used to populate key columns. SQL Server 2005 has no sequence object; an IDENTITY column serves the same purpose.

8. Cursors?
a) Implicit cursor
Explicit cursor
Parameter cursor

9. Triggers?
a) Row trigger
Statement trigger

10. Transactions?
a) Savepoint
Commit and rollback

11. Security?
a) Encryption
Locking
Levels of locking: row level, page level, table level

12. Constraints?
a) Primary key
Foreign key (reference)
Check
Unique

13. Difference between HAVING and WHERE?
a) WHERE filters individual rows based on a condition, before any grouping takes place.
HAVING filters the groups produced by the GROUP BY operation, based on the HAVING condition.
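
The difference can be tried out with a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. The sales table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100), ("East", 300), ("West", 150), ("West", 20), ("North", 500)],
)

# WHERE removes individual rows first; the West row of 20 never reaches the sum.
where_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "WHERE amount >= 100 GROUP BY region ORDER BY region"
).fetchall()

# HAVING removes whole groups after aggregation; West (150 + 20 = 170) is dropped.
having_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "GROUP BY region HAVING SUM(amount) >= 200 ORDER BY region"
).fetchall()

print(where_rows)
print(having_rows)
```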
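
14. Joins?
a) A join combines the information from two tables into a single result set.
Inner join:
matches records from both tables based on one or more common fields (matched records only).
Outer join:
returns the matched records plus the unmatched rows from the left table (left outer join) or the right table (right outer join).
Full join:
combines all rows from both sides of the join.
Cross join:
returns every combination of rows from the two tables (Cartesian product).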
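
A minimal sketch of the join types, again using Python's built-in sqlite3 module in place of SQL Server. The customers and orders tables are hypothetical; full join is left out because older SQLite versions do not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
# Order 11 points at a customer id that does not exist in customers.
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 3)])

# Inner join: only customers that have a matching order.
inner_rows = conn.execute(
    "SELECT c.name, o.order_id FROM customers c "
    "INNER JOIN orders o ON o.customer_id = c.id ORDER BY c.id"
).fetchall()

# Left outer join: every customer, with NULL where no order matches.
left_rows = conn.execute(
    "SELECT c.name, o.order_id FROM customers c "
    "LEFT JOIN orders o ON o.customer_id = c.id ORDER BY c.id"
).fetchall()

# Cross join: every customer paired with every order.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM customers CROSS JOIN orders"
).fetchone()[0]
```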


15. Union and union all?
a) Union: the columns and data types of both queries must be the same.
It returns distinct values (duplicates are removed).
Union all: returns all the rows, including duplicates.
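
The same two queries can be compared side by side in a short sketch, using Python's built-in sqlite3 module as a stand-in for SQL Server. The tables a and b are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (x INTEGER)")
conn.execute("CREATE TABLE b (x INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO b VALUES (?)", [(2,), (3,)])

# UNION removes the duplicate value 2.
union_values = [row[0] for row in conn.execute(
    "SELECT x FROM a UNION SELECT x FROM b ORDER BY x")]

# UNION ALL keeps every row from both queries.
union_all_values = [row[0] for row in conn.execute(
    "SELECT x FROM a UNION ALL SELECT x FROM b ORDER BY x")]
```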

16. Difference between drop, delete and truncate?
a) Delete:
- can delete all rows at a time
- can delete a single row based on a condition
- the allocated space is kept
- the table structure is kept
Truncate:
- deletes all rows at a time
- can't delete a single row
- the allocated space is released
- the table structure is kept
Drop:
- deletes all rows at a time
- can't delete a single row
- the allocated space is released
- the table structure is also deleted



SSIS – Non-blocking, Semi-blocking and Fully-blocking components
How can you recognize these three component types, what is their inner working and do they acquire new buffers and/or threads?
Synchronous vs Asynchronous
The SSIS data flow contains three types of transformations: non-blocking, semi-blocking, and fully-blocking. Before I explain how you can recognize these types and what their properties are, it's important to know that all data flow components can be categorized as either synchronous or asynchronous.
· Synchronous components: The output of a synchronous component uses the same buffer as the input. Reusing the input buffer is possible because the output of a synchronous component always contains exactly the same number of records as the input. Number of records IN == Number of records OUT.
· Asynchronous components: The output of an asynchronous component uses a new buffer. It's not possible to reuse the input buffer because an asynchronous component can have more or fewer output records than input records.
The only thing you need to remember is that synchronous components reuse buffers and are therefore generally faster than asynchronous components, which need a new buffer.

All source adapters are asynchronous; they create two buffers, one for the success output and one for the error output. All destination adapters, on the other hand, are synchronous.


Non-blocking, Semi-blocking and Fully-blocking
The table below summarizes the differences between the three transformation types. As you can see, it's not that hard to identify the three types.
There are many long, complicated articles about this subject on the internet, but I think it's enough to look at the core differences between the three types to understand how they work and their (dis)advantages:

Property | Non-blocking | Semi-blocking | Fully-blocking
Synchronous or asynchronous | Synchronous | Asynchronous | Asynchronous
Number of rows in == number of rows out | True | Usually False | Usually False
Must read all input before they can output | False | False | True
New buffer created? | False | True | True
New thread created? | False | Usually True | True
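
The "must read all input before they can output" row can be illustrated with Python generators. This is only an analogy, not how the SSIS pipeline engine is implemented: derived_column stands in for a non-blocking transformation and sort_rows for a fully-blocking one, and the log list records how many input rows had been read when the first output row appeared.

```python
def source(rows, log):
    """Yield rows one at a time and record each read."""
    for row in rows:
        log.append(row)
        yield row


def derived_column(rows):
    """Non-blocking: each input row produces an output row immediately."""
    for row in rows:
        yield row * 10


def sort_rows(rows):
    """Fully-blocking: all input must be read before the first output."""
    yield from sorted(rows)


non_blocking_log = []
first_derived = next(derived_column(source([3, 1, 2], non_blocking_log)))

blocking_log = []
first_sorted = next(sort_rows(source([3, 1, 2], blocking_log)))

# Rows read from the source before the first output row was produced.
print(len(non_blocking_log), len(blocking_log))
```

The non-blocking pipeline has read a single row when its first output is available, while the sort has consumed the whole input.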

All SSIS transformations categorized:
Non-blocking transformations: Audit, Character Map, Conditional Split, Copy Column, Data Conversion, Derived Column, Lookup, Multicast, Percent Sampling, Row Count, Script Component, Export Column, Import Column, Slowly Changing Dimension, OLE DB Command
Semi-blocking transformations: Data Mining Query, Merge, Merge Join, Pivot, Unpivot, Term Lookup, Union All
Blocking transformations: Aggregate, Fuzzy Grouping, Fuzzy Lookup, Row Sampling, Sort, Term Extraction

Members, Tuples, and Sets

SQL Server 2000
Before proceeding on the creation of a Multidimensional Expressions (MDX) query, you should understand the definitions of members, tuples and sets, as well as the MDX syntax used to construct and refer to these elements.
Members
A member is an item in a dimension representing one or more occurrences of data. Think of a member in a dimension as one or more records in the underlying database whose value in this column falls under this category. A member is the lowest level of reference when describing cell data in a cube.
For example, Time.[2nd half].[4th quarter] is a member of the Time dimension.

The bracket characters, [ and ], are used if the name of a member has a space or a number in it. Although the name of the Time dimension is one word, bracket characters can be used around it as well; the member above could also be represented as:
[Time].[2nd half].[4th quarter]
The right bracket (]) can be used as an escape character in MDX if the member name or member key contains a right bracket, as shown in the following example:
[Premier [150]] 98]
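
The escaping rule can be sketched as a one-line helper. The function name quote_mdx_name is hypothetical; it only doubles each right bracket and wraps the name in brackets, as described above.

```python
def quote_mdx_name(name):
    """Wrap a member name in brackets, doubling any right bracket."""
    return "[" + name.replace("]", "]]") + "]"


quoted = quote_mdx_name("Premier [150] 98")
print(quoted)
```

For the member name Premier [150] 98 this produces the bracketed form shown in the example above.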
Member Names and Member Keys
A member can be referenced by either its member name or by its member key. The previous example referenced the member by its member name, 4th quarter, in the Time dimension. However, the member name can be duplicated in the case of dimensions with nonunique member names, or it can be changed in the case of changing dimensions.
An alternate method to reference members is by referencing the member key. The member key is used by the dimension to specifically identify a given member. The ampersand (&) character is used in MDX to differentiate a member key from a member name, as shown in the following example:
[Time].[2nd half].&[Q4]
In this case, the member key of the 4th quarter member, Q4, is used. Referencing the member key ensures proper member identification in changing dimensions and in dimensions with nonunique member names.
The ampersand character can be used to indicate a member key reference in any MDX expression.
Calculated Members
Members can also be created, as part of an MDX query, to return data based on evaluated expressions instead of data stored in the cube being queried. These members are called calculated members, and they provide a great deal of the power and flexibility of MDX. The WITH keyword is used in an MDX query to define a calculated member. For example, if you want to provide a forecast estimate of all the packages by adding 10% to the existing value of the Packages measure, you can simply create a calculated member that provides the information and use it just like any other member in the cube, as demonstrated in the following example.
WITH MEMBER [Measures].[PackagesForecast] AS
'[Measures].[Packages] * 1.1'
For more information, see Calculated Members.
Member Functions
MDX supplies a number of functions for retrieving members from other MDX entities, such as dimensions and levels, so that explicit references to a member are not always necessary. For example, the FirstChild function returns the first child of a member; to get the first child member of the Time dimension, you can explicitly state it, as demonstrated in the following example:
Time.[1st half]
You can also use the FirstChild function to return the same member, demonstrated in the next example.
Time.FirstChild
For more information about MDX member functions, see MDX Function List.
Tuples
A tuple is used to define a slice of data from a cube; it is composed of an ordered collection of one member from one or more dimensions. A tuple is used to identify specific sections of multidimensional data from a cube; a tuple composed of one member from each dimension in a cube completely describes a cell value. Put another way, a tuple is a vector of members; think of a tuple as one or more records in the underlying database whose value in these columns falls under these categories.
The (Time.[2nd half]) tuple encompasses half of the cube, because it does not rule out any information in the Source or Route dimensions.
The (Time.[2nd half], Route.nonground.air) tuple represents the cells at the intersection of these two members.
In MDX, tuples are syntactically constructed depending upon their complexity. If a tuple is composed of only one member from a single dimension, often referred to as a simple tuple, the following syntax is acceptable.
Time.[2nd half]
If a tuple is composed of members from more than one dimension, the members represented by the tuple must be enclosed in parentheses, as demonstrated in the following example.
(Time.[2nd half], Route.nonground.air)
A tuple composed of a single member can also be enclosed in parentheses, but this is not required. Tuples are often grouped together in sets for use in MDX queries.
Tuple Functions
There are a few MDX functions that return tuples, and they can be used anywhere that a tuple is accepted.
For more information about tuple functions, see MDX Function List.
Tuples and Dimensionality
A tuple can encompass members from multiple dimensions, but not multiple members from the same dimension. The term dimensionality is used to indicate the dimensions described by the members in a tuple. Order plays a factor in the dimensionality of a tuple, and can affect the use of a tuple within a set.
Sets
A set is an ordered collection of zero, one or more tuples. A set is most commonly used to define axis and slicer dimensions in an MDX query, and as such may have only a single tuple or may be, in certain cases, empty. The following example shows a set of two tuples:
{ (Time.[1st half], Route.nonground.air), (Time.[2nd half], Route.nonground.sea) }
A set can contain more than one occurrence of the same tuple. The following set is acceptable:
{ Time.[2nd half], Time.[2nd half] }
A set refers to either a set of member combinations, represented as tuples, or to the values in the cells that the tuples in the set represent, depending on the context of usage for the set.
In MDX syntax, tuples are enclosed in braces to construct a set.
Important  Sets composed of a single tuple are not tuples; they are interpreted as sets by MDX. Certain MDX functions accept tuples as parameters, and will raise an error if a single tuple set is passed. Tuples and single-tuple sets are not interchangeable.
Set Functions
Explicitly typing tuples and enclosing them in braces is not the only way to retrieve a set. MDX supports a wide variety of functions that return sets.
The colon operator allows you to use the natural order of members to create a set. For example, the following set:
{[1st quarter]:[4th quarter]}
retrieves the same set of members as the following set:
{[1st quarter], [2nd quarter], [3rd quarter], [4th quarter]}
The colon operator is an inclusive function; the members on both sides of the colon operator are included in the resulting set.
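
The inclusive behavior of the colon operator can be mimicked in a few lines of Python. This is an analogy rather than MDX itself; the helper name member_range is hypothetical, and the list stands for the natural order of the members on a level.

```python
def member_range(ordered_members, first, last):
    """Return the members from first to last, both ends included."""
    start = ordered_members.index(first)
    end = ordered_members.index(last)
    return ordered_members[start:end + 1]


quarters = ["1st quarter", "2nd quarter", "3rd quarter", "4th quarter"]
full_year = member_range(quarters, "1st quarter", "4th quarter")
middle = member_range(quarters, "2nd quarter", "3rd quarter")
```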
Other MDX functions that return sets can be used either by themselves or as part of a comma-delimited list of members. For example, all of the following MDX expressions are valid:
{Time.Children}
{Time.Children, Route.nonground.air}
{Time.Children, Route.nonground.air, Source.Children}
For more information about set functions, see MDX Function List.
Sets and Dimensionality
Like tuples, sets also have dimensionality. As a set is composed of tuples, so the dimensionality of a set is expressed by the dimensionality of each tuple within it. Because of this, tuples within a set must have the same dimensionality. In other words, this example would not work as a set:
{ (Time.[2nd half], Route.nonground.air), (Route.nonground.air, Time.[2nd half]) }
The order of tuples in a set is important; it affects, for example, the nesting order in an axis dimension. The first tuple represents the first, or outermost, dimension, the second tuple represents the next outermost dimension, and so on.
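
The dimensionality rule can be checked with a short sketch. Tuples are modeled here as lists of (dimension, member) pairs, which is an assumption made for the example; the helper names are hypothetical.

```python
def dimensionality(tup):
    """The ordered dimensions of a tuple, e.g. ('Time', 'Route')."""
    return tuple(dimension for dimension, _member in tup)


def same_dimensionality(tuples):
    """A set is valid only if every tuple has the same dimensionality."""
    return len({dimensionality(t) for t in tuples}) <= 1


valid_set = [
    [("Time", "1st half"), ("Route", "nonground.air")],
    [("Time", "2nd half"), ("Route", "nonground.sea")],
]
# Same members in both tuples, but the dimensions appear in a different order.
invalid_set = [
    [("Time", "2nd half"), ("Route", "nonground.air")],
    [("Route", "nonground.air"), ("Time", "2nd half")],
]

print(same_dimensionality(valid_set), same_dimensionality(invalid_set))
```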
Named Sets
A named set is a set for which an alias has been created. A named set is most commonly used in complex MDX queries to make these queries easier to read and to increase the ease of maintenance.
For more information about named sets, see Building Named Sets in MDX.

