Introduction to Self-Service BI Solutions
- Data analysis should answer questions, and offer guidance in decision-making.
- To collaborate and work cohesively, analysts must be able to synchronize their teamwork.
- To be more efficient, and therefore more competitive, organizations of all sizes must gather data to some extent.
- The data must be architected and presented in a design that the organization can understand.
- The technical architect must communicate with the BI developers, and the operations team to ensure the BI environment is configured correctly.
- When more than one data modeller is working on the model, it is important that standards and naming conventions are created and adhered to. Data might be imported from different systems, so naming conventions are likely to vary across sources. This inconsistency should be addressed during the modelling process. If the model comprises a data warehouse, naming conventions should be used for the fact, lookup, and history tables.
- The semantic information should enable the model to describe itself.
- In-memory data and real-time operational analytics have the advantage that the data does not need to be extracted to a secondary location because in-memory processing is designed for optimal performance and can better handle aggregations.
- Having the ability to use the data generated from Software as a Services (SaaS) such as Facebook and Google Analytics is important for gathering a complete picture of activity in your data.
- If you are connecting to a database such as SQL Server, then using stored procedures to query the data is a preferable option. A stored procedure is a query that is stored on the server. Stored procedures are more efficient than specific, one-off queries because SQL Server creates an execution plan, which it reuses each time the procedure is called. This plan works out the optimal way to retrieve the data, resulting in the fastest possible return of results. They can also be used by other colleagues; sharing code prevents duplication of effort.
- It is important that you only return rows and columns from the database that you intend to use in your reports. Not only does importing unnecessary data create additional network traffic, but it also makes larger datasets more cumbersome to work with.
- The data that is extracted from the source system must be transformed into the correct format for loading into the destination database. Metadata must exist before transformations can be applied. The metadata determines what transformations need to be applied to the source data held in the staging tables so it can be loaded into the destination database. To accurately report on the data, you must ensure values are consistent if you intend to use them for filtering.
- Before applying any transformations to your data, it is a good idea to clean, or cleanse, the data first. This process corrects dirty data or removes it to another area for investigation. You want the quality of your data to be as high as possible.
- Logging missing values that are compulsory in the destination database.
- True and False values are frequently stored as 1 and 0 values in the source database and should be converted.
- Currency and number fields should be formatted and handled carefully. Ensure decimal columns that undergo any rounding up or down do not skew figures and produce unexpected results. If accuracy is critical, then you must ensure that values are entered correctly into the destination database. If decision makers are not concerned about precision and are happy with an approximate figure in aggregations, then you have more freedom to apply some formatting.
- Modern database features, such as in-memory data, and columnstore indexes, enable faster performance of handling aggregations of large datasets alongside up-to-the-minute results.
- If you need to display progress in attempting to meet a target figure, 100 percent stacked charts are useful.
- Line and area charts are useful for displaying data over a period of time, such as financial data.
- Scatter charts are useful for displaying large sets of data and, in particular, highlighting nonlinear trends, outliers, and clusters. The more data you include, the better the results. Your scatter chart must include a point identifier, otherwise, all the data is aggregated into a single point. You should add a non-numeric data field, such as Categories, to the chart Details property.
- When presenting your data in a report or dashboard, you should take care to ensure the most important information is easy to find. If your audience normally reads from left to right, top to bottom, then displaying the most critical data in the top left, flowing through to less important content at the bottom right, is helpful. If you have important figures that need to be presented clearly, so that they can be easily read, then the Card and Multirow card charts suit this purpose.
- In the card chart, if the Value column is not specified as a currency data type, then it shows only a number without the currency symbol. This should be included to make clear that it is a monetary figure. The data label can be turned off, but unless it is entirely clear what the figure refers to, this is best included.
- The multirow card chart is a useful way to clearly present numbers, without using the format of a table or matrix chart—which are difficult to digest. Like tables, the multirow card chart works best for smaller data sets; otherwise, there is too much data and text to read. For example, a multirow card chart is useful for displaying main categories, and sales.
- The map chart is useful for presenting data based on cities, rather than wide areas.
- The filled map chart is particularly useful for presenting socioeconomic or demographic data, because it provides a visual overview of data across a wide area, such as all the states in the United States.
- The shape map is ideal for comparing values across regions, such as demographic data.
- Using a table or matrix is useful when you want to display the actual numbers, such as for financial data, and is best used for smaller sets of data.
- Unlike a table or matrix chart, the tree map is more efficient in how it uses the space it consumes in a report. For example, by showing both City and Category in the tree map, it has effectively flattened the data and prevents the need for drilling down to see categories for each city.
- Having knowledge of the business, formatting data, and understanding which visualizations best display the data, are useful for making the most of the BI solution.
- Users must understand the principles and structures of data that is sourced from a relational database, a data warehouse, or an unstructured big data source, such as a social media site.
- Users should be familiar with all the major chart types and understand how to use them to display data most effectively so that decisions can be made. For example, geographic data is best presented using a map chart; a scatter chart should be used to show overlaps in data, clusters, and outliers. Financial data, such as a share price, is best displayed using a line chart.
Introducing Power BI
- DirectQuery is useful if you have very large datasets, and want to create your visualizations without loading large volumes of data. However, DirectQuery is not without its limitations, so you should shape data before you create your dataset. Each time the data is queried, the performance is dependent on the data source system, and how fast the data source system responds to the data request.
- Templates are useful for reusing data that has already been shaped, and visuals that have been customized using corporate colours. If you are producing several reports that share data, visuals, and formatting, templates are a useful feature for avoiding the duplication of work while ensuring consistency across reports.
- When creating a report or dashboard, the most important information should be presented first, in the top left-hand corner of the screen. This is particularly important when designing for mobile devices; a user will not be able to move pinned items, so it is vital to have the most important visual at the top—so it is visible first on a small mobile phone screen.
- Try to avoid having so many visuals on a report or dashboard that make the user scroll across or down.
- The most important information should not only be displayed first but should also have the biggest visual suitable for presenting it. You size visuals so that important information is displayed in bigger visuals and less important information in smaller visuals. This guides the user to interpret and digest the report or dashboard more efficiently.
- Your charts should be consistent, both in terms of design and axes. Ensure scales on axes and the order of dimensions are consistent and be aware of how you use colours.
- When displaying numbers, avoid using too many numerals, as this makes it difficult to read. Rather than displaying a card with $145,000,000, present the data as $145m or $145 million, because this is quicker and easier for the mind to interpret.
- Charts that present data over time should also be consistent, especially if you apply filterings. for example, don’t have one chart that displays data for the last quarter next to a chart showing data for April last year.
- Avoid using pie charts when you have many categories. When the number exceeds about seven or eight categories, choose another visual such as a bar or column chart. If there are too many, this makes it difficult to compare in a pie chart.
- When importing data by connecting to a data source, the Edit button to make transformations is a useful step if you have a large dataset, but want to reduce the amount of data that you import by excluding columns or filtering rows.
- If you remove tiles from a dashboard, be aware that the underlying datasets are also removed so you cannot use this data for your Q&A. This is particularly important if you pin the visualization answer to your dashboard.
Power BI Data
- After changing any data types that need altering, it’s good practice to then check that columns that Power BI has set as the default for sort orders, or aggregating, are correctly determined.
- When working with visuals, try to present data in an optimum way for enabling the end user to quickly digest the presented information. There are a few things you can do to optimize your data and make it more consistent. This helps you to work with your data more efficiently, focusing on the information you need. It is also helpful to colleagues or anyone with whom you might share the data.
- It’s a good idea to hide fields that you know you are not going to use in your visuals.
- Sorting data by correct attribute rather than automatically ordered alphabetically makes data analysis much easier as the user can read the data in the correct order. For instance, sort time data by day/month number instead of day/month name by default.
- Changing data types and formatting data are good ways of optimizing your data. This presents the data with clarity in your reports and dashboards.
Shaping and Combining Data
- You can apply a sort to multiple columns in a query, though you should always start with the column that has the least unique values. For example, apply the sort in order of Country, Region, and the City.
- You should always remove data that isn’t required. The dataset should be as succinct as possible, so you do not have redundant data that is loaded unnecessarily. If you have a large dataset, remove everything that isn’t required to make it as small as possible to improve the performance of handling the data in Power BI. This means less data is transferred from the source to Power BI; there is less data to be processed as the Query Editor applies the transformations, and you have less extraneous data when creating reports.
- Your columns should have names that make it easy to work with them when creating reports and viewing dashboards. Each column name should give the data in that column an adequate description. This is particularly relevant when working with datasets containing several tables and columns—it makes it easier to find the right fields to add to report visuals. Power BI Q&A, which uses the natural query language, also returns more accurate results if it can find the data needed to answer the question being asked of it.
- It is a good idea to check the given column types are as you would expect, and then format any that are incorrect. This can be critically important for decimal columns, where changing the data types between a decimal and a whole number could potentially give false results in calculations. In addition to formatting the data so it presents better in data labels.
Modelling Data
- When you refer to a column in a DAX formula and include the table name, this is known as a “fully qualified column name”. You can exclude the table name when the measure refers to a column in the same table in which it also resides; however, it is good practice to include it. While this can lengthen formulas that reference many columns, it provides clarity and the reassurance that you are referencing the correct columns—you can also create measures that span multiple tables, and move them as required.
Interactive Data Visualizations
- In the Power BI Desktop settings, Global, Auto Recovery, you can also toggle the Keep the last Auto Recovery version if I close without saving option. This useful feature is turned off by default but is certainly worth enabling to prevent any accidental loss of work.
Direct Connectivity
- Before connecting to a database in Azure SQL Database, ensure that you have configured the firewall settings to allow remote connections. Microsoft recommends that you allow access at the database level in Azure, rather than at the server level.
- DirectQuery restricts you to using a single database, but it is useful when you want to connect to very large datasets that could take a long time to load into Power BI. This can also be problematic when making changes to report items that cause a refresh of the data—this can cause further delays and make it cumbersome to work with the data.
- The Power BI Q&A natural language feature is not available when using DirectQuery. Q&A uses the data that is imported into datasets to build answers and cannot create this without the data being present.
- Before you can connect to SQL Server Analysis Services by using a live connection from the Power BI service, you must configure a Power BI gateway on your server.
The gateway runs as a Windows® service on the server running SQL Server Analysis Services. However, users need a Power BI Pro subscription to view content through the gateway. If you install the gateway in personal mode, you cannot install another gateway on the same machine.
Power BI Mobile
- The Microsoft Power BI for iOS app is compatible with the iPhone and iPad, and one of the useful features is called Data Alerts which can be added to tiles that display a single number. You can set thresholds to alert you when the number goes above or below the value you set, or you can set both.
- To use the Power BI mobile app to view reports and KPIs which are created by using SQL Server 2016 Enterprise Edition Mobile Report Publisher along with SQL Server 2016 Reporting Services web portal, you need to enable Basic Authentication on your reporting server.
- The cached data in the Power BI mobile app is automatically refreshed with data on the Power BI service (not the data source), whenever your device is connected to a network. However, Reporting Services reports and KPIs do not refresh in the background; instead, they refresh when you open them.