Friday 26 August 2016

Which ETL tool?

You are wandering what is the best tool to gather, meaning connect and extract from your data sources, and nanage data, that is, updating and backing up in the database. Well you have a few choices like:

  1. Oracle Warehouse Builder (OWB)
  2. Oracle Data Integrator (ODI)
  3. SAP Data Services
  4. IBM Infosphere Information Server
  5. SAS Data Management. 
  6. PowerCenter Informatica. 
  7. Elixir Repertoire for Data ETL 
  8. Data Migrator (IBI) 
  9. SQL Server Integration Services (SSIS)
  10. Talend
  11. Pentaho
  12. many others...
But, why be pushed to buy a tool when perhaps the best method is basic SQL Scripting?

SQL Scripting will probably be cheapest choice and the easiest way to implement and maintain your own data warehouse. You can build your own queries and run different extract and processing tasks using SQL scripts and scheduling using SQL Server Jobs or the easy option that is Windows Scheduler. Don't reinvent the wheel and if you want raw data in your data warehouse you need only to identify and connect to the source and write some basic select insert into queries, you have your base data. Once you begin using different tools you will start loosing focus on the core task at hand, finding and understanding the data, systems and processes.

Many times companies build complex ETL processes to collect data with complex data flows and auditing procedures that usually are of no use or value for the stakeholder and end-user. The usual arguments are compliance, control and efficiency, but the business case for complicated methods is flaud. It becomes so difficult to maintain and change that it does not answer what it was build for. The costs increase, with high maintenance like licensing and payroll and usually because it does not fulfill the requirements of users you are bringing to life shadow IT and end-user computing with users finding their way to the information they want and building the databases they need, also breaking the compliance requirement.

Keep it simples, flexible and use easy to use tools that can also be made available to end-users so it becomes an collaborative and value-adding endeavour.

Thursday 25 August 2016

Forget About Vlookup

Vlookup function is one of the most used operations in Excel. It allows to search in a table and return the result in another table based on a selection criteria. Basically it is the same as filtering something in a table and copying the results from a specific column into another. It is very useful to join information from two different tables, for example, getting the name of a customer or product based on the code or reference number. Stop using it... please.

Start using Index and Match. It will give you more freedom, less errors, more flexibility, faster results and less time calculation.

How to use it:
INDEX ( B:B ; MATCH ( D ; A:A ; 0) ) or
INDEX ( Results ; MATCH ( Reference ; Reference List ; 0 ) )

What you want - B:B the column with values you want to return or get (e.g. customer name)
How to find it - D the selection criteria or reference code (e.g. customer number)
Where to find it - A:A the column with value to select or filter from (e.g. customer numbers)

Example:





What you gain:
1. Stop getting referencing errors when you add, delete or change the order of columns
2. Never need to count the number of columns to identify the column you want to get returned
3. Get faster results and do not to wait every time for recalculating when you change something
4. Easier to review and understand were you are getting information from (easy to read)
5. No specific column order, that is, the search criteria is not required to be on the left

Using Range Names will also improve the flexibility and capacity to understand formulas and the Match formula has a lot of other applications that we may talk about later one (e.g. read data from a matrix or double entry table).

Excel can be made very simple and even easier to use if you use it properly.

Hope this helps. Please share your feedback and questions.


Sunday 21 August 2016

Someone Hates Excel

By this time the business world has already figured out that Excel is a very powerful tool to do simple and also some complex calculations, prepare tables and graphs for exploration analysis and presentations, relate data from different tables and sources and sometimes even serve as a datawarehouse. This is true because no other tool performs this tasks, all together, as fast and as simple and easy to use as Excel, and that makes it such a wonderful tool. We know the limitations of it and the misusage, there is no perfect tool, but the strenghts are undeniable.
So why are some people so afraid or hateful about Excel? Most times there seems to be a hidden motive. The critics do not really have an alternative solution. On the other hand some of the issues or risks of Excel usage are related to the process, data and sometimes people competences, profile and knwoledge, not the tool itself, therefore another tool does not really solve or tackle the root cause. Most alternatives do not really replace Excel and have the same features. In fact, from my experience, there is a shadow interest to sell other tools and solutions (e.g. data warehousing, analytics and reporting) by software vendors or to capture and gain access and ownership to aditional roles in the company (e.g. internal power struggle and agency costs). Usually there is not really a genuine interest to improve and solve the problem.

Excel capabilities include:
1. Structure and archive data - data warehousing
2. Connect to data sources - data warehousing
3. Link different data sources - data warehousing
4. Aggregate and filter data (queries) - data warehousing
5. Build data tables and graphics - reporting
6. Prepare graphs and diagrams - presentation
7. Mathematic and statistics - data analytics
8. Algorithms and programming - data analytics
9. Scripting and automation - computation
10. Full compatibility and universal access - compatibility
11. Available knowledge and support - know-how
12. Easy-to-use - know-how

This is really good taking into account Excel is a basic tool included in Office that almost everyone in every company has.