Channel: Excel and UDF Performance Stuff

Parsing Functions from Excel Formulas using VBA: Is MID or a Byte array the best method?

As part of extending the performance profiling abilities of FastExcel, I wanted to develop a Function Profiler Map. A key component of this is to extract the names of the functions embedded in Excel formulas.

So I experimented with some different approaches:

  • using Rob Van Gelder’s AudXL formula parser
  • using the MID function to scan through character by character
  • using byte arrays

The Test Data

For performance testing I am using a large worksheet with 638K used cells, 103K constants and 413K formulas. There are 3 functions on this sheet (710K INDEX, 5 SUM and 51K IF).

For validity testing I am using a small worksheet with some tricky formulas:

[Image: ParseFormulas1]

Rob Van Gelder’s AudXL formula parser

Rob converted a JavaScript-based Excel formula parser written by Eric Bachtal to VBA. You can download AudXL.xla from here. It is a useful tool for breaking formulas apart so that they are more readable.

So I started with that as a performance baseline: it takes 130 seconds to parse the large test worksheet.

But of course a general-purpose formula parser will naturally be slower than a purpose-built parser that only finds the functions.

Rules for Finding Functions in Formula strings

  • Function names are always followed by open bracket “(“
  • Function Names are always preceded by an operator
    +-,/*(=><& :!^
  • Formula text enclosed in single or double quotes cannot contain a function

Assumptions

  • Formulas to be parsed will be obtained using Range.FormulaR1C1.
  • This means that the formula string will use American English settings and native function names.
  • And I only have to parse formulas that have been validated by Excel.
  • The full range of Unicode characters is allowed, so I can’t just use ASCII codes.

Download the VBA Code

You can download the routines containing the VBA from here

The First attempt: using Byte arrays

In a previous post I discussed using Byte arrays as an efficient method for processing all the characters in a string.
This is a good method for handling Unicode strings – each character gets converted into 2 separate bytes (0 to 255) which uniquely define the character and cover all the possible National Language characters in the world.

So “(” for instance is always 40 and 0 and


Dim abFormula() as Byte
Dim str1 as string
str1="=NA()"
abFormula=str1


produces:

[Image: ParseFormulas2]

Note that the Byte arrays produced by assigning a string to a byte array are always zero-based.

You can also do the reverse: assigning a byte array to a string converts back.
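To make the byte layout concrete, here is a minimal sketch (the procedure name ShowFormulaBytes is made up for illustration) that prints the byte pair for each character of "=NA()" and then assigns the array back to a string. VBA strings are held as 2 bytes per character with the low byte first, so each plain ASCII character appears as its character code followed by 0.

```vba
Sub ShowFormulaBytes()
    Dim abFormula() As Byte
    Dim str1 As String
    Dim str2 As String
    Dim j As Long
    '
    str1 = "=NA()"
    abFormula = str1          ' string -> zero-based byte array, 2 bytes per character
    '
    ' print each character with its (low byte, high byte) pair
    For j = 0 To UBound(abFormula) Step 2
        Debug.Print Mid$(str1, j \ 2 + 1, 1), abFormula(j), abFormula(j + 1)
    Next j
    '
    str2 = abFormula          ' byte array -> string
    Debug.Print str2 = str1   ' True: the round trip preserves the text
End Sub
```

Run from the Immediate window, this prints 61 0 for "=", 78 0 for "N", 65 0 for "A", 40 0 for "(" and 41 0 for ")", followed by True.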

The First Algorithm

So here is an algorithm that follows the rules:

Scan each character in the byte array from left to right:
If the character is a single or double quote, ignore all subsequent characters until there is another single or double quote.
If the character is in the list of operators (abStartChars), set a flag and note that a function name could start at the next character.
If the character is NOT in the list of operators, check whether it is a “(”.
If it IS a “(” and there are one or more non-operator characters preceding it, we have found a function name.

The code looks like this:


Function GetFunc1(abFormula() As Byte, abStartChars() As Byte, abEndChar() As Byte, _
 abQuotes() As Byte, abSQ() As Byte, jStart As Long, jEnd As Long) As String
 '
 ' search for a function name within a byte array starting at the jStart byte position
 ' byte array is zero based pairs of bytes
 ' abFormula is a byte array version of the formula
 ' abStartChars is a byte array containing characters that can precede the function name
 ' abEndChar is a byte array containing "("
 ' abQuotes is a byte array containing a double quote
 ' abSQ is a byte array containing a single quote
 '
 ' returns name of function as a string and jEnd as the byte position of "("
 '
 Dim j As Long
 Dim k As Long
 Dim jStartChar As Long
 Dim jFirst As Long
 Dim blStart As Boolean
 Dim blString As Boolean
 Dim abFunc() As Byte
 '
 jFirst = jStart + 2
 blString = False
 For j = jStart + 2 To (UBound(abFormula) - 2) Step 2
 '
 ' skip text strings
 '
 If (abFormula(j) = abQuotes(0) And abFormula(j + 1) = abQuotes(1)) _
 Or (abFormula(j) = abSQ(0) And abFormula(j + 1) = abSQ(1)) Then
 blString = Not blString
 End If
 If Not blString Then
 '
 ' look for non startchar
 '
 blStart = False
 For jStartChar = 0 To UBound(abStartChars) Step 2
 If abFormula(j) = abStartChars(jStartChar) _
 And abFormula(j + 1) = abStartChars(jStartChar + 1) Then
 blStart = True
 jFirst = j + 2
 Exit For
 End If
 Next jStartChar
 If Not blStart Then
 If abFormula(j) = abEndChar(0) And abFormula(j + 1) = abEndChar(1) Then
 '
 ' we have a (
 '
 If j > jFirst Then
 '
 ' we have a function
 '
 jEnd = j
 '
 ' jend points to first byte of the ( character
 ' jfirst points to the first byte of the function name
 ' convert slice of formula to function name string
 '
 ReDim abFunc(0 To (jEnd - jFirst - 1)) As Byte
 For k = 0 To UBound(abFunc)
 abFunc(k) = abFormula(jFirst + k)
 Next k
 GetFunc1 = abFunc
 Exit Function
 '
 ElseIf abFormula(jFirst) = abEndChar(0) _
 And abFormula(jFirst + 1) = abEndChar(1) Then
 jFirst = jFirst + 2
 End If
 End If
 End If
 End If
 Next j
 End Function

And the driver routine just loops on all the formulas in the worksheet, calling the function parsing routine and storing/counting the functions found in a dictionary.


Sub testing1()
 Dim strFormula As String
 Dim strFunc As String
 Dim abFormula() As Byte
 Dim abStartChars() As Byte
 Dim abEndChar() As Byte
 Dim abQuotes() As Byte
 Dim abSQ() As Byte
 Dim jStart As Long
 Dim jEnd As Long
 Dim oFormulas As Range
 Dim oCell As Range
 Dim dTime As Double
 Dim dicFuncs As New Dictionary
 '
 ''' characters that can come before the start of a Function name
 Const strStartChars1 As String = "+-,/*=><& :!^" & vbLf
 '
 dTime = MicroTimer
 '
 abStartChars = strStartChars1
 abEndChar = "("
 abQuotes = Chr(34)
 abSQ = Chr(39)
 '
 dicFuncs.CompareMode = TextCompare
 '
 Set oFormulas = ActiveSheet.UsedRange.SpecialCells(xlCellTypeFormulas, 23)
 '
 For Each oCell In oFormulas
 strFormula = oCell.FormulaR1C1
 abFormula = strFormula
 jStart = 0
 jEnd = 0
 Do
 strFunc = UCase(GetFunc1(abFormula, abStartChars, abEndChar, abQuotes, abSQ, jStart, jEnd))
 If LenB(strFunc) = 0 Then Exit Do
 jStart = jEnd
 '
 ' add Function to dictionary and count occurrences
 '
 If dicFuncs.Exists(strFunc) Then
 dicFuncs(strFunc) = dicFuncs(strFunc) + 1
 Else
 dicFuncs.Add strFunc, 1
 End If
 Loop
 Next oCell
 '
 MsgBox MicroTimer - dTime
 End Sub

This version takes 48 seconds: better but with room for improvement.
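The driver routine calls MicroTimer, which is not listed in this post. As a minimal sketch, assuming Windows and the QueryPerformanceCounter API, a high-resolution timer along these lines would do the job; the declarations and body below are an assumption about how such a function can be written, not necessarily the version in the download.

```vba
#If VBA7 Then
    Private Declare PtrSafe Function getFrequency Lib "kernel32" _
        Alias "QueryPerformanceFrequency" (cyFrequency As Currency) As Long
    Private Declare PtrSafe Function getTickCount Lib "kernel32" _
        Alias "QueryPerformanceCounter" (cyTickCount As Currency) As Long
#Else
    Private Declare Function getFrequency Lib "kernel32" _
        Alias "QueryPerformanceFrequency" (cyFrequency As Currency) As Long
    Private Declare Function getTickCount Lib "kernel32" _
        Alias "QueryPerformanceCounter" (cyTickCount As Currency) As Long
#End If

Function MicroTimer() As Double
    ' returns the performance counter reading converted to seconds
    Dim cyTicks As Currency
    Static cyFrequency As Currency
    '
    MicroTimer = 0
    If cyFrequency = 0 Then getFrequency cyFrequency   ' counts per second, fetched once
    getTickCount cyTicks
    If cyFrequency <> 0 Then MicroTimer = cyTicks / cyFrequency
End Function
```

Only differences between two MicroTimer readings are meaningful. Note also that declaring a variable As New Dictionary, as the driver does, needs a project reference to Microsoft Scripting Runtime.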

The (nearly) Final Algorithm

  • I don’t need to worry about upper-lower case Function names, since Excel already takes care of that, so I can use the default Binary Compare for the dictionary
  • All the special characters for operators etc. always have byte 2 = 0, so I only need to test that once per character rather than on each comparison.
  • If I first search left to right for the “(” character and then work backwards looking for an operator character the function does many fewer comparison operations, because there is only 1 end character but many start characters.

The resulting VBA code looks like this:


Function GetFunc3(abFormula() As Byte, abStartChars() As Byte, jStart As Long, jEnd As Long) As String
 '
 ' search for a function name within a byte array starting at the jStart byte position
 ' byte array is zero based pairs of bytes
 ' abFormula is a byte array version of the formula
 ' abstartchars is a byte array containing characters that can precede the function name
 '
 ' returns a string name of function and jEnd as the position of "("
 '
 Dim j As Long
 Dim k As Long
 Dim jj As Long
 Dim jStartChar As Long
 Dim jFirst As Long
 Dim blStart As Boolean
 Dim blDoubleQ As Boolean
 Dim blSingleQ As Boolean
 Dim abFunc() As Byte
 '
 jFirst = jStart + 2
 blDoubleQ = False
 For j = jStart + 2 To (UBound(abFormula) - 2) Step 2
 '
 ' start and end characters always have byte 2 =0
 '
 If abFormula(j + 1) = 0 Then
 '
 ' skip text strings
 '
 If abFormula(j) = 39 Then blSingleQ = Not blSingleQ
 If Not blSingleQ Then
 If abFormula(j) = 34 Then blDoubleQ = Not blDoubleQ
 If Not blDoubleQ Then
 '
 ' look for (
 '
 If abFormula(j) = 40 Then
 '
 ' we have a (
 ' look backwards for a startchar
 '
 blStart = False
 For jj = j - 2 To jStart Step -2
 For jStartChar = 0 To UBound(abStartChars) Step 2
 If abFormula(jj) = abStartChars(jStartChar) Then
 blStart = True
 jFirst = jj + 2
 Exit For
 End If
 Next jStartChar
 If blStart Then Exit For
 Next jj
 If blStart Then
 If j > jFirst Then
 '
 ' we have a function
 '
 jEnd = j
 Exit For
 ElseIf abFormula(jFirst) = 40 Then
 jFirst = jFirst + 2
 End If
 End If
 End If
 End If
 End If
 End If
 Next j
 If blStart And jEnd > jFirst Then
 '
 ' convert slice of formula to function name string
 ' jend points to first byte of the ( character
 ' jfirst points to the first byte of the function name
 '
 ReDim abFunc(0 To (jEnd - jFirst - 1)) As Byte
 For k = 0 To UBound(abFunc)
 abFunc(k) = abFormula(jFirst + k)
 Next k
 GetFunc3 = abFunc
 End If
 End Function

And the corresponding driver routine looks like this:


Sub testing3()
 Dim strFunc As String
 Dim abFormula() As Byte
 Dim abStartChars() As Byte
 Dim jStart As Long
 Dim jEnd As Long
 Dim oFormulas As Range
 Dim oCell As Range
 Dim dTime As Double
 Dim dicFuncs As New Dictionary
 '
 ''' characters that can come before the start of a Function name
 Const strStartChars2 As String = "+-,/*(=><& :!^" & vbLf
 '
 dTime = MicroTimer
 '
 abStartChars = strStartChars2
 Set oFormulas = ActiveSheet.UsedRange.SpecialCells(xlCellTypeFormulas, 23)
 '
 For Each oCell In oFormulas
 abFormula = oCell.FormulaR1C1
 jStart = 0
 jEnd = 0
 Do
 strFunc = GetFunc3(abFormula, abStartChars, jStart, jEnd)
 If LenB(strFunc) = 0 Then Exit Do
 jStart = jEnd
 If dicFuncs.Exists(strFunc) Then
 dicFuncs(strFunc) = dicFuncs(strFunc) + 1
 Else
 dicFuncs.Add strFunc, 1
 End If
 Loop
 Next oCell
 MsgBox MicroTimer - dTime
 End Sub

This version takes 11.6 seconds: but maybe I can make it faster?

Using MID$() instead of a Byte array

What happens if I forget about all this Byte array stuff and just implement the same algorithm using MID$() to extract each character from the formula and INSTR to check against the list of start operators (start characters)?

The GetFunc VBA code looks like this:


Function GetFunc4(strFormula As String, strStartChars As String, strEndChar As String, strQuotes As String, strSQ As String, jStart As Long, jEnd As Long) As String
 '
 ' search for a function name within a formula string starting at the jStart character position
 '
 ' strStartChars is a string containing characters that can precede the function name
 '
 ' returns a string name of function and jEnd as the position of "("
 '
 Dim j As Long
 Dim k As Long
 Dim jj As Long
 Dim jStartChar As Long
 Dim jFirst As Long
 Dim blStart As Boolean
 Dim blDoubleQ As Boolean
 Dim blSingleQ As Boolean
 Dim abFunc() As Byte
 Dim strChar As String
 Dim iStartChar As Long
 '
 jFirst = jStart + 1
 blDoubleQ = False
 For j = jStart + 1 To (Len(strFormula) - 1) ' j is a character position, so use Len rather than LenB
 strChar = Mid$(strFormula, j, 1)
 '
 ' skip text strings
 '
 If strChar = strSQ Then blSingleQ = Not blSingleQ
 If Not blSingleQ Then
 If strChar = strQuotes Then blDoubleQ = Not blDoubleQ
 If Not blDoubleQ Then
 '
 ' look for (
 '
 If strChar = strEndChar Then
 '
 ' we have a (
 ' look backwards for a startchar
 '
 blStart = False
 For jj = j - 1 To jStart Step -1
 strChar = Mid$(strFormula, jj, 1)
 iStartChar = InStrB(strStartChars, strChar)
 If iStartChar > 0 Then
 blStart = True
 jFirst = jj + 1
 Exit For
 End If
 Next jj
 If blStart Then
 If j > jFirst Then
 '
 ' we have a function
 '
 jEnd = j
 '
 ' extract the function name from the formula string
 ' jEnd is the character position of the (
 ' jFirst is the character position of the start of the function name
 GetFunc4 = Mid$(strFormula, jFirst, jEnd - jFirst)
 Exit Function
 ElseIf Mid$(strFormula, jFirst, 1) = strEndChar Then
 jFirst = jFirst + 1
 End If
 End If
 End If
 End If
 End If
 Next j
 End Function

Using MID$ and INSTR is slower: it takes 21.8 seconds.

Optimising the Driver Routine.

OK so I am reasonably happy with the parsing routine: I have gone from 130 seconds for a generic parsing routine to 11.6 seconds for the specialised Byte array routine.

But the driver routine looks at every single formula on the worksheet, and we know that on most large worksheets a large percentage of the formulas are copied.

So I can use the dictionary approach to find the distinct formulas (use R1C1 rather than A1 mode because copied formulas are identical in R1C1) and just parse those:

Testing3C also makes some more speed improvements by looping on each area and getting the formulas into a variant array rather than looping directly on the cells.


Sub testing3c()
 Dim strFunc As String
 Dim abFormula() As Byte
 Dim abStartChars() As Byte
 Dim jStart As Long
 Dim jEnd As Long
 Dim oFormulas As Range
 Dim oCell As Range
 Dim oArea As Range
 Dim vArr As Variant
 Dim vF As Variant
 Dim dTime As Double
 Dim dtotTime As Double
 Dim dicFuncs As New Dictionary
 Dim dicFormulas As New Dictionary
 Dim j As Long
 Dim k As Long
 '
 Const strStartChars2 As String = "+-,/*(=><& :!^" & vbLf ''' characters that can come before the start of a Function name
 '
 dTime = MicroTimer
 '
 abStartChars = strStartChars2
 Set oFormulas = ActiveSheet.UsedRange.SpecialCells(xlCellTypeFormulas, 23)
 '
 ' find distinct formulas in areas and count occurrences
 '
 For Each oArea In oFormulas.Areas
 '
 ' get a block of formulas
 '
 vArr = oArea.FormulaR1C1
 If IsArray(vArr) Then
 For k = 1 To UBound(vArr, 2)
 For j = 1 To UBound(vArr)
 If dicFormulas.Exists(vArr(j, k)) Then
 dicFormulas(vArr(j, k)) = dicFormulas(vArr(j, k)) + 1
 Else
 dicFormulas.Add vArr(j, k), 1
 End If
 Next j
 Next k
 Else
 If dicFormulas.Exists(vArr) Then
 dicFormulas(vArr) = dicFormulas(vArr) + 1
 Else
 dicFormulas.Add vArr, 1
 End If
 End If
 Next oArea
 '
 ' parse only the distinct formulas
 '
 For Each vF In dicFormulas
 abFormula = vF
 jStart = 0
 jEnd = 0
 Do
 strFunc = GetFunc3(abFormula, abStartChars, jStart, jEnd)
 If LenB(strFunc) = 0 Then Exit Do
 jStart = jEnd
 If dicFuncs.Exists(strFunc) Then
 dicFuncs(strFunc) = dicFuncs(strFunc) + dicFormulas(vF)
 Else
 dicFuncs.Add strFunc, dicFormulas(vF)
 End If
 Loop
 Next vF
 MsgBox MicroTimer - dTime
 End Sub

Now this takes only 3.5 seconds: adding a formula string to a dictionary is much faster than parsing it!
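The driver routines above only display the elapsed time. To inspect what was actually found, a small helper can turn the counts dictionary into text. This is a sketch only: the name FunctionCountsReport is hypothetical, and it takes the dictionary as Object so that it works with either early or late binding.

```vba
Function FunctionCountsReport(dicFuncs As Object) As String
    ' dicFuncs: Scripting.Dictionary keyed by function name, item = occurrence count
    ' returns one "name <tab> count" line per function
    Dim vKey As Variant
    Dim strOut As String
    '
    For Each vKey In dicFuncs.Keys
        strOut = strOut & vKey & vbTab & dicFuncs(vKey) & vbCrLf
    Next vKey
    FunctionCountsReport = strOut
End Function
```

For example, adding Debug.Print FunctionCountsReport(dicFuncs) just before the MsgBox line in a driver lists each function name with its count in the Immediate window.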

Conclusion

  • Byte Arrays can be significantly faster for character by character operations than using MID and INSTR.
  • Looking at how the first attempt at an algorithm works can give you clues about how to improve it.
  • Even using Byte arrays, string parsing operations are slow in VBA.
  • Using a dictionary, it’s really fast to find the distinct formulas even on a large worksheet.

Challenge

OK guys: who can write a faster VBA function parsing routine?

Download the routines from here



Exploring Range.Calculate and Range.CalculateRowMajorOrder: fast but quirky formula calculation

The Range.Calculate methods are very useful additions to Excel’s other calculation methods (Application-level Calculate, CalculateFull, CalculateFullRebuild and Worksheet.Calculate: the missing one is Workbook.Calculate!).

You can use the Range Calculate methods to:

  • Force calculation of a block of formulas or a single formula
  • See how long the variations of a particular formula take to calculate
  • Speed up repeated calculations
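As a minimal sketch of the timing use, the function below wraps one Range.Calculate call between two readings of VBA’s built-in Timer. The name TimeRangeCalc is made up for illustration, and Timer is used only to keep the example self-contained: it has coarse resolution and resets at midnight, so a high-resolution timer is a better choice for short calculations.

```vba
Function TimeRangeCalc(oRange As Range) As Double
    ' returns the approximate elapsed seconds for one Range.Calculate
    Dim dStart As Double
    '
    dStart = Timer            ' seconds since midnight
    oRange.Calculate
    TimeRangeCalc = Timer - dStart
End Function
```

With a range of formulas selected, Debug.Print TimeRangeCalc(Selection) shows the result in the Immediate window.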

Download my RangeCalc Addin

You can download my RangeCalc addin from my website’s downloads page (xla password is dm).

This adds a button to the addins tab which uses Range.Calculate to time calculation of the currently selected cells.

Inspecting the RangeCalc code: different problems with different versions

You can unlock the xla to view the code using a password of dm.
The RangeCalc code (the RngTimer sub below) bypasses a number of Range.Calculate quirks in various Excel versions:


Sub RngTimer()
 '
 ' COPYRIGHT © DECISION MODELS LIMITED 2000,2001. All rights reserved
 '
 ' timed calculation of selected Range
 '
 ' bypass grouped and interactive problem 17/10/00
 ' remove interactive=false: Excel 97 Hangs when UDF error 14/2/01
 ' fix for application.iteration and array formulae with Excel2002 29/10/2001
 '
 Dim dRangeTime As Double
 Dim iMsg As Integer
 Dim blIter As Boolean
 Dim oCalcRange As Range ''' range to calculate
 Dim dOvhd As Double
 Dim strMessage As String
 '
 ' store iteration property
 '
 blIter = Application.Iteration
 '
 If ActiveWorkbook Is Nothing Or ActiveSheet Is Nothing Or ActiveWindow Is Nothing Or Selection Is Nothing Then
 Exit Sub
 Else
 If TypeName(Selection) = "Range" Then
 '
 ' if Excel2002 or greater handle iteration problem
 '
 If Left(Application.Version, 1) = "1" Then
 '
 ' switch off iteration
 '
 Application.Iteration = False
 End If
 '
 ' expand selected range to include all of any multicell array formula
 ' - makes Excel 2002 behave like earlier versions
 ' - allows notification if range has been expanded
 '
 Call ExpandRange(Selection, oCalcRange)
 '
 On Error GoTo errhandl
 '
 dOvhd = MicroTimer ''' ensure frequency is initialised
 dOvhd = MicroTimer ''' get time
 dOvhd = MicroTimer - dOvhd ''' calc microtimer overhead
 '
 dRangeTime = MicroTimer
 oCalcRange.Calculate
 dRangeTime = MicroTimer - dRangeTime - dOvhd
 '
 On Error GoTo 0
 '
 dRangeTime = Int(dRangeTime * 100000) / 100
 '
 ' 16/11/2009 - bypass multi-cell array formula problem
 '
 If Val(Application.Version) > 9 And Val(Application.Version) < 12 Then
 oCalcRange.Dirty
 End If
 '
 ' change message if array formula caused expansion of selection
 '
 If oCalcRange.Count = Selection.Count Then
 strMessage = CStr(Selection.Count) & " Cell(s) in Selected Range "
 Else
 strMessage = CStr(oCalcRange.Count) & " Cell(s) in Expanded Range "
 End If
 iMsg = MsgBox(strMessage & CStr(dRangeTime) & " Milliseconds", vbOKOnly + vbInformation, "RangeCalc")
 End If
 End If
 Application.Iteration = blIter ''' restore setting
 Set oCalcRange = Nothing
 Exit Sub
 errhandl:
 On Error GoTo 0
 Application.Iteration = blIter ''' restore setting
 Set oCalcRange = Nothing
 iMsg = MsgBox("Unable to Calculate Range", vbOKOnly + vbCritical, "RangeCalc")
 End Sub

Circular References

Using Range.Calculate on ranges that contain circular references within the range fails in Excel versions before Excel 2007.
In Excel 2007 and later Range.Calculate only does a single iteration of the circular reference in Manual calculation mode, regardless of the Iteration settings.
So the RangeCalc addin switches iteration off whilst doing the Range.Calculate.

Multiple Sheets Selected

If you have multiple sheets selected Range.Calculate fails with a 1004 error, so the RangeCalc code has an error trap and message for any failure in Range.Calculate.
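An alternative to relying on the error trap is to test for grouped sheets before calculating. This is only a sketch of that idea, not what the RangeCalc code does, and the function name OnlyOneSheetSelected is hypothetical.

```vba
Function OnlyOneSheetSelected() As Boolean
    ' True when exactly one sheet is selected in the active window
    If ActiveWindow Is Nothing Then Exit Function   ' no window: returns False
    OnlyOneSheetSelected = (ActiveWindow.SelectedSheets.Count = 1)
End Function
```

A caller could then skip the Range.Calculate and show its own message when this returns False.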

Multiple Areas selected  on a single Sheet

Range.Calculate will happily calculate a multi-area selection as long as all the areas are on the same sheet.
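A short sketch of this, using two made-up blocks on the active sheet (the addresses are placeholders and assume the active sheet is a worksheet):

```vba
Sub CalcTwoAreas()
    Dim oAreas As Range
    '
    ' two separate blocks on the same sheet, combined into one multi-area range
    Set oAreas = Union(ActiveSheet.Range("A1:A10"), ActiveSheet.Range("C1:C10"))
    Debug.Print oAreas.Areas.Count   ' 2
    oAreas.Calculate                 ' one call calculates both areas
End Sub
```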

Multi-Cell Array formulas

If you do not select all the cells in a multi-cell array formula Range.Calculate will fail. My RangeCalc addin solves this problem by:

  • Automatically expanding the range to calculate to include all the cells in any array formula which intersects the selected range
  • Notifying the user that the range has been expanded

The VBA code to expand the range looks like this:


Sub ExpandRange(oStartRange As Range, oEndRange As Range)
 '
 ' COPYRIGHT © DECISION MODELS LIMITED 2000,2001. All rights reserved
 '
 ' Input: oStartRange, a range object that may or may not contain array formulae
 ' Output: oEndRange, a range object that has been expanded -
 ' to include all the cells in any array formula that is partly in the range
 '
 Dim oCell As Range
 Dim oArrCell As Range
 '
 ' loop on cells in oStartRange
 ' and expand range to include all the cells in any array formulae
 '
 On Error Resume Next
 '
 Set oEndRange = oStartRange
 For Each oCell In oStartRange
 If oCell.HasArray = True Then
 For Each oArrCell In oCell.CurrentArray
 '
 ' add any extra array cells
 '
 If Intersect(oEndRange, oArrCell) Is Nothing Then
 '
 ' if this cell is not in the expanded range then add it
 '
 Set oEndRange = Union(oEndRange, oArrCell)
 End If
 Next oArrCell
 End If
 Next oCell
 Set oCell = Nothing
 Set oArrCell = Nothing
 End Sub

There is also another problem with multi-cell array formulas and Range.Calculate, but it only exists in Excel 2002 and 2003 (after a Range.Calculate the array formula gets evaluated once for each cell it occupies in all subsequent recalculations). This problem is bypassed by using Range.Dirty on the range!

Note: the bug in Range.Dirty is still there in Excel 2013 (it always works on the active sheet, even when the range refers to another sheet)!

Range.Calculate and Range.CalculateRowMajorOrder – different handling of within-range dependencies

In early Excel versions (Excel 97 and 2000) Range.Calculate used a very simple calculation method: calculate the cells in each row in turn from left to right and ignore any forward references or within-range dependencies. This method is fine as long as you know that’s what it does and arrange your formulas accordingly (otherwise you may get incorrect results)!

But some people thought this was a bug, so it got fixed in Excel 2002 and 2003 (and later versions): Range.Calculate now starts by doing the left-to-right calculation on each row in turn, and then starts recalculating any cells that refer to uncalculated cells within the range. In other words it achieves the same result as the standard Excel recalculation method.

The only problem was that this made Range.Calculate slower than in previous versions: and so some customers refused to upgrade because they could not run their bump runs fast enough!

So in Excel 2007 Microsoft solved the problem by introducing Range.CalculateRowMajorOrder. This method worked exactly the same way as the Excel 97 versions of Range.Calculate and was faster than the new Range.Calculate, and so everyone was happy except the VBA coders who had to work out when to use which method.
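One way to cope with having two methods is to pick one at run time from the Excel version. The sketch below is an illustration under stated assumptions, not code from RangeCalc: the sub name is hypothetical, and the call goes through an Object variable so that the module still compiles in versions whose Range object has no CalculateRowMajorOrder method.

```vba
Sub CalcRowMajorIfAvailable(oRange As Range)
    Dim oLateBound As Object
    '
    ' Excel 2007 is version 12: CalculateRowMajorOrder exists from there on
    If Val(Application.Version) >= 12 Then
        Set oLateBound = oRange           ' late-bound so older versions still compile
        oLateBound.CalculateRowMajorOrder
    Else
        oRange.Calculate
    End If
End Sub
```

Bear in mind that in Excel 2002 and 2003 the fallback Range.Calculate resolves within-range dependencies, so the two branches only give the same results when the formulas are arranged to suit row-by-row calculation.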

Some more Range.Calculate Limitations

Whilst the 2 Range Calculate methods are very useful, they do have some limitations:

  • They are both single-threaded calculation methods (in today’s world this is a serious limitation)
  • There is no keystroke sequence to initiate them from the UI (FastExcel uses Alt-F9 for this)
  • Re-entrant use of Range.Calculate is not allowed: for instance you can’t call Range.Calculate from inside a UDF
  • Range.Calculate works with US English dates etc.

Summary

  • Range.Calculate and Range.CalculateRowMajorOrder can be fast calculation methods
  • But they are not multi-threaded
  • For me they are essential tools for comparing formula speed
  • They need a bit of wrapping code, as in my RangeCalc addin, to make them generally useful.

Formula References between Sheets versus within Sheets shootout: Which calculates faster and uses more Memory

I thought I would revisit the differences between formulas that reference other worksheets and formulas that only reference their own worksheet. Referencing other worksheets always used to be a memory hog, but so much has changed between Excel 2003 and Excel 2013 that I wanted to see the current status.

The Test Workbooks

The test workbooks are all generated by simple VBA code contained in the MakeInterLinkedSheets.xlsb workbook, which you can download from here.

Generating Many Linked Worksheets

The code to generate the interlinked worksheets is shown below. You can choose how many worksheets to generate, and then each worksheet will contain a column of constants and a column of formulas that refer to each of the other worksheets. So if you choose 1500 worksheets, each worksheet will contain 1500 formulas with every formula referring to a different worksheet (you can’t get much more linked than that!). That’s a total of 1500 x 1500 = 2.25 million formulas.


Sub MakeManyLinkedSheets()
 '
 ' make a large number of worksheets, each of which links to all of the others
 '
 Dim j As Long
 Dim k As Long
 Dim varSheets As Variant
 Dim nSheets As Long
 Dim nRequest As Long
 Dim nAdd As Long
 Dim var() As Variant
 '
 varSheets = Application.InputBox("Enter the Number of Interlinked Sheets to Generate", "Inter-Linked Sheets", 1500)
 If Not IsNumeric(varSheets) Then
 MsgBox "Input must be a number, MakeManyLinkedSheets cancelled", vbOKOnly + vbCritical
 Exit Sub
 Else
 Application.ScreenUpdating = False
 Application.Calculation = xlManual
 '
 nRequest = CLng(varSheets)
 '
 ' add sheets: cannot add more than 255 in one .Add statement
 '
 nSheets = ActiveWorkbook.Worksheets.Count
 nRequest = nRequest - nSheets
 Application.StatusBar = "Adding Sheets"
 Do While nRequest > 0
 nAdd = nRequest
 If nAdd > 255 Then nAdd = 255
 ActiveWorkbook.Worksheets.Add before:=ActiveWorkbook.Worksheets(nSheets), Count:=(nAdd)
 nSheets = ActiveWorkbook.Worksheets.Count
 nRequest = nRequest - nAdd
 Loop
 '
 ' add constant and linkage formula
 '
 For j = 1 To ActiveWorkbook.Worksheets.Count
 Application.StatusBar = "Generating Linkages on Sheet " & CStr(j)
 ReDim var(1 To ActiveWorkbook.Worksheets.Count, 1 To 2) ' explicit 1-based bounds, independent of Option Base
 For k = 1 To ActiveWorkbook.Worksheets.Count
 var(k, 1) = j * k
 var(k, 2) = "=Sheet" & CStr(k) & "!a" & CStr(k)
 Next k
 Worksheets(j).Range("a1").Resize(ActiveWorkbook.Worksheets.Count, 2).Formula = var
 Next j
 Application.StatusBar = False
 Application.Calculation = xlAutomatic
 End If
 End Sub

Since you still (even in XL 2013) cannot create more than 255 sheets in a single Worksheets.Add command the code creates the worksheets in blocks of 255.

Memory Used & File Size

In old versions of Excel (97/2000) this code hit the memory wall at about 200 worksheets.
In Excel 2013 32-bit you can get up to over 2500 worksheets but 4000 fails at about 1.4 gigabytes.
In Excel 2013 64-bit I got to 5 gigabytes of memory trying for 4000 sheets but it was so slow I gave up.

For 1500 sheets:

  • Excel 2010 32 uses about 430 Megabytes of memory for the workbook
  • Excel 2013 32 uses about 540 Megabytes of memory for the workbook
  • Excel 2013 64 uses about 770 Megabytes of memory for the workbook
  • The workbook takes about 40 Megabytes when saved as an XLSB

Comparing Within Sheet References and Between Sheets References

Generating the Formulas

I used 3 different methods for generating the within-sheet reference formulas (in R1C1 mode):

Method 1 uses a formula that refers to the previous column on this row:
=Sheet1000!RC[-1]

Method 2 uses a formula that always refers to column 1 for this row:
=Sheet1000!RC1

Method 3 uses a formula that randomly refers to rows and columns (nRequest is the number of sheets requested):
"=Sheet1000!R" & Int(Rnd() * nRequest + 1) & "C" & Int(Rnd() * nRequest + 1)

The reason for using 3 different formulas is to see what effect different kinds of references have on memory and calculation speed.
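Methods 1 and 2 need very little code, because assigning one R1C1 formula string to a multi-cell range gives every cell the same relative formula. The following is a minimal sketch rather than the code from the download: the sub name and row count are made up, and it assumes a worksheet named Sheet1000 exists, as in the formulas above.

```vba
Sub MakeFormulasMethods1And2()
    Const nRows As Long = 200     ' hypothetical number of rows
    '
    With Worksheets("Sheet1000")
        .Range("A1").Resize(nRows, 1).Value = 1                          ' constants
        .Range("B1").Resize(nRows, 1).FormulaR1C1 = "=Sheet1000!RC[-1]"  ' Method 1: previous column
        .Range("C1").Resize(nRows, 1).FormulaR1C1 = "=Sheet1000!RC1"     ' Method 2: always column 1
    End With
End Sub
```

Every cell in column B then refers to column A on its own row, and every cell in column C refers to column 1 on its own row.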

The complete code for method 3 looks like this:

Sub MakeManyFormulas3()
 '
 ' make a single worksheet that refers to itself
 ' generate pairs of columns:
 ' numeric constant followed by formula that refers to a random row and column
 '
 Dim j As Long
 Dim k As Long
 Dim varSheets As Variant
 Dim nSheets As Long
 Dim nRequest As Long
 Dim nAdd As Long
 Dim var() As Variant
 '
 varSheets = Application.InputBox("Enter the Number of Formulas to Generate", "Inter-Linked Sheets", 200)
 If Not IsNumeric(varSheets) Then
 MsgBox "Input must be a number, MakeManyLinkedFormulas cancelled", vbOKOnly + vbCritical
 Exit Sub
 Else
 Application.ScreenUpdating = False
 Application.Calculation = xlManual
 Application.ReferenceStyle = xlR1C1
 nRequest = CLng(varSheets)
 '
 ' add constant and linkage formula
 '
 For j = 1 To nRequest
 ReDim var(1 To nRequest, 1 To 2) ' explicit 1-based bounds, independent of Option Base
 For k = 1 To nRequest
 var(k, 1) = j * k
 '
 ' refers to random row and column
 '
 var(k, 2) = "=Sheet1000!R" & Int(Rnd() * nRequest + 1) & "C" & Int(Rnd() * nRequest + 1)
 Next k
 Worksheets("Sheet1000").Range("a1").Resize(nRequest, 2).Offset(0, (j - 1) * 2).FormulaR1C1 = var
 Next j
 Application.StatusBar = False
 Application.Calculation = xlAutomatic
 Application.ReferenceStyle = xlA1
 End If
 End Sub

Timing and Memory Results

The memory used is the difference between before and after memory (Private Working Set) as measured by Windows 7 Task Manager.

The Full Calculate Time is the time taken for the second or third multi-threaded (4 cores – 8 threads) full calculation of all 2.25 million formulas as measured by FastExcel.

Conclusions

I must admit I was surprised about the calculation times: I thought they would be larger for the between sheets references than for the within sheets references. But there is no real noticeable difference: a larger factor is where the within-sheet formulas refer to, or more likely the total number of unique formulas used.
(Random > column to the left > always first column)

XL Version   Formula Method     Memory (MB)   Full Calc Time (Seconds)
2010 32      Interlink Sheets   426           0.30
2010 32      Previous Column    320           0.27
2010 32      First Column       214           0.26
2010 32      Random             286           0.60
2013 32      Interlink Sheets   538           0.29
2013 32      Previous Column    402           0.29
2013 32      First Column       300           0.26
2013 32      Random             369           0.66
2013 64      Interlink Sheets   835           0.33

My conclusions from all this are:

  • 64-bit Excel uses more memory than 32-bit Excel
  • Interlinking sheets uses more memory than within-sheet references.
  • There is no significant calculation time penalty in using inter-sheet references
  • Excel 2013 uses more memory than Excel 2010
  • The more unique formulas there are the more memory and calculation time is needed

Summit Winter 2013

On Saturday I got back from the winter 2013 MVP summit at the Microsoft campus in Redmond. The summit is a great chance to talk to the Microsoft Excel teams and to meet up with other Excel MVPs.
I also spent some time with the Excel Mac team and the Mac MVPs trying to understand the Excel development environment on the Mac, and time with the BI teams and SQL server MVPs.
Everything discussed is under strict NDA, so I can’t give you any details but it was all very interesting!

Surface 2

All the attending MVPs got a special offer on a 32 GB Surface 2 (RT). I must admit I had not planned on buying this because it does not support the development environment (VBA and Visual Studio) I need, but the offer was too good to refuse!

But after playing with it for a few hours I am very impressed. It comes with Excel, Outlook, Word and Powerpoint, 200 GB of Skydrive and a year’s worth of Skype. I added a 64GB micro SD card as the D: drive and a Bluetooth mouse.

For office use, with Email, web browsing and photos this is a great machine: you very quickly get used to touching the screen (and wishing my main laptop worked like that).

Geek Meet

Here are some snaps from the Monday evening reception:

Roger Govier and Zak Barrese discussing the finer points of Apps for Office.

Patrick Matthews, Bob Umlas and Bill Jelen: Pivot Tables or Array Formulas?

Kevin (Zorvek) Jones does not look convinced by Zak (FireFyter)’s explanation of the Office 365 SKUs

Zak ignores Sam Rad and Ingeborg (Teylyn) Hawighorst showing off an orange something!

Felipe Gualberto and Ken Puls: which DAX function would you use to compare apples with oranges? Ken looks horrified: someone must have suggested VLOOKUP instead of CALCULATE!

Jacob Hildebrand and Jordan Goldmeier: Jordan shows what happens when you unwind Clippy – Jacob patiently waits for another seat to become free.

Roger Govier and Sam Rad say cheese for the camera whilst Ingeborg and Liam Bastick discuss the idea of arranging an Excel User Conference in OZ/NZ in 2015 with lots of MVP speakers (hopefully including me, Bill Jelen, Ken Puls, Bob Umlas …)

Mystery Photo

Here is this year’s mystery photo: exactly where can you find this dragon?


Bordeaux Wine Tasting: the ones that got away


Saturday evening (after I got back from Seattle: the Microsoft MVP Summit) there was a major Bordeaux wine tasting event planned. My ex-neighbour Joe had initiated this, explaining that he had this bottle of Haut Brion that needed drinking.

But just as I arrived back on Saturday afternoon, somewhat jet-lagged (8 hours time difference), we heard that Joe had had to go into hospital (he is out now), so the main event has had to be postponed to next year.

The ones that got away

Joe of course was being a bit economical with the truth: actually he had a whole case of Haut Brion and a bottle of Pichon-Longueville 1995.

My contribution was a bottle of Palmer 1996 and a bottle of Phelan Segur 1996.

So we left those bottles for another day and went with a somewhat lesser selection:

The wines we actually tasted

We were down to the four of us, me and Jane plus our son Ben and his wife Jo.
Ben and Jo got married in 1996 so their contribution was a bottle of 1996 Champagne:

So we started the evening with that, accompanied by some smoked salmon. The champagne was a strong golden colour with a delicious rich taste.

Then we tasted the 1996 Phelan Segur against a 2002 Segla (the second wine of Chateau Rausan-Segla) and a 2002 Art Series Leeuwin Estate cabernet sauvignon.

The Phelan Segur was disappointing: tasting notes say things like – dry, thin, lacking fruit, although the colour and clarity were good. We had decanted it 2 hours before, which I think was a mistake: we will try opening and immediately tasting the next bottle. It scored 13 out of 20.

The Segla was nice but not exceptional: tasting notes say things like – pleasant nose, nice fruit, a little thin. It scored 14 out of 20.

Everyone really liked the Leeuwin Estate (well, of course it’s actually a bit of a cheat since it’s from Margaret River in Western Australia rather than Bordeaux). Tasting notes say things like – subtle, nice finish, red currants. It scored 16 out of 20.
Unfortunately that was my last bottle!

So it was a great evening (I crashed at 10.30 pm) but we are really looking forward to the rematch next year!


Finding out if a function is Volatile or Multithreaded using VBA: UDFs for UDFs


Part of my new Profiling Formulas and Functions command requires the code to determine whether a function is a native built-in Excel function, an XLL function, or some other kind (VBA, Automation). I also want to know if it’s multi-threaded or volatile, because that can have a significant effect on calculation performance.

So here is how to do it!

Built-in Native Functions

You can get a list of the functions that are built-in to Excel from the Excel XLL SDK. I added 2 columns to the list showing whether the functions are Multi-Threaded or Volatile:

Native_Funcs

Then I can use VLOOKUP to find out if the function is a built-in Excel function, and if so whether it is Volatile or Multi-threaded.

XLL Functions

XLL functions have to be registered using the REGISTER C-API command. When registering the function you have to include a typestring that declares what type each of the function arguments is, and whether the function is multi-threaded or volatile.

And it turns out there is a VBA method to get the typestrings for all registered XLL Functions.
Application.RegisteredFunctions returns a 3-column array containing the name of the XLL file, the Function name and the function typestring.

But of course it is not quite so easy as that. The Function name it returns is the internal function name in the XLL code, which is usually not the same as the name of the UDF function as used by Excel!
So for example a UDF name like REVERSE.TEXT could have an XLL internal name of f1.
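To see what Application.RegisteredFunctions returns on your own machine, something like the following sketch can help. It simply prints the three columns to the Immediate window. The Sub name is hypothetical, and the Null check assumes the documented behaviour that the property returns Null when nothing is registered.

```vba
Option Explicit

' Sketch only: list every registered XLL/DLL function in the Immediate window.
' ListRegisteredFunctions is a hypothetical name, not part of the download.
Public Sub ListRegisteredFunctions()
    Dim vReg As Variant
    Dim j As Long

    vReg = Application.RegisteredFunctions

    ' the property returns Null when no functions are registered
    If IsNull(vReg) Then
        Debug.Print "No registered functions found"
        Exit Sub
    End If

    For j = LBound(vReg, 1) To UBound(vReg, 1)
        ' column 1 = library file, column 2 = internal name, column 3 = typestring
        Debug.Print vReg(j, 1), vReg(j, 2), vReg(j, 3)
    Next j
End Sub
```

Run it from the VBA editor and compare the internal names in column 2 with the UDF names you actually type into cells.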

Using Register.ID to match up the Names

The way to find out which XLL function name (if any) corresponds to the Excel UDF Name is to find the internal number of the function (its Register ID).

For an XLL UDF this is the number returned if you enter the Excel UDF name in a formula without any () after it (for example =REVERSE.TEXT ). And you can get this in VBA using EVALUATE:

vExcelFuncID = Evaluate(strFunc)

Getting the Register Id number of the internal XLL function name requires calling an XLM Macro command called REGISTER.ID using VBA Application.ExecuteExcel4Macro.

Then if both Register.IDs match you have an XLL function and you can look in the typestring to see if it’s multi-threaded (the typestring contains $) or volatile (the typestring contains !, or contains # together with an R or U argument type).
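As a minimal sketch, the typestring test can be isolated into two small helper functions. They follow the same checks as the CheckFunc code further down: $ for multi-threaded, and ! (or # combined with an R or U argument type) for volatile. The function names and the sample typestrings in the usage note are hypothetical.

```vba
Option Explicit

' Sketch only: True if the typestring declares the function thread-safe ($)
Public Function TypeStringIsMultiThreaded(ByVal strTypeString As String) As Boolean
    TypeStringIsMultiThreaded = (InStr(strTypeString, "$") > 0)
End Function

' Sketch only: True if the typestring declares the function volatile
Public Function TypeStringIsVolatile(ByVal strTypeString As String) As Boolean
    Dim blMacroEquiv As Boolean
    Dim blTakesRef As Boolean

    ' # marks a macro-sheet equivalent function
    blMacroEquiv = (InStr(strTypeString, "#") > 0)
    ' R and U are the argument types that can receive references
    blTakesRef = (InStr(strTypeString, "R") > 0) Or (InStr(strTypeString, "U") > 0)

    ' ! is the explicit volatile marker; # with R or U is treated as volatile too
    TypeStringIsVolatile = (InStr(strTypeString, "!") > 0) Or (blMacroEquiv And blTakesRef)
End Function
```

For example, with made-up typestrings, TypeStringIsVolatile("QQ!") and TypeStringIsVolatile("QU#") both return True, while TypeStringIsVolatile("QQ$") returns False.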

Other UDF Types (VBA, Automation …)

If the UDF is not built-in and not a registered XLL function it must be either a VBA or an Automation UDF (or an XLM UDF!). None of these can be multi-threaded.

But they can be volatile and I have not found a straight-forward way of determining this programmatically. The simplest way to find out is to put the UDF in a formula and see if the formula recalculates with every F9: but you can’t do that from a VBA UDF!

The VBA Code

You can download the code as an XLAM from here.

There are 2 Subroutines to get the Native function lists and registered XLL functions into module level arrays:

Option Explicit
Option Base 1
Dim vFuncRegister As Variant
Dim vNative As Variant

Private Sub GetNative()
'
' get the list of native built-in functions and their attributes
'
If IsEmpty(vNative) Then
vNative = ThisWorkbook.Worksheets("NativeFuncs").Range("A2").Resize(ThisWorkbook.Worksheets("NativeFuncs").Range("A1000").End(xlUp).Row - 1, 3)
End If
End Sub
Private Sub GetFuncRegister()
'
' get list of registered functions and their funcIDs
'
Dim sCmd As String
Dim vRes As Variant
Dim j As Long
'
If Not IsEmpty(vFuncRegister) Then Exit Sub
'
vFuncRegister = Application.RegisteredFunctions     ''' get data on XLL registered functions
'
' add column for funcid
'
ReDim Preserve vFuncRegister(LBound(vFuncRegister) To UBound(vFuncRegister), 1 To 4) As Variant
'
For j = LBound(vFuncRegister) To UBound(vFuncRegister)
'
' get funcids
'
If vFuncRegister(j, 1) Like "*xll" Then
sCmd = "REGISTER.ID(""" & CStr(vFuncRegister(j, 1)) & """,""" & CStr(vFuncRegister(j, 2)) & """)"
vRes = Application.ExecuteExcel4Macro(sCmd)
If Not IsError(vRes) Then vFuncRegister(j, 4) = vRes
End If
Next j
End Sub

The main function checks the function name against the native functions array, then if not found, checks it against the registered XLL functions array and if not found assumes it must be a VBA or Automation (or XLM) UDF.


Private Function CheckFunc(strFunc As String, blMulti As Boolean, blVolatile As Variant) As String
'
' returns
' type = B if built-in, X if XLL else O for Other (VBA or Automation)
' blMulti is true if multithreaded
' BlVolatile is True if Volatile, False if not volatile and ? if don't know
'
Dim strType As String
Dim vFound As Variant
Dim j As Long
Dim strTypeString As String
Dim vExcelFuncID As Variant
'
blMulti = True
blVolatile = False
'
' check for native xl function
'
On Error Resume Next
vFound = Application.VLookup(strFunc, vNative, 1, False)
On Error GoTo 0
If Not IsError(vFound) Then
strType = "B"
If Application.VLookup(strFunc, vNative, 3, False) = "V" Then blVolatile = True
If Application.VLookup(strFunc, vNative, 2, False) = "S" Or Val(Application.Version) < 12 Then blMulti = False
End If
'
If Len(strType) = 0 Then
'
' get xlfuncid - if not error then its an XLL func
'
vExcelFuncID = Evaluate(strFunc)
If Not IsError(vExcelFuncID) Then
strType = "X"
For j = LBound(vFuncRegister) To UBound(vFuncRegister)
If strFunc = vFuncRegister(j, 2) Or vExcelFuncID = vFuncRegister(j, 4) Then
strTypeString = vFuncRegister(j, 3)
If InStr(strTypeString, "!") > 0 Or _
(InStr(strTypeString, "#") > 0 And (InStr(strTypeString, "R") > 0 Or InStr(strTypeString, "U") > 0)) _
Then blVolatile = True
If InStr(strTypeString, "$") = 0 Or Val(Application.Version) < 12 Then blMulti = False
Exit For
End If
Next j
End If
End If
'
If Len(strType) = 0 Then
'
' else its Other (VBA or Automation)
'
strType = "O"
blMulti = False     ''' cant be multi
blVolatile = "?"    ''' don't know if volatile
End If
'
CheckFunc = strType
End Function

Then there are 3 UDFs to find out whether a function is volatile, whether it is multi-threaded, and what type of function it is.

Public Function IsMultiThreaded(strFuncName As String) As Variant
'
' check if a function is Multi-Threaded
' Returns true or false
'
Dim blMulti As Boolean
Dim blVolatile As Variant
Dim strType As String
'
GetNative
GetFuncRegister
strType = CheckFunc(strFuncName, blMulti, blVolatile)
'
IsMultiThreaded = blMulti
End Function
Public Function IsVolatile(strFuncName As String) As Variant
'
' check if a function is volatile
' returns True or False or ? if don't know
'
Dim blMulti As Boolean
Dim blVolatile As Variant
Dim strType As String
'
GetNative
GetFuncRegister
strType = CheckFunc(strFuncName, blMulti, blVolatile)
'
IsVolatile = blVolatile
End Function
Public Function FuncType(strFuncName As String) As Variant
'
' get type of function: B for built-in Excel, X for XLL, O for Other

Dim blMulti As Boolean
Dim blVolatile As Variant
Dim strType As String
'
GetNative
GetFuncRegister
FuncType = CheckFunc(strFuncName, blMulti, blVolatile)
End Function

Note: Yes – the production version of this code is more optimised for speed, but this version is easier to understand!

Summary

When you are trying to optimise Excel calculation speed it’s important to know which functions are multi-threaded or volatile.

This post demonstrates a way of doing this programmatically.

Limitations:

There are some limitations of this method:

  • Cannot determine if VBA and Automation UDFs are volatile.
  • Will not detect XLL functions that internally change volatility programmatically.

Any ideas on how to determine the volatility of VBA UDFs will be gratefully received!


Excel 2013 SDI Bug: “Calculate” in Status Bar strikes again


In the old days (I’m talking Excel 5 to Excel 2003 here) there was a worrying situation you could find yourself in where, no matter what you did, Excel would show you “Calculate” in the statusbar.
Even when actually nothing needed calculating.
You could press F9, Ctrl/Alt/F9 or even Shift/Ctrl/Alt/F9 until the cows came home, but you could not get rid of that pesky “Calculate”.

Excel 2007 to the rescue

Then along came Excel 2007 and solved the problem.

For details of the circumstances causing “Calculate” to appear in the statusbar see http://www.decisionmodels.com/calcsecretsf.htm

Excel 2013 and Calculate in the statusbar

Having just spent the best part of 2 days sorting out what looked like a problem with a lot of complicated VBA FastExcel code, I have just discovered that it’s not a FastExcel problem at all: it’s an Excel 2013 SDI bug. (SDI is the Single Document Interface, where each window is separate from the others, as opposed to MDI, the Multiple Document Interface, where all the windows are within a parent window, as in all previous Excel versions.)

Basically what should happen is very simple:

  • When Excel detects that something needs to be calculated it puts the “Calculate” message in the statusbar. (This corresponds to VBA Application.CalculationState=xlPending)
  • Whilst Excel is calculating it shows “Calculating x%” in the statusbar (Application.CalculationState=xlCalculating)
  • When Excel has finished calculating it removes the messages from the statusbar (Application.CalculationState=xlDone)

There are a few exceptions to this:

  • If Workbook.ForceFullCalculation is true then the statusbar always shows Calculate
  • If there are circular references and Iteration is switched on then the statusbar always shows Calculate
  • If your VBA has taken control of the statusbar then you don’t see this happening until you relinquish control
  • If you are using Automatic Calculation mode then the switch into and out of “Calculate” status does not happen.
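The state and the exceptions listed above can all be read from VBA, so a small diagnostic like this sketch may help when you are trying to work out why “Calculate” is showing. The names CalcStateText and ReportCalcStatus are hypothetical. The sketch only reports the settings, it does not change anything.

```vba
Option Explicit

' Sketch only: translate Application.CalculationState into readable text
Public Function CalcStateText() As String
    Select Case Application.CalculationState
        Case xlPending
            CalcStateText = "Pending (statusbar should show Calculate)"
        Case xlCalculating
            CalcStateText = "Calculating"
        Case xlDone
            CalcStateText = "Done (no statusbar message expected)"
        Case Else
            CalcStateText = "Unknown"
    End Select
End Function

' Sketch only: print the state plus the settings that keep Calculate showing
Public Sub ReportCalcStatus()
    Debug.Print "CalculationState: " & CalcStateText()
    Debug.Print "Automatic mode: " & (Application.Calculation = xlCalculationAutomatic)
    Debug.Print "Iteration enabled: " & Application.Iteration
    If Not ActiveWorkbook Is Nothing Then
        Debug.Print "ForceFullCalculation: " & ActiveWorkbook.ForceFullCalculation
    End If
End Sub
```

If the statusbar shows “Calculate” while this reports Done with none of the exceptions in force, the message itself is what is stuck.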

Unfortunately the Excel 2013 SDI implementation currently only does all this to the active window, so if you have multiple workbooks open things can rapidly get in a mess. Just follow this simple recipe:

Start Excel 2013 with an empty workbook (Book1) and in automatic calculation mode.

Add a (non-volatile) formula to cell A1 (=99)

Open another empty workbook (Book2)

Switch to Manual calculation Mode

Add a (non-volatile) formula to cell A1 in Book2 (=55)

The statusbar in Book2 now shows “Calculate” but the one in Book1 does not (Bug)

statusbar1

Now switch to the Book1 window and press F9. The Calculate message in Book2 should go away: but it does not (Bug)

Statusbar2

Now you can go back to the Book2 window and press F9 again, but that pesky “Calculate” will NOT go away.
If you change the formula in cell A1 (or do anything else that makes the workbook uncalculated) and then press F9 the “calculate” goes away!

You can even make this happen in Automatic mode.

Repeat the steps above, but when you have switched to Book1 with Book2 showing “Calculate”, instead of pressing F9 switch to Automatic calculation mode.
Now switch back to Book2 and no matter what you do you cannot get rid of “Calculate” (except by going back to Manual, dirtying Book2 and recalculating).
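The manual workaround (go to Manual, dirty the workbook, recalculate) could be scripted roughly as below. This is an untested sketch against the bug itself: the Sub name is hypothetical, and it assumes that dirtying the used range of the active sheet counts as making the workbook uncalculated in the same way that editing a formula does.

```vba
Option Explicit

' Sketch only: switch to Manual, dirty the active sheet, recalculate,
' then restore the previous calculation mode.
Public Sub ClearStuckCalculate()
    Dim lngOldMode As XlCalculation

    ' only worksheets have a UsedRange that can be dirtied
    If TypeName(ActiveSheet) <> "Worksheet" Then Exit Sub

    lngOldMode = Application.Calculation
    Application.Calculation = xlCalculationManual

    ActiveSheet.UsedRange.Dirty     ' flag the cells as needing recalculation
    Application.Calculate           ' recalculate all open workbooks

    Application.Calculation = lngOldMode
End Sub
```

Activate the window that is stuck on “Calculate” before running it.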

Conclusion

Moving Excel 2013 to SDI is the only way to get sensible support for multiple screens, and so was probably necessary.
But it has caused quite a lot of breakage of existing applications (Toolbars, Modeless Forms etc).

I hope Microsoft can fix this Calculate problem in the forthcoming SP1 release before it bites too many people!


2013 in review


The WordPress.com stats helper monkeys prepared a 2013 annual report for this blog.

Here’s an excerpt:

The Louvre Museum has 8.5 million visitors per year. This blog was viewed about 85,000 times in 2013. If it were an exhibit at the Louvre Museum, it would take about 4 days for that many people to see it.

Click here to see the complete report.



Inserting a UDF and Launching the Function Wizard from VBA


In a previous post I suggested you could do this by entering the function using VBA into the selected cell using a comma as the argument to the function, for example =LISTDISTINCTS.SUM(,) , and then calling the function wizard using Application.Dialogs(xlDialogFunctionWizard).Show.

This method mostly works but has 2 major drawbacks:

  • It won’t allow you to use F4 within the function wizard to change from relative to absolute.
  • You get #VALUE! or #NAME? when launching an XLL-based UDF with only 1 argument.

After a lot of trial and error I eventually found a way that seems to work in all cases:

Sub GoFuncWiz(strFunc As String, blArray As Boolean)
    On Error Resume Next
    If blArray Then
        Selection.FormulaArray = "=" & strFunc & "()"
    Else
        Selection.Formula = "=" & strFunc & "()"
    End If
    Application.OnTime Now + TimeValue("00:00:01") / 4, "RangeFuncWiz"
End Sub

Sub RangeFuncWiz()
    On Error Resume Next
    Selection.FunctionWizard
End Sub

The simple (once you know how) solution is to launch the Range.FunctionWizard method, but with a quarter-second delay!
Oh, and you don’t need the dummy comma argument either.


Threading and Hyper-Threading: Optimizing Excel Calculation speed by changing the number of threads


I have just implemented measuring multi-threaded calculation efficiency in FastExcel V3 Profile Workbook: so I thought it would be interesting to see the effect on calculation speed of varying the number of threads, and switching off hyper-threading.

Hyperthreading

Almost all of today’s PCs have multiple cores. My desktop has an Intel I7 870 chip. This contains 4 hyper-threaded cores. Hyper-threading is an Intel hardware feature that allows each core to look to the software as though it is 2 cores. If one of the hyperthreads on a core gets stalled waiting for data the other hyperthread takes control of the resources and tries to continue. How efficient this is depends heavily on the software application, and I had no idea whether Excel made good use of hyper-threading or not.

Because hyper-threading is a hardware feature you can only switch it off or on using the BIOS.

Excel Multi-threaded Calculation

Microsoft did a great job of implementing multi-threaded calculation in Excel 2007. The way this works is:

  • Excel analyzes the calculation chains of formula dependencies looking for lengths of chain that can be processed in parallel
  • Starts a number of separate calculation threads
  • Assigns the lengths of chain to the calculation threads for processing
  • And finally gathers everything together

How successful this process is depends very much on the structure and linkages between your formulas, whether the functions you are using are multi-threaded etc, and this can vary enormously between workbooks.

You can control the number of threads that Excel assigns to the calculation process using File–>Options–>Advanced Options–>Formulas. The number of threads assigned can be larger or smaller than the number of logical cores available.
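The same setting is exposed to VBA through Application.MultiThreadedCalculation, so the thread count can also be changed in code. A minimal sketch follows. The two Sub names are hypothetical, and the setting is application-wide, so it affects every open workbook.

```vba
Option Explicit

' Sketch only: use a fixed number of calculation threads
Public Sub SetCalcThreads(ByVal lngThreads As Long)
    With Application.MultiThreadedCalculation
        .Enabled = True
        .ThreadMode = xlThreadModeManual    ' use the count given below
        .ThreadCount = lngThreads
    End With
End Sub

' Sketch only: go back to one thread per logical core (Excel's default)
Public Sub UseAllLogicalCores()
    With Application.MultiThreadedCalculation
        .Enabled = True
        .ThreadMode = xlThreadModeAutomatic
    End With
End Sub
```

For example, SetCalcThreads 4 corresponds to choosing 4 manual threads in the Advanced options.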

The Tests

I picked three largeish workbooks:

  • TTG – 4.7 million formulas using 704 Megabytes of workbook memory
  • GL – 466 thousand formulas using 114 Megabytes of workbook memory
  • PF2 – 284 thousand formulas using 700 Megabytes of workbook memory

I used my main desktop machine with Windows 7 32-bit and Excel 2010 32-bit.
This PC has an Intel Core I7 870 2.93 GHz with 4 hyper-threaded physical cores, so 8 logical cores.

I ran a full calculation on each of the 3 workbooks using 1 to 8, 12, 24, 64 and 256 calculation threads with hyper-threading enabled. Then I switched off hyper-threading and ran a full calculation with 4 threads (so matching the physical cores) and 8 threads.
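A test run like this can be approximated with a simple loop. This is only a sketch and not the FastExcel timing code: the Sub name is hypothetical, it uses the built-in VBA Timer function (coarse resolution, and it resets at midnight), and it times a single full calculation per thread count rather than the second or third successive one.

```vba
Option Explicit

' Sketch only: time a full calculation for a series of thread counts
Public Sub TimeFullCalcByThreads()
    Dim vThreads As Variant
    Dim j As Long
    Dim dStart As Double
    Dim blOldEnabled As Boolean
    Dim lngOldMode As Long
    Dim lngOldCount As Long

    vThreads = Array(1, 2, 3, 4, 5, 6, 7, 8, 12, 24, 64, 256)

    With Application.MultiThreadedCalculation
        ' remember the current settings so they can be restored
        blOldEnabled = .Enabled
        lngOldMode = .ThreadMode
        lngOldCount = .ThreadCount

        .Enabled = True
        .ThreadMode = xlThreadModeManual
        For j = LBound(vThreads) To UBound(vThreads)
            .ThreadCount = vThreads(j)
            dStart = Timer
            Application.CalculateFull
            Debug.Print vThreads(j) & " threads: " & Format$(Timer - dStart, "0.00") & " seconds"
        Next j

        ' restore the original settings
        .ThreadCount = lngOldCount
        .ThreadMode = lngOldMode
        .Enabled = blOldEnabled
    End With
End Sub
```

Switching hyper-threading off and on still has to be done in the BIOS between runs.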

Timing Results with Hyper-threading enabled

Thread_Table

Timings are in seconds for the second or third successive full calculation (to enable comparable optimization of the calculation process).

TTG GL PF2

As you can see, all three workbooks show different characteristics:

  • TTG and PF2 are both fastest with 8 threads but GL is fastest with only 3 threads.
  • Increasing the number of threads beyond the number of logical cores available slows down calculation, but the increase is much less marked with PF2 than with the other 2 workbooks.
  • Increasing the number of threads beyond the number of physical cores up to the number of logical cores improves speed for TTG and PF2: so hyperthreading is successfully exploited by Excel. But physical cores are much more effective than logical cores.

The effect of disabling Hyperthreading

When I disabled hyperthreading using the BIOS:

  • Calculation using 4 threads ran slightly faster for TTG (88 vs 93) and PF2 (7.6 vs 7.7) but slightly slower for GL (0.49 vs 0.48)
  • Calculation using 8 threads ran slower for all 3 workbooks (TTG 89 vs 73, PF2 7.7 vs 6.5, GL 0.58 vs 0.55)

So the effect of hyperthreading is noticeable but not large.

Running out of memory

There have been threads in the newsgroups about multi-threaded calculation causing Excel to give warning messages about running out of resources whilst calculating. I have not been able to duplicate this problem even with a large workbook like TTG running with 256 threads. The suggested solutions are either to turn off hyper-threading using the BIOS or to reduce the number of threads from 8 to 6.

Conclusions

  • Excel’s multi-threaded calculation can be very successful at reducing calculation times.
  • The effect of hyper-threading is not as large as that from multiple physical cores, but it’s still worth having.
  • The effectiveness of multi-threading is very dependent on the workbook. There will be workbooks where the overhead of analyzing the calculation chains outweighs the gain in calculation speed.
  • Excel’s default setting to assign a calculation thread to all available logical cores seems sensible.

The Amsterdam Excel Summit


Amsterdam

A Unique Opportunity

Dear Excel lovers,

Is Excel the first and last application you use every day?
Do you want to improve your Excel skills and get first-hand knowledge from the absolute best Excel experts in the world?
Then this unique event is for you.

The Amsterdam Excel Summit

Worldclass Excel Experts

An absolutely unique group of Excel MVPs will gather in Amsterdam to share their expert knowledge with you. The Excel MVPs happen to be in Amsterdam for a meeting and we’ve succeeded in getting some of them to present at our event. There is not much chance of this happening again anytime soon, so make sure you register!

My own session will (of course) be about speeding up Excel calculations:

How to make Excel Calculate Your Workbooks Faster

  • Excel’s smart calculation engine
  • The impact of Volatility and Multi-threading
  • Fan-out and Short-circuiting
  • User-defined functions and Array formulas
  • Lookups and SUMPRODUCT
  • Finding Calculation bottlenecks
  • Golden Rules for Faster Calculation

So come along and:

  • Ask questions and get answers from the assembled Excel gurus
  • Meet like-minded Excel Geeks

Making sense of complex Formulas: an Indenting Viewer-Editer


Some time ago I was working with a client to speed up one of their workbooks.
I was using the FastExcel V3 formula profiler and it showed that one formula was taking a significant proportion of the calculation time.

And the formula was too complicated to easily understand. So I decided that FastExcel V3 really needed a better way of understanding, creating and modifying formulas and started developing one. It currently looks like this:

Indent1

Of course the trouble with creating a formula indenter is:

  • Nobody agrees what the “correct” indentation style is!
  • And anyway what works well for one formula does not necessarily work well for another formula.

So I added the ability to dynamically switch indentation styles: for me, splitting the OR(…) section by commas makes it easier to read:

Indent2

The Viewer-Editer also helps you debug the formula by showing you the result of the selected portion of the formula and by making it easy and fast to jump to and select different parts of the formula.

Indent3

The Select options work in conjunction with the navigate arrows (Next left, This, Next right, Expand selection, Contract selection).
So if you click the right arrow with Functions selected the selection jumps to the next function on the right and shows you the result in the evaluate box.

Indent4

Modifying the Formula

You can modify the formula by directly editing the formula text and there are also many of the familiar Excel tools built-in:

  • Function Wizard
  • Insert a Reference or a Defined Name
  • Change a reference from Relative to Absolute (F4)
  • Build up a Mega-Formula by inserting a copy of another formula

Indent6

Clicking the Function Wizard button when a function is selected brings up the function wizard for that function so that it’s easy to change:

Indent5

But if nothing is selected, the Function Wizard is called, allowing you to choose a function, enter its parameters, and have it inserted at the current insertion point.

Conclusion

I have added quite a lot of functionality to the viewer-editer since the original concept, but I am sure it can be improved further.

So please download FastExcel V3 Beta 3, try it out and let me have your comments.

 

 


FastExcel V3 Released with Introductory Offer


FastExcel has been used successfully by thousands of users since it was first launched in 2001. The last version 2.4 was introduced in 2008 and since that time there have been major changes to Excel with Excel 2007, 2010 and 2013, including 64-bit Excel and Multi-threaded calculation.

FastExcel Version 3 is a major upgrade of the successful V2.4 product and has been under development for several years.

Special Introductory Offer – 50% off the FastExcel V3 Bundle

You can get a special launch offer of a 50% discount on the FastExcel V3 Bundle (all the FastExcel V3 products) as long as you buy a license before the end of July 2014.

The FastExcel V3 Family of Products

There are 3 major products in the FastExcel V3 family which are targeted at different types of usage. The aim is to allow you to buy only the tools you need.

FastExcelV3

FastExcel V3 Profiler

The Profiler gives you a comprehensive set of tools focussed on finding and prioritising calculation bottlenecks. If your spreadsheet takes more than a few seconds to calculate, you need FastExcel Profiler to find and prioritise the reasons for the slow calculation.

  • Profiling Drill-Down Wizard – the easy way to find bottlenecks
  • Profile Workbook – profiles worksheet calculation times, volatility, multi-threaded efficiency and memory usage.
  • Profile Worksheet – profiles worksheet areas, columns and rows including conditional formats
  • Profile Formulas and Functions – profiles each unique formula’s calculation time and function types
  • Map Cross-References – shows and counts the links between worksheets.

FastExcel V3 Manager

FastExcel Manager contains tools to help you build, debug and maintain Excel workbooks.

  • Name Manager Pro – an invaluable tool for managing Defined Names and Tables
  • Formula Viewer/Editer – a better way of editing and debugging more complex formulas.
  • Sheet Manager – Easily manage and manipulate worksheets.
  • Workbook – Cleaner – Trim down any excess bloat in your workbooks
  • Where-Used Maps – See where your Defined Names, Number Formats and Styles are being used

FastExcel SpeedTools

SpeedTools provides you with a state-of-the-art tool-kit to help you speed up your Excel Calculations

  • Calculation timing tools for workbooks, worksheets, and ranges
  • Additional calculation modes to enhance control of calculation so that you only calculate what needs to be calculated.
  • 90 superfast multi-threaded functions
  • Faster and more powerful Lookups and List comparisons
  • Multi-condition filtering and Distinct formulas to eliminate many slow SUMPRODUCT and Array formulas
  • Enhanced functions for Array handling, text, mathematics, sorting, information and logic

Try it out for yourself:

Download the 15-day full-featured trial of FastExcel V3 build 215.642.755.317

Note: the trial version of FastExcel V3 profiler does not enable Profile Workbook, Profile Worksheet and Profile Formulas, and the Drill down wizard will only profile a single worksheet.

You can convert the trial version of FastExcel V3 to a fully licensed version at any time by purchasing one of the FastExcel V3 licensing options.

Want to know more?

Download the FastExcel V3 Help file or the FastExcel V3 User Guide (PDF)
(you may need to unblock the downloaded .CHM file – Right-Click->Properties->Unblock)


VBA searching shootout between WorksheetFunction.Match: Performance Pros and Cons


It is a rainy day here in Norfolk so …

Prompted by a claim that searching using a variant array was much faster than calling MATCH from VBA, I thought it was time for a detailed performance analysis.
I shall try to discover when it is better to use the array method rather than MATCH and whether the performance penalty of a wrong choice is significant.

An earlier post match-vs-find-vs-variant-array-vba-performance-shootout looked at doing a slightly more complex search. This time I will use a simpler test case and concentrate on determining the influence of data size, the length of the search within the data and the overhead of each method.

Test Setup

The test setup is very simple: column A1:A100000 contains a sequence of numbers from 1 to 100000.
Each method is tested using a variety of row counts and a variety of search depth percentages.

Match_Array1

So for example 25% search depth of 400 rows means that the search will look 100 rows deep in the 400.

I am testing with Excel 2010 32-bit mainly (with some Excel 2013 32-bit comparisons) under Windows 7 Pro 64 bit.
My PC is a Dell Latitude E6530 with 8GB memory and a quad core i7-3720QM 2.60 GHz. However, since all the tests are VBA-based, only one core will be used.

Methods Tested

I am testing 6 different methods.

  1. Linear search using a For loop on a variant array created from the resized range in Column A
  2. Linear search using WorksheetFunction.MATCH with the unsorted option directly on a range created from Column A.
  3. Linear search using Application.MATCH with the unsorted option directly on a range created from Column A.
  4. Linear search using WorksheetFunction.MATCH with the unsorted option on a variant array created from the resized range in Column A
  5. Binary search using WorksheetFunction.MATCH with the sorted option directly on a range created from Column A.
  6. Cell by Cell loop directly on a range created from Column A.

The VBA Code

The VBA code is designed as a main subroutine returning a variant array of results to a driver sub.

Each search method is embedded in 3 loops:

  • Loop on Range Size (Number of Rows)
    • Loop on % Search Depth (How far to traverse within the range)
      • Loop on Trials (each test is timed 5 times and the median of the 5 times is used)

Timing is done using the MicroTimer Windows high-resolution timer.
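MicroTimer is not defined in the listing below, so here is a minimal sketch of one common way to write it, assuming it wraps the Windows QueryPerformanceCounter and QueryPerformanceFrequency API calls. The aliases getFrequency and getTickCount are just local names for those two calls. The declarations need to go at the top of a standard module.

```vba
Option Explicit

#If VBA7 Then
    Private Declare PtrSafe Function getFrequency Lib "kernel32" _
        Alias "QueryPerformanceFrequency" (cyFrequency As Currency) As Long
    Private Declare PtrSafe Function getTickCount Lib "kernel32" _
        Alias "QueryPerformanceCounter" (cyTickCount As Currency) As Long
#Else
    Private Declare Function getFrequency Lib "kernel32" _
        Alias "QueryPerformanceFrequency" (cyFrequency As Currency) As Long
    Private Declare Function getTickCount Lib "kernel32" _
        Alias "QueryPerformanceCounter" (cyTickCount As Currency) As Long
#End If

' Sketch only: returns seconds from the high-resolution performance counter
Public Function MicroTimer() As Double
    Dim cyTicks As Currency
    Static cyFrequency As Currency

    MicroTimer = 0
    ' the counter frequency is fixed, so only fetch it once
    If cyFrequency = 0 Then getFrequency cyFrequency
    ' read the current tick count
    getTickCount cyTicks
    ' ticks / ticks-per-second = seconds (the Currency scaling cancels out)
    If cyFrequency <> 0 Then MicroTimer = cyTicks / cyFrequency
End Function
```

The value returned is in seconds, which is why the test code multiplies the elapsed time by 1000000 to report microseconds.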


Sub DoSearchTests()
Dim j As Long
Dim vResults() As Variant
'
' call each method in turn
'
For j = 1 To 6
SearchTimer vResults(), j
Worksheets("Sheet1").Range("E4").Offset((j - 1) * 14, 0).Resize(UBound(vResults), UBound(vResults, 2)) = vResults
Next j
End Sub
Sub SearchTimer(vResults() As Variant, jMethod As Long)
'
' 6 Search Methods:
'
' 1 Linear search of variant array
' 2 Linear WorksheetFunction.MATCH on Range
' 3 Linear Application.MATCH on Range
' 4 Linear WorkSheetFunction.Match on Array
' 5 Binary Search Match on Range
' 6 Cell by Cell search
'
Dim vArr As Variant
Dim j As Long
Dim i As Long
Dim jF As Long
Dim jRow As Long
Dim jTrial As Long
Dim dTime As Double
Dim NumRows As Variant
Dim SearchDepth As Variant
Dim vTrials() As Variant
Dim rng As Range
Dim SearchVal As Variant
''' initialise
NumRows = Names("nRows").RefersToRange.Value2
SearchDepth = Names("SearchDepth").RefersToRange.Value2
ReDim vResults(LBound(NumRows) To UBound(NumRows), LBound(SearchDepth, 2) To UBound(SearchDepth, 2))
dTime = MicroTimer
With Worksheets("Sheet1")
''' loop on number of rows
For i = LBound(NumRows) To UBound(NumRows)
''' loop on % search depth
For jF = LBound(SearchDepth, 2) To UBound(SearchDepth, 2)
''' derive search value as a % of the number of rows
SearchVal = Int(SearchDepth(1, jF) * NumRows(i, 1))
If SearchVal < 1 Then SearchVal = 1
''' find the median time of 5 trials
ReDim vTrials(1 To 5)
For jTrial = 1 To 5
''' timing loop
If jMethod = 1 Then
''' get array and loop for search
dTime = MicroTimer
Set rng = .Range("a1:A" & CStr(NumRows(i, 1)))
vArr = rng.Value2
For j = LBound(vArr) To UBound(vArr)
If vArr(j, 1) = SearchVal Then
jRow = j
Exit For
End If
Next j
dTime = MicroTimer - dTime
ElseIf jMethod = 2 Then
''' use linear WorksheetFunction.Match on the range
dTime = MicroTimer
Set rng = .Range("a1:A" & CStr(NumRows(i, 1)))
jRow = WorksheetFunction.Match(SearchVal, rng, 0)
dTime = MicroTimer - dTime
ElseIf jMethod = 3 Then
''' use linear Application.Match on the range
dTime = MicroTimer
Set rng = .Range("a1:A" & CStr(NumRows(i, 1)))
jRow = Application.Match(SearchVal, rng, 0)
dTime = MicroTimer - dTime
ElseIf jMethod = 4 Then
''' use linear WorksheetFunction.Match on an array from the range
dTime = 0#
If NumRows(i, 1) <= 65536 Then
dTime = MicroTimer
Set rng = .Range("a1:A" & CStr(NumRows(i, 1)))
vArr = rng.Value2
jRow = WorksheetFunction.Match(SearchVal, vArr, 0)
dTime = MicroTimer - dTime
End If
ElseIf jMethod = 5 Then
''' use binary search Match on the range
dTime = MicroTimer
Set rng = .Range("a1:A" & CStr(NumRows(i, 1)))
jRow = WorksheetFunction.Match(SearchVal, rng, 1)
dTime = MicroTimer - dTime
ElseIf jMethod = 6 Then
''' get cell value and loop
dTime = MicroTimer
Set rng = .Range("a1:A" & CStr(NumRows(i, 1)))
For Each vArr In rng
If vArr = SearchVal Then
jRow = vArr.Row ''' j is not advanced by For Each; the range starts at A1 so the cell's row is its position
Exit For
End If
Next vArr
dTime = MicroTimer - dTime
End If
''' store timings for trials
vTrials(jTrial) = dTime * 1000000
Next jTrial
''' get median of the trials
vResults(i, jF) = WorksheetFunction.Median(vTrials)
Next jF
Next i
End With
End Sub
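
The timing pattern used by the routine above (run each test 5 times and keep the median, in microseconds) is not specific to VBA. As a rough sketch only, here is the same idea in C++; the names median_of and median_time_us are hypothetical, and std::chrono::steady_clock stands in for MicroTimer.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Median of a set of timings: sort a copy and take the middle value
// (or the mean of the two middle values when the count is even).
double median_of(std::vector<double> values) {
    assert(!values.empty());
    std::sort(values.begin(), values.end());
    const std::size_t mid = values.size() / 2;
    if (values.size() % 2 == 1) return values[mid];
    return (values[mid - 1] + values[mid]) / 2.0;
}

// Run `work` `trials` times and return the median elapsed time in microseconds.
template <typename Work>
double median_time_us(Work work, int trials = 5) {
    std::vector<double> timings;
    timings.reserve(static_cast<std::size_t>(trials));
    for (int t = 0; t < trials; ++t) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        timings.push_back(
            std::chrono::duration<double, std::micro>(stop - start).count());
    }
    return median_of(timings);
}
```

Using the median rather than the mean means that a single trial disturbed by something else running on the machine does not distort the reported time.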

 Test Results

All timings are in Microseconds (millionths of a second).

Looping the Variant Array from a Range (Method 1)

XL2010_Method1

The timings for this method show:

  • An overhead of 10 Microseconds per call
  • The 0% column is a good approximation to the time taken to read the range into a variant array. This increases with the size of the range.
  • Search time can be estimated by subtracting the 0% column from the other columns; as expected, it increases with the number of cells being traversed.

Using WorksheetFunction.Match on Ranges (Method 2)

XL2010_Method2

The timings for this method show:

  • An overhead of 13 Microseconds per call (larger than the array method)
  • The 0% column is constant because no data is transferred from Excel to VBA.
  • Search time less overhead increases with the number of cells being traversed but is lower than the array method.

Using Application.Match on Ranges (Method 3)

XL2010_Method3

I added this method to compare using Application.MATCH with WorksheetFunction.MATCH.
The timings for this method show:

  • An overhead of 16 Microseconds per call (larger than the WorksheetFunction method)
  • The 0% column is constant because no time is taken to transfer the data from Excel to VBA
  • Search time less overhead increases with the number of cells being traversed and is very similar to the WorksheetFunction method.

Using WorksheetFunction.Match on an array derived from a Range (Method 4)

XL2010_Method4

The timings for this method show:

  • An overhead of 15 Microseconds per call (larger than both the array and Match methods)
  • The 0% column increases sharply with data size and is larger than Method 1 because the data is transferred from Excel to VBA and back to the MATCH function.
  • Search time less overhead increases with the number of cells  being traversed but is lower than the array method.
  • The 100000 row is missing because this method only allows a maximum of 65536 rows before the MATCH function  fails.

Using the Binary Search (sorted) option of WorksheetFunction.Match on Ranges (Method 5)

XL2010_Method5

WOW!!! That is seriously fast.

  • The overhead is comparable to Linear Search WorksheetFunction.MATCH (Method 2)
  • The 0% column is constant because no data is transferred from Excel to VBA.
  • Search time is close to zero even over 100000 rows because of the efficiency of the Binary Search method being used.
  • The data has to be sorted for this to work
  • If there is missing data it would need to be handled with the double VLOOKUP/MATCH trick.
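
To see why the sorted option is so much faster, it helps to count how many cells each kind of search has to look at. The sketch below is an illustration in C++, not Excel's actual implementation: linear_match and binary_match are hypothetical names, and the data is assumed to be the numbers 1 to N in ascending order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Linear search: returns the 1-based position of `target` (0 if absent)
// and counts how many elements were examined.
std::size_t linear_match(const std::vector<long>& data, long target,
                         std::size_t& comparisons) {
    comparisons = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        ++comparisons;
        if (data[i] == target) return i + 1;
    }
    return 0;
}

// Binary search on ascending data: returns the 1-based position of the
// largest value <= target (the behaviour of MATCH with match type 1),
// or 0 if every value is greater than target.
std::size_t binary_match(const std::vector<long>& data, long target,
                         std::size_t& comparisons) {
    assert(std::is_sorted(data.begin(), data.end()));  // data must be sorted
    comparisons = 0;
    std::size_t lo = 0, hi = data.size();  // search window is [lo, hi)
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        ++comparisons;
        if (data[mid] <= target) lo = mid + 1;  // answer is to the right
        else hi = mid;                          // answer is to the left
    }
    return lo;  // number of elements <= target
}
```

Searching 100000 sorted rows for the last value takes 100000 comparisons with the linear search but at most 17 with the binary search, because each step halves the remaining window.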

Using Cell by Cell search (Method 6)

XL2010_Method6

The timings for this method show:

  • The overhead is comparable to Method 1 but the 0% column does not increase with increasing data size because only the minimum amount of data is being transferred from Excel to VBA.
  • For large amounts of data this method is extremely inefficient, but for small volumes (20-30 rows) you will not usually notice it.

Breakeven between the Array Method (1) and the WorksheetFunction.MATCH method (2)

Excel2010Compare

This table shows the ratio of method 1 timings to method 2 timings, so anything >1 shows that method 2 is faster.

  • For large amounts of data MATCH is significantly faster than the array method.
  • The breakeven point for Excel 2010 32-bit is around 40 rows.
  • The traversing search speed of MATCH is faster than the array method and data does not need to be transferred from Excel to VBA.
  • But the overhead of the Array method is lower than that of WorksheetFunction.Match.
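
One way to think about the breakeven point is a simple straight-line cost model: time = per-call overhead + cost per row × rows. This is only an assumption for illustration, not a fit to the tables above. The overheads of 10 and 13 Microseconds come from the Method 1 and Method 2 results; the per-row costs in the usage note are hypothetical values chosen so that the lines cross near 40 rows.

```cpp
#include <cassert>

// Linear cost model (an assumption, not a fit to the published timings):
//   time(rows) = overhead + per_row * rows      (all in microseconds)
// Returns the row count at which method A and method B cost the same.
double breakeven_rows(double overhead_a, double per_row_a,
                      double overhead_b, double per_row_b) {
    assert(per_row_a != per_row_b);  // parallel lines never cross
    return (overhead_b - overhead_a) / (per_row_a - per_row_b);
}
```

For example, breakeven_rows(10.0, 0.1, 13.0, 0.025) gives about 40: below that the lower overhead of the array method wins, above it the lower per-row cost of MATCH wins. A 3 Microsecond difference in overhead and a breakeven of 40 rows would imply a per-row difference of roughly 0.075 Microseconds under this model.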

Comparing Excel 2013 with Excel 2010

Excel2013Compare

With Excel 2013 the breakeven point is slightly more in favour of MATCH than with Excel 2010.

Excel 2013 Method 1 (Variant array search)

XL2013_Method1

Excel 2013 Method 2 (WorksheetFunction.Match)

XL2013_Method2

 Conclusions

  •  Both the Array method and WorksheetFunction method are fast.
  • When you just want to do a search the MATCH method is the way to go.
  • When you want to do a search combined with processing the data the Array method is often more efficient. (See match-vs-find-vs-variant-array-vba-performance-shootout  )
  • For small amounts of data either method is good because the timing differences will not be noticeable unless you are iterating the method thousands of times.
  • There does not seem to be a major difference in these tests between Excel 2010 and Excel 2013.

Excel Modeling World Championships 2014


Excel Financial Modelling is not my thing – but if it's yours you may be interested in this:

Invitation to the Excel Modeling World Championships 2014

ModelOff 2014

Members of the FastExcel community are invited to the annual Excel and Financial Modeling World Championship 2014 Event (ModelOff, www.modeloff.com). The Advanced Excel educational competition helps celebrate Excel in Financial Services. The fun, challenging and innovative competition has a mission to inspire skill development with Microsoft Excel, Financial Modeling and Financial Analysis, which are central to global businesses and communities. The competition showcases some of the fastest, most hard-working and talented Excel minds from 100+ countries. Round 1 starts on 25th October 2014 (8 weeks away).

Summary of Event

Over 3,000 participants competed in the ModelOff 2013 event. Major Global Partners and Sponsors are: Microsoft, Intralinks, S&P Capital IQ, Kaplan Education, Bloomberg, AMT Training, Corality and Ernst & Young.  Participants come from diverse companies and jobs – such as Analysts, Associates and Managers at Investment Firms and Accounting Firms, CFOs, Analytics Professionals, In-House Excel Gurus and Consultants with a shared passion for Microsoft Excel and Finance.  Students comprise ~35% of all entrants worldwide – most studying Commerce, Accounting, Finance and Masters university qualifications. The countries most represented have typically been: United States, UK, Poland, Russia, Canada, Australia, India and Hong Kong.  Female participation is ~20% of all competitors – hopefully higher this year (the reigning champion is Hilary Smart 26yo from London).

How It Works

The ModelOff competition involves two online qualification rounds (2 hours each) conducted simultaneously around the world. The Top 16 performers are flown to New York for the Live Finals at the offices of Microsoft and Bloomberg in early December 2014.  Questions are mostly case study and multiple choice format – with some ranging from a basic understanding of discounted cash flow (DCF) analysis, 3-way integrated cash flow models to more complex project finance and simulation techniques.  Some basic Accounting, Finance and Excel knowledge is likely needed to progress to Round 2.

Free Training and Preparation

All past questions and tests from ModelOff 2012 and 2013 are free and available on the ModelOff website. The organizers believe in accessible excellence and this can be a great starting point for anyone looking to become involved for the first time and improve their Advanced Excel skills in Financial Services and Financial Modeling.  We actively encourage all participants to visit all our community partners, bloggers, our major global sponsors during and following the event for their own learning, mentoring and professional development. Competitors in the Top 10% of ModelOff 2014 will be eligible for exciting local and international opportunities, offers from community partners and fun learning experiences (e.g. Trips to Microsoft Excel in Redmond). We’re also hosting free networking events in major Financial Centres including Hong Kong, London, Sydney, New York and Regional Meetups in the coming months for anyone interested in networking and mentoring with Excel-users in their local cities. Entry to the competition is $20 for students and $30 for professionals.

 



Getting Used Range in an XLL UDF: Multi-threading and COM


In two previous blog posts I discussed why handling whole-column references efficiently in VBA UDFs meant that you had to find the used range for the worksheet containing the whole-column reference. The posts also discussed how using a cache for the used ranges could give a significant performance improvement.

Full Column References in UDFs
Getting used Range efficiently with a cache

But how do you do this for multi-threaded C++ XLLs?

The XLL API cannot directly get the Used Range

The first problem is that the XLL API is based on the old XLM language, and that does not have a method for finding the used range from a worksheet function.

So you have to make a callback from the XLL into Excel to the COM interface, and that can very easily go wrong. Excel does not generally expect call-backs of this type, so the call-back may be ignored, or access the wrong part of memory or even crash Excel.

And with multi-threaded UDFs you certainly only want one UDF calling back at a time!

When you combine this with the need to access and maintain a cache for efficiency reasons the logic looks something like this:

  • For each reference referring to a large number of rows (I am currently using 100000 rows as large)
    • nRows=number of rows referenced
    • Try to get a shared read lock on the cache: if fails exit
    • If there is anything in the cache for this worksheet then nRows=MIN(cached rows, nRows)
    • Else try for an exclusive lock on the cache (write lock): if fails exit
      • Try to callback to Excel COM to get the used range rows.
      • If succeeded store in the cache for this worksheet & nRows=MIN(cached rows, nRows)
      • Unlock the cache
    • If failed to get a lock or COM failed then exit
  • Next large reference

Note that the logic is fail-safe: if the UDF cannot get the lock it needs on the cache or the COM call does not succeed it just uses the number of rows in the reference.
This worked well most of the time but in some situations it always failed.
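
The cache logic described above can be sketched roughly as follows. This is not the FastExcel code: UsedRangeCache, effective_rows and the query_used_rows callback (which stands in for the COM call back into Excel) are hypothetical names, and the standard C++17 std::shared_mutex is assumed for the shared/exclusive locks.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Threshold from the text: references with at least this many rows are "large".
constexpr long kLargeReferenceRows = 100000;

// Hypothetical cache of used-range row counts, keyed by worksheet name.
class UsedRangeCache {
public:
    // Stand-in for the COM callback; returns std::nullopt if the call fails.
    using UsedRowsQuery = std::function<std::optional<long>()>;

    // Returns the number of rows worth processing. Fail-safe: any failure
    // simply returns the number of rows in the reference.
    long effective_rows(const std::string& sheet, long rows_in_reference,
                        const UsedRowsQuery& query_used_rows) {
        assert(rows_in_reference > 0);
        if (rows_in_reference < kLargeReferenceRows) return rows_in_reference;
        {
            // Try for a shared read lock; never wait for it.
            std::shared_lock<std::shared_mutex> read_lock(mutex_, std::try_to_lock);
            if (!read_lock.owns_lock()) return rows_in_reference;
            const auto hit = rows_.find(sheet);
            if (hit != rows_.end()) return std::min(hit->second, rows_in_reference);
        }  // read lock released here, before trying for the write lock

        // Nothing cached: try for the exclusive write lock.
        std::unique_lock<std::shared_mutex> write_lock(mutex_, std::try_to_lock);
        if (!write_lock.owns_lock()) return rows_in_reference;
        auto entry = rows_.find(sheet);  // another thread may have filled it
        if (entry == rows_.end()) {
            const std::optional<long> used = query_used_rows();
            if (!used) return rows_in_reference;  // COM call failed
            entry = rows_.emplace(sheet, *used).first;
        }
        return std::min(entry->second, rows_in_reference);
    }  // write lock released automatically

private:
    std::shared_mutex mutex_;
    std::unordered_map<std::string, long> rows_;
};
```

Using try-locks rather than blocking locks is what keeps the logic fail-safe: a UDF that cannot get the lock it needs never waits, it just processes the full reference.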

Excel COM callbacks must be on the main thread.

Working out exactly what the problem was proved tricky, but eventually Govert van Drimmelen, the author of the wonderful Excel DNA, pointed out that calls to COM have to be executed on the main thread.

So I grab the thread ID of the main thread in the XLL On Open event, and then only call the exclusive lock and COM if the UDF is being executed on the main thread.

And it works: thanks Govert!
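
As a minimal sketch of that check (assuming standard C++ threads; a real XLL on Windows would more likely compare the values returned by the Win32 GetCurrentThreadId function), the idea looks something like this. MainThreadGuard and its member names are hypothetical.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper that remembers which thread opened the add-in.
class MainThreadGuard {
public:
    // Call once from the XLL open event, which runs on Excel's main thread.
    void capture() {
        assert(main_id_ == std::thread::id());  // capture only once
        main_id_ = std::this_thread::get_id();
    }

    // True only when the caller is running on the captured thread.
    // Before capture() the stored id represents no thread, so this is false.
    bool on_main_thread() const {
        return std::this_thread::get_id() == main_id_;
    }

private:
    std::thread::id main_id_;  // default-constructed: "not a thread"
};
```

A multi-threaded UDF would then only attempt the exclusive lock and the COM call when on_main_thread() returns true, and otherwise fall back to the number of rows in the reference.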


Extracting Digits from Text: Using Formulas and Designing a Missing Excel Function: GROUPS


An Excel problem that crops up quite often is how to extract digits (0-9) from text. The text might be part numbers, or web addresses, or currency values or …

Some cases are easy to handle with formulas:

  • A fixed number of digits at the start or end of the text-string (Use LEFT or RIGHT)
  • A fixed number of digits starting at a fixed point within the string (Use MID)
  • Groups of n digits separated by a single separator character (Use MID)

But in real life things are often not so simple:

  • No fixed position for the start
  • A variable number of digits
  • A variable number of separators
  • Extract all the digits or only one group
  • Need to locate a particular text string as separator
  • Extract the nth group of digits
  • Work from left to right or right to left
  • Extract only the first or the last n digits from a group

Most of these more complex cases can still be solved using formulas, but the required formulas are often long, complicated, hard to understand and do not adapt well to changes in the data.

Some Formula Examples

The first example is from an article by MVP Ashish Mathur

The Data and results look like this:

Digits1

Ashish’s formula, entered using Control/Shift/Enter as an array formula, looks like this (see his article for an explanation).

{=MID(A2,MATCH(TRUE,ISNUMBER(1*MID(A2,ROW($1:$9),1)),0), COUNT(1*MID(A2,ROW($1:$9),1)))}

The next example is from Chandoo

Digits2

This one is probably impossible for all the entries just using a formula, but here is Chandoo’s best attempt.

{=MID(B4,MIN(IFERROR(FIND(lstNumbers,B4),"")), SUMPRODUCT(COUNTIF(lstDigits,MID(B4,ROW($A$1:$A$200),1))))+0}

Where lstNumbers is a range containing the digits 0-9 and lstDigits is a range containing 0-9, comma and decimal point.

The final example is from a Bill Jelen/Mike Girvin Dueling Excel Podcast, and it's really tricky. Some example data looks like this:

Digits4

The challenge is to extract the last 3 of the consecutive digits after the first POV_ and before any non-numeric character. As Bill Jelen points out, this is much easier to do with a VBA UDF than with formulas. But if you watch the podcast you can see how Mike Girvin develops some incredible formulas to do the job.

Bill’s VBA UDF looks like this:


Function Nums(MV)
    x = Application.WorksheetFunction.Find("POV_", MV) + 4
    Nums = ""
    For i = x To Len(MV)
        ThisChar = Mid(MV, i, 1)
        Select Case ThisChar
            Case ".", "_"
                GoTo FoundIt
            Case "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"
                Nums = Nums & ThisChar
        End Select
    Next i

FoundIt:
    Nums = Right(Nums, 3)
End Function

The Missing Excel Function: GROUPS

Well, after watching Bill and Mike solving this really tricky problem I started wondering why does Excel not have a function to do this kind of stuff? And what would it look like if it did have one?

So (of course) I decided to build one! After a few iterations and the inevitable scope creep the requirements looked like this:

  • Extract groups of characters from a text-string
  • Allow the user to define what constitutes a group of characters
  • Extract the Nth group from the start, or the Nth group working backwards from the end
  • Option to specify the maximum number of characters to extract from the front or the back of the group
  • Option to give the start and/or end position within the string for the search for groups.

GROUPS(Text, GroupNumber, MaxChars, GroupType, StartPos, EndPos)

GroupNumber can be zero (all Groups) or a positive or negative number to get the Nth group from the start or end.

MaxChars can be zero (all characters) or a positive or negative number to restrict the number of characters from the start or end of the group.

GroupType can be either of

  • a Regex Pattern string, for example [0-9,.] would define a group as consecutive characters consisting of 0 to 9 comma and decimal point
  • A number from 0 to 4 for the most common group types (0-9 , a-z , not 0-9 , 0-9 and . , not 0-9 and .)

StartPos and EndPos default to the first character (1) and the last character (0)
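
To make the parameters concrete, here is a rough C++ sketch of a function with these semantics. It is not the FastExcel implementation: it only accepts GroupType as a regex character class (the numeric codes 0 to 4 are left out), it uses std::regex, and joining the groups together when GroupNumber is zero is my assumption about what "all Groups" returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <regex>
#include <string>
#include <vector>

// Sketch of a GROUPS-like function. `char_class` is a regex character class
// such as "[0-9]"; a group is a maximal run of characters that match it.
std::string groups(const std::string& text, int group_number = 1,
                   int max_chars = 0, const std::string& char_class = "[0-9]",
                   std::size_t start_pos = 1, std::size_t end_pos = 0) {
    assert(!char_class.empty());
    // Convert the 1-based inclusive window [start_pos, end_pos] to a substring;
    // end_pos = 0 means "up to the last character".
    if (start_pos < 1) start_pos = 1;
    if (end_pos == 0 || end_pos > text.size()) end_pos = text.size();
    if (start_pos > end_pos) return "";
    const std::string window = text.substr(start_pos - 1, end_pos - start_pos + 1);

    // Collect every maximal run of characters in the class.
    const std::regex run(char_class + "+");
    std::vector<std::string> found;
    for (std::sregex_iterator it(window.begin(), window.end(), run), end;
         it != end; ++it) {
        found.push_back(it->str());
    }

    // Pick the Nth group from the start (positive) or the end (negative);
    // zero joins all groups together (an assumption, see the text above).
    std::string result;
    const std::size_t n = static_cast<std::size_t>(std::abs(group_number));
    if (group_number == 0) {
        for (const std::string& g : found) result += g;
    } else if (n <= found.size()) {
        result = group_number > 0 ? found[n - 1] : found[found.size() - n];
    }

    // Keep the first (positive) or last (negative) max_chars characters.
    const std::size_t m = static_cast<std::size_t>(std::abs(max_chars));
    if (max_chars > 0 && m < result.size()) result = result.substr(0, m);
    if (max_chars < 0 && m < result.size()) result = result.substr(result.size() - m);
    return result;
}
```

With a made-up string such as "AB12_POV_0012345.txt", groups(s, 2, -3) picks the second group of digits and keeps its last 3 characters, giving "345", and groups("Cost: 1,234.50 USD", 1, 0, "[0-9,.]") gives "1,234.50".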

GROUPS Examples

To get the result from Ashish’s data shown above use

=GROUPS(A2)

The defaults are: get the first group of numbers starting at the left.

For Chandoo’s example you need to define the group of digits as being 0-9 comma and decimal point, and you could append the other characters as groups of everything except 0-9 comma and decimal point and space. A bit of experimentation shows that using -1 as the group number gives better results for this data!

=GROUPS(B4,-1,,"[0-9,.]") & " " & GROUPS(B4,-1,,"[^0-9,. ]")

Digits5

Which is pretty good, apart maybe from having to choose between the INR18lacs or USD$36000!

For Bill Jelen and Mike Girvin’s podcast problem there are a couple of approaches.
A slight cheat is to notice that the numbers we want are always in the second group of digits, so this works: find the last 3 digits from the second group of digits.

=GROUPS(A2,2,-3)

But for the original problem as stated we need to look for POV_

=GROUPS(A2,1,-3,0,SEARCH("Pov_",A2)+4)

Find the last 3 digits from first group of digits found starting after POV_.
I am using SEARCH rather than FIND because SEARCH is not case-sensitive.

Conclusion

  • It would be much easier to solve this kind of problem if Excel had a function like GROUPS.
  • You can try out the GROUPS function for yourself by downloading the 15-day trial of FastExcel Version 3 from my website.
  • The GROUPS function is implemented as a multi-threaded XLL function, so performance is quite good.

If you have a real-life extraction problem that cannot be solved by the GROUPS function please let me know!


Excel 2007 multi-threading bug with Range.SubTotal and XLL


OK: this is a fairly obscure Excel 2007 only bug but I thought I should place it on record.

The conditions required for the bug are:

  • Excel 2007 (tested with 12.0.6683.5002 SP3)
  • Automatic calculation Mode
  • Multi-threaded calculation is on and the system has multiple cores
  • XLL multi-threaded worksheet functions are being used
  • A VBA routine uses Range.SubTotal to create subtotals

The bug symptoms are

  • Either Range.SubTotal fails and Excel becomes unstable
  • Or Range.SubTotal seems to work but Excel becomes unstable

This bug seems to have been fixed in Excel 2010 and later versions.

Bypassing the bug

If your VBA switches off Multi-threaded calculation (Application.MultiThreadedCalculation.Enabled = False) just before doing the Range.SubTotal and then switches it back on again the bug is bypassed.

What's causing the problem

Using Range.SubTotal triggers a calculation event. VBA always runs on the main thread, but the XLL multi-threaded functions can run on any thread.

So presumably the problem happens when the XLL function is not calculated on the main thread and tries to return a result to Excel, but Excel is not ready to accept it, thus a portion of memory gets overwritten.


Automatic or Manual Calculation?: Grouped Sheets causes Calculation Confusion.


Thanks to Simon Hurst and Paul Wakefield for telling me about this calculation weirdo, and credit to Chatman at Accounting Web for discovering it.

Grouped Sheets and Calculation Mode

Try these experiments:

Experiment 1

  1. Start Excel
  2. Open a workbook in automatic calculation mode and with only 1 sheet selected (so there are no grouped sheets).
  3. Add  =NOW() to a couple of sheets and format the cells to show seconds so that you can see when it calculates.
  4. Check that Calculation shows Automatic both in File-> Options->Formulas and in the Formulas Tab->Calculation Options.
  5. Set Calculation to Manual (does not matter how you do this)
  6. Select 2 or more Sheets (hold down Control and select 2 sheet tabs) so that the sheets are grouped (The Workbook title bar should show [Grouped] ).
  7. Calculation is still Manual
  8. Set the Calculation to automatic using File->Options->Formulas
  9. Notice that the workbook calculates, but Calculation immediately returns to manual (check both Formulas Tab->Calculation Options and File->Options->Formulas)
  10. Now set the Calculation to automatic using Formulas Tab->Calculation Options
  11. Now the workbook calculates and stays in Automatic Calculation Mode, but File->Options->Formulas says it's Manual and Formulas Tab->Calculation Options correctly says it's Automatic!
  12. Now ungroup the sheets and everything works normally

Experiment 2

  1. Start Excel
  2. Open a workbook in automatic calculation mode and with only 1 sheet selected (so there are no grouped sheets).
  3. Add  =NOW() to a couple of sheets and format the cells to show seconds so that you can see when it calculates.
  4. Check that Calculation shows Automatic both in File-> Options->Formulas and in the Formulas Tab->Calculation Options.
  5. Select 2 or more Sheets (hold down Control and select 2 sheet tabs) so that the sheets are grouped (The Workbook title bar should show [Grouped] ).
  6. Excel should still be in Automatic calculation mode, but File->Options->Formulas shows Manual and Formulas->Calculation Options shows Automatic.
  7. Calculation is actually Automatic.

Confused? Well so is Excel!

Conclusions

  • Looks like it is safer to use Formulas->Calculation Options
  • If you are using Grouped Sheets be careful about your Calculation Mode!

Timing the Ins and Outs of User Defined Functions: Multi-Cell array formulas can be slow


I was looking at some multi-cell array formula UDFs with John and Rich and could not understand why they seemed so much slower than I expected.

Each UDF (written as a C++ XLL function) read data from around 200 cells using around 40 parameters, did some very simple logic and arithmetic and then returned values to a different 200 cells. Each of these UDFs was taking around 16-20 milliseconds and the workbook contained several thousand calls to these UDFs.

So I started to investigate:

First Hypothesis: Maybe marshalling input data from multiple parameters is slow?

We know that the way to speed up reading data from Excel Cells is to read data in as large a block as possible rather than in many small chunks. So maybe the number of parameters was the problem. To test this I wrote a couple of XLL functions.
MarshallAll takes a single parameter of a range of cells and returns the number of columns. The parameter is type Q so arrives in the function as values rather than a reference.
MarshallMany takes 40 parameters (also type Q) and returns a constant value of 88.


CXlOper* MarshallAll_Impl(CXlOper& xloResult, const CXlOper* Arg1)
{
    long nCols=0;
    nCols=Arg1->GetWidth2();
    xloResult=(double)nCols;
    return xloResult.Ret();
}

CXlOper* MarshallMany_Impl(CXlOper& xloResult,
    const CXlOper* Arg1, const CXlOper* Arg2, const CXlOper* Arg3, const CXlOper* Arg4,
    const CXlOper* Arg5, const CXlOper* Arg6, const CXlOper* Arg7, const CXlOper* Arg8,
    const CXlOper* Arg9, const CXlOper* Arg10, const CXlOper* Arg11, const CXlOper* Arg12,
    const CXlOper* Arg13, const CXlOper* Arg14, const CXlOper* Arg15, const CXlOper* Arg16,
    const CXlOper* Arg17, const CXlOper* Arg18, const CXlOper* Arg19, const CXlOper* Arg20,
    const CXlOper* Arg21, const CXlOper* Arg22, const CXlOper* Arg23, const CXlOper* Arg24,
    const CXlOper* Arg25, const CXlOper* Arg26, const CXlOper* Arg27, const CXlOper* Arg28,
    const CXlOper* Arg29, const CXlOper* Arg30, const CXlOper* Arg31, const CXlOper* Arg32,
    const CXlOper* Arg33, const CXlOper* Arg34, const CXlOper* Arg35, const CXlOper* Arg36,
    const CXlOper* Arg37, const CXlOper* Arg38, const CXlOper* Arg39, const CXlOper* Arg40)
{
    xloResult=88.0;
    return xloResult.Ret();
}

But when I compared the execution times of these functions they were both fast and there was not much difference in timings, although it was faster to read as many cells as possible with each parameter.

So hypothesis 1 failed.

I checked how the time taken varied with the number of cells read and their datatype.

Marshall3 Marshall4

As expected large strings take longer than small strings which take more time than numbers. And it is more efficient to read as many cells as possible for each parameter.

Second Hypothesis: Maybe returning results to multiple cells is slow?

We know that writing data back to Excel cells from VBA is significantly slower than reading data from cells.

(see VBA read-write speed and Getting cell data with VBA and C++)

I wrote another simple XLL UDF: MarshallOut. This took a single argument of a range and returned it.


CXlOper* MarshallOut_Impl(CXlOper& xloResult, const CXlOper* Arg1)
{
    xloResult=*Arg1;
    return xloResult.Ret();
}

Bingo: returning data to multiple cells is comparatively slow.

I used FastExcel Profile Formulas to time the tests:

Marshall6

Reading and returning a 255 character string to each of 100 cells takes 13 milliseconds.

Marshall1 Marshall2

Notice that the time taken is NOT linear with respect to the number of cells.

Multi-Threading Effects

I also noticed that FastExcel Workbook Profiler showed that these functions were making inefficient use of multi-threaded recalculation. Presumably this is because they need an exclusive lock on Excel’s results table whilst writing out the results, and most of the time used is doing exactly that.

By contrast, the first set of “read-many cells but write one cell”  functions made efficient use of multi-threading.

Comparing XLL functions with VBA Functions.

I did a small number of comparisons with VBA equivalents of these XLL functions. The VBA functions showed the same kind of performance behaviour as the XLL functions and were slightly slower. Of course VBA functions cannot multi-thread.

Conclusions

  • Large numbers of array formulas that return results to multiple cells can be slow.
  • Multi-cell array formulas do not multi-thread efficiently whilst writing their results back.
  • It is more efficient to have larger ranges and fewer parameters than many parameters referencing small ranges.

 

