APPLICATION NOTE 7417

Developing Power-optimized Applications on the MAX78000

By: Afshin Niktash

Abstract:

: Power consumption is a key factor for edge Artificial Intelligence (AI) applications where the entire system is powered by small battery cells and is expected to operate for months without recharging or replacing the batteries. The MAX78000 ultra-low power AI microcontroller is built to target such applications at the edge of the IoT. In this document, various configurations are described to enable users to develop power-optimized applications on the MAX78000, along with benchmarking examples. The power optimization methods are applied to two case study applications—Keyword Spotting with 20 keywords (KWS20) and Face Identification (FaceID), and the reported results can be used as a guideline for the user’s application.


Introduction

The MAX78000 is an ultra-low power microcontroller with a dedicated Convolutional Neural Network (CNN) accelerator. This architecture enables the development of very power efficient AI applications in energy constrained environments. The MAX78000 provides a variety of options to facilitate developing low power applications (elaborated in datasheet [1]), such as different oscillators, clock sources, and operation modes that can be selected based on a target application. The following sections contain an overview of such options with practical examples and actual measurements on the MAX78000 EV kit. For more information and details, follow the referenced materials throughout this document.

Note: The measurement results presented in this application note are based on running example codes on a MAX78000 EV kit. The actual result on the user's platform and specific applications can vary.

MAX78000

The MAX78000 is a new generation of AI microcontrollers built to enable the execution of neural networks at ultra-low power and live at the edge of the IoT. This product combines the most energy-efficient AI processing with Maxim Integrated's proven ultra-low power microcontrollers. The hardware-based CNN accelerator enables battery-powered applications to execute AI inferences while spending only microjoules of energy. The MAX78000 features an Atarm® Cortex®-M4 with FPU microcontroller for efficient system control with an ultra-low power deep neural network accelerator. A RISC-V core is also integrated and can execute application and control codes as well as drive the CNN accelerator. Figure 1 shows the top-level architecture of the MAX78000.

The architecture of the MAX78000Figure 1. The architecture of the MAX78000

The MAX78000 evaluation kit (EV kit) provides a platform to leverage the capabilities of the MAX78000 to build new generations of AI devices. The EV kit features onboard hardware like a digital microphone, serial ports, camera module support, and a 3.5in touch-enabled color thin-film transistor (TFT) display [2]. It also includes the circuitry to monitor and display the power level. The MAX34417 [3] monitors VCOREA and VCOREB voltage as well as the current of the

The Power MonitorFigure 2. The Power Monitor

MAX78000, and reports the accumulated power to the MAX32625 which is used as the power data processor that also controls the power display, as shown in Figure 2.

The power monitor operates in two modes; instantaneous, which displays average power, supply current, and voltage and Windowed Energy Accumulation, which measures the CNN power or total System Power based on triggering events. The instantaneous measurements are typically used for quick, coarse estimates of the power level. On the other hand, the windowed measurements are ideal to measure the accumulated energy between the user defined start and complete instances. These events are triggered by the toggling of two GPIOs of the MAX78000 (P0.2 and P0.3) which are also connected to LED1 and LED2 as well as ports of the MAX32625 power data processor. For more information on using the power monitor, refer to [4].

MAX78000 Clocks and Operating Modes

The MAX78000 supports different clock sources and low power operating modes that can be jointly configured to achieve optimized power and performance. In this application note, a brief description of each mode is provided to guide the reader to set up the modes according to the desired application. For more detailed descriptions, refer to [5].

Clocks

The MAX78000 includes multiple configurable clocks used by different peripherals. The user can configure the clock sources as needed to select the combination of performance and power efficiency. The selected System Oscillator (SYS_OSC) is the clock source for most internal blocks. The following oscillator sources are available and can be selected as SYS_OSC:

  1. Internal Primary Oscillator (IPO) – 100MHz:
    IPO is the fastest frequency oscillator and draws the most power. When entering Low Power Mode (LPM), this oscillator can be powered down.
  2. Internal Secondary Oscillator (ISO) – 60MHz:
    This is a low power internal secondary oscillator that is the Power-On Reset default for SYS_OSC.
  3. Internal Nano-Ring Oscillator (INRO) – 8kHz-30kHz:
    INRO is an ultra-low power internal oscillator that can be selected as the SYS_OSC and is always enabled. The frequency is configurable to 8kHz, 16kHz, or 30kHz (default).
  4. Internal Baud Rate Oscillator (IBRO) – 7.3728MHz:
    This is a very low power internal oscillator that can be selected as SYS_OSC. This clock can optionally be used as a dedicated baud rate clock for the UARTs.
  5. External Real-Time Clock Oscillator (ERTCO) – 32.768kHz:
    ERTCO is an extremely low power internal oscillator that can be selected as the SYS_OSC. The ERTCO can optionally use a 32.768kHz input clock or an 8kHz independent nano-ring oscillator instead of an external crystal. This oscillator is the default clock for the Real-Time Clock (RTC) and is enabled automatically as soon as the RTC is enabled.

The selected SYS_OSC is the input to the system oscillator divider to generate the System Clock (SYS_CLK). The divider is selectable from 0 to 7, resulting in divisors of 1, 2, 4, 8, 16, 32, 64, or 128. SYS_CLK drives the Arm Cortex-M4 with FPU core, and all Advanced High-Performance Bus (AHB) masters in the system. After passing through a fixed divide by two divider, it also can generate the Advanced Peripheral Bus (APB) clock feeding the CNN accelerator, RISC-V, and peripherals. For more detailed information on the clock sources, configurations, and the list of registers to set, refer to section 3.1 of [5].

The API to setup clock is listed in Table 1.

Table 1: API to Setup Clock

API Description
int MXC_SYS_ClockSourceEnable (mxc_sys_system_clock_t clock) Enables a clock source without selecting it; the following sources are available:
MXC_SYS_CLOCK_IPO, MXC_SYS_CLOCK_IBRO,  MXC_SYS_CLOCK_ISO, MXC_SYS_CLOCK_INRO, MXC_SYS_CLOCK_ERTCO,  MXC_SYS_CLOCK_EXTCLK
int MXC_SYS_Clock_Select (mxc_sys_system_clock_t clock) Selects the clock source to be used for SYS_CLK; see the Clocks section for the list
void SystemCoreClockUpdate(void) Update the system core clock according to the selected clock source
void MXC_SYS_ClockEnable (mxc_sys_periph_clock_t clock) Enables peripheral clocks:
MXC_SYS_PERIPH_CLOCK_GPIO0 to 2
MXC_SYS_PERIPH_CLOCK_DMA     
MXC_SYS_PERIPH_CLOCK_SPI0 and 1    
MXC_SYS_PERIPH_CLOCK_UART0 to 3
MXC_SYS_PERIPH_CLOCK_I2C0 to 2
MXC_SYS_PERIPH_CLOCK_TMR0 to 5
MXC_SYS_PERIPH_CLOCK_ADC     
MXC_SYS_PERIPH_CLOCK_CNN     
MXC_SYS_PERIPH_CLOCK_PT      
MXC_SYS_PERIPH_CLOCK_TRNG    
MXC_SYS_PERIPH_CLOCK_SMPHR   
MXC_SYS_PERIPH_CLOCK_OWIRE   
MXC_SYS_PERIPH_CLOCK_CRC     
MXC_SYS_PERIPH_CLOCK_AES     
MXC_SYS_PERIPH_CLOCK_I2S     
MXC_SYS_PERIPH_CLOCK_PCIF    
MXC_SYS_PERIPH_CLOCK_WDT0    
MXC_SYS_PERIPH_CLOCK_CPU1    // RISC-V clock
MXC_SYS_PERIPH_CLOCK_WDT1    
MXC_SYS_PERIPH_CLOCK_LPCOMP1 to 3 
void MXC_SYS_ClockDisable (mxc_sys_periph_clock_t clock) Disable peripheral clocks; see the Clocks section for the list

Operating Modes

The MAX78000 includes multiple operating modes and the ability to fine tune power options to optimize performance and power. The system supports the following operating modes, as summarized in Table 2.

Table 2: Operation Modes of the MAX78000

Operation Mode Arm RISC-V Oscillators System CNN Quadrants CNN RAM Peripherals
ACTIVE On On All available Available Active, Configurable Active, Configurable Available
SLEEP Retention On/ Retention All available Available Active, Configurable Active, Configurable Available
LPM Retention On/ Retention ISO, IBRO,
ERTCO, INRO
0,1: Retention
2,3: Available
Active, Configurable Active, Configurable Available
UPM Retention Retention IBRO,
ERTCO, INRO
Retention Optionally off Selectable retention Retention,
LPUART0, LPTMR0-1 LPWDT0, LPCOMP0-3,
GPIO, WUT, RTC available
STANDBY Retention Retention IBRO,
ERTCO, INRO
Retention Off Selectable retention Retention,
WUT, RTC, COMP0, GPIO available
BACKUP Off Off ERCO Configurable
retention
Off Selectable retention Off
WUT, RTC, COMP0, GPIO available
POWER
DOWN
Off Off Off Off Off Off Off,
Configurable P3.0/1

  1. ACTIVE Mode:

    In this mode, both the Arm and the RISC-V cores can execute the application code and all digital and analog peripherals are available on demand. Dynamic clocking disables peripherals that are not in use, providing the optimal mix of high performance and low power consumption. Each of the peripherals can be individually enabled during active mode or powered down. The CNN and each of the four CNNx16_n Processor Arrays and their associated memories can be powered down or set to active as needed.

  2. SLEEP Mode:

    This mode can be used when either Arm or RISC-V executes the code, during which the other goes to sleep to reduce the power. The clocks can be optionally enabled to speed up the wakeup process. The CNN is available for use and each of the four CNNx16_n Quadrants are individually configurable for power down. All peripherals are on unless explicitly disabled

    To enter SLEEP Mode:
         SCB->SCR &= ~SCB_SCR_SLEEPDEEP_Msk;  // Needed just once
    
    Either directly set sleep mode bit in MXC_GCR->pm register, or use WFI instruction:
    •	MXC_GCR->pm &= ~MXC_F_GCR_PM_MODE;
    MXC_GCR->pm   |= MXC_S_GCR_PM_MODE_CM4;
    •	Or use WFI:
    o	Arm:  __WFI(); // Enter to sleep and wait for interrupt to wake up
    o	RISC-V: asm volatile("wfi");
    
    To enter into this mode on Arm, MXC_LP_EnterSleepMode() API call can also be used.
    
  3. Low Power Mode (LPM):

    This mode can be used if RISC-V is expected to collect and move data from enabled peripherals while Arm is in deepsleep. If RISC-V is not used and is in sleep mode, Arm can also process the data and switch to LPM while peripherals or CNN are active and wake up on interrupts. Arm, DMA, and SPI0 are in state retention. The CNN quadrants, memory, and other peripherals are active and configurable. Note that CNN can only wake

    Before going to LPM:
    Make sure the clock source is enabled during LPM in MXC_GCR->pm, for example:
     MXC_GCR->pm &= ~MXC_F_GCR_PM_ISO_PD;  // enable ISO during sleep
    
    Power Sequencer Registers (PWRSEQ) are used to configure clock source and wake up source. The clock source for peripherals and RISC-V needs to be specified. By default, it uses ISO during LPM. If needed, set this field to 1 to use the PCLK (APB Clock = SYS_CLK/2)
    PWRSEQ_LPCN ->lpcn |= MXC_F_PWRSEQ_LPCN_LPMCLKSEL;
    
    The wakeup source of the low power peripheral (e.g GPIO, RISC-V) needs to be enabled. For example:
    MXC_PWRSEQ->lppwen |= MXC_F_PWRSEQ_LPPWEN_CPU1;  // wake up on RISC-V interrupt
    
    To enter LPM:
         SCB->SCR |= SCB_SCR_SLEEPDEEP_Msk;  // Needed just once
         /* Write 1 to clear */
        MXC_PWRSEQ->lpwkst0 = 0xFFFFFFFF;
        MXC_PWRSEQ->lpwkst1 = 0xFFFFFFFF;
        MXC_PWRSEQ->lppwst  = 0xFFFFFFFF; 
        MXC_MCR->ctrl |= MXC_F_MCR_CTRL_ERTCO_EN; // enable ERTCO 
    
    Either directly set sleep mode bit in MXC_GCR->pm register, or use WFI instruction:
    • MXC_GCR->pm &= ~MXC_F_GCR_PM_MODE; MXC_GCR->pm |= MXC_S_GCR_PM_MODE_LPM;
    • Or use WFI:
      • Arm:__WFI(); // Enter to deepsleep, wait for interrupt to wake up

    up RISC-V but not Arm in LPM. In this case, the Wake Up Timer (WUT) can be used to wake up Arm if the CNN processing time is known. Alternatively, the CNN interrupt can wake up RISC-V, which in turn can wake up Arm and then RISC-V can go back to sleep.

    In LPM, IPO is powered down and IBRO, ERTCO, INRO, or ISO should be used.

    Note: If the Arm core goes to LPM, RISC-V can continue execution and go to sleep as needed. If the entry into LPM needs to be managed between two cores, Multiprocessor Communications can be used to ensure both are in a known state. Refer to section 3.3.3 in [5] for more information.

  4. Micro Power Mode (UPM):

    This mode is used for extremely low power consumption while using a minimal set of peripherals to provide wakeup capability. In this mode, the state of Arm and RISC-V, system, and CNN RAM as well as non-UPM peripherals are retained. The CNN quadrants can be powered off. IPO and ISO oscillators are powered down and IBRO, ERTCO, and INRO are enabled. The following peripherals can be used as source of wake up: WUT, LPUART0, LPTMR0, LPTMR1, LPWDT0, LPCOMP0-LPCOMP3, and GPIO

    To enter UPM Mode:
          /* Write 1 to clear */
         MXC_PWRSEQ->lpwkst0 = 0xFFFFFFFF;
         MXC_PWRSEQ->lpwkst1 = 0xFFFFFFFF;
         MXC_PWRSEQ->lppwst  = 0xFFFFFFFF; 
         MXC_MCR->ctrl |= MXC_F_MCR_CTRL_ERTCO_EN; // enable ERTCO 
    
    Set UPM mode bit in MXC_GCR->pm register:
          MXC_GCR->pm &= ~MXC_F_GCR_PM_MODE;
    MXC_GCR->pm |= MXC_S_GCR_PM_MODE_UPM;
    
    To enter into this mode on Arm, MXC_LP_EnterDeepSleepMode() API call can also be used.
    

    This mode is most beneficial in low activity cycle applications to save energy.

    To ensure the deterministic entry to UPM mode, Arm and RISC-V might need synchronization using Multiprocessor Communication. For more information, refer to section 3.3.4 in [5].

  5. STANDBY Mode:

    In this mode, the system state is retained while RTC is running; Arm, RISC-V, system RAM, GPIO, and peripherals also retain their state. CNN quadrants are off and CNN memory has selectable retention. IPO and ISO oscillators are powered down and ERTCO, INRO, and IBRO are enabled. This mode provides further power reduction if duty cycling of an application is needed.

    To enter STANDBY Mode:
         /* Write 1 to clear */
        MXC_PWRSEQ->lpwkst0 = 0xFFFFFFFF;
        MXC_PWRSEQ->lpwkst1 = 0xFFFFFFFF;
        MXC_PWRSEQ->lppwst = 0xFFFFFFFF; 
        MXC_MCR->ctrl |= MXC_F_MCR_CTRL_ERTCO_EN; // enable ERTCO 
    
    Set STANDBY mode bit in MXC_GCR->pm register:
        MXC_GCR->pm &= ~MXC_F_GCR_PM_MODE;
        MXC_GCR->pm |= MXC_S_GCR_PM_MODE_STANDBY;
    
    To enter into this mode on Arm, MXC_LP_EnterStandbyMode() API call can also be used.
    

    To ensure the deterministic entry to STANDBY mode, Arm and RISC-V might need to synchronize using Multiprocessor Communication. For more information, refer to section 3.3.5 in [5].

  6. BACKUP Mode:

    This mode is used to retain System RAM. In this mode, Arm, RISC-V, all peripherals, and all oscillators except ERTCO are powered off. System RAM 0-3 can be independently configured to retain state using PWRSEQ_LPCN registers.

    Entering BACKUP mode does not require synchronization between the Arm and RISC-V, and either core can set GCR_PM.mode to BACKUP, which causes the device to immediately enter BACKUP.

    To enter BACKUP Mode:
         /* Write 1 to clear */
        MXC_PWRSEQ->lpwkst0 = 0xFFFFFFFF;
        MXC_PWRSEQ->lpwkst1 = 0xFFFFFFFF;
        MXC_PWRSEQ->lppwst  = 0xFFFFFFFF; 
    
     Set BACKUP mode bit in MXC_GCR->pm register:
        MXC_GCR->pm &= ~MXC_F_GCR_PM_MODE;
        MXC_GCR->pm |= MXC_S_GCR_PM_MODE_BACKUP
    
    To enter into this mode on Arm, MXC_LP_EnterBackupMode() API call can also be used.
    
  7. Power Down Mode (PDM):

    This mode is used during product level distribution and storage. Arm, RISC-V, memories, peripherals, and oscillators are powered down. There is no data retention in this mode, but values in the flash memory are preserved. In this mode, VREGI POR voltage monitor is operational.

    Exit from PDM is possible with an external reset (RSTN) or a wakeup event on either P3.0 or P3.1 if configured.

    To enter PDM Mode:
     MXC_GCR->pm &= ~MXC_F_GCR_PM_MODE;
     MXC_GCR->pm |= MXC_S_GCR_PM_MODE_POWERDOWN
    
    To enter into this mode on Arm, MXC_LP_ShutDownMode() API call can also be used.
    

Wakeup Configuration

Arm and RISC-V need to be configured to wake up in case of a wakeup event. For the list of wakeup sources in different operating modes, refer to section 3.4 in [5].

Arm Wakeup Configuration:

Interrupts should be enabled. In case of SLEEP, the Arm core wakes up after__WFI() in response to any interrupt.

For a low power mode, the wakeup source needs to be enabled:

  • In Power Sequencer Registers (PWRSEQ): Enable peripherals and specific GPIO source in Low Power Peripheral Wakeup Enable registers (PWRSEQ_LPWKEN0 to 3, PWRSEQ_LPPWEN)
  • In Global Control Registers (GCR): Enable GPIO Wakeup Enable and RTC Alarm Wakeup Enable in Power Management Register (GCR_PM)

Refer to sections 3.11 and 3.13 of [5].

Example of Arm wakeup configuration: 
Wakeup from LPM on GPIO:
 MXC_GCR->pm  |= MXC_F_GCR_PM_GPIO_WE;  // enable wakeup from GPIOs
 MXC_PWRSEQ->lpwken2 |= (1 << 7);   // Pb2: GPIO2.7 is selected for wakeup

Wakeup from LPM on RISC-V interrupt:
 MXC_PWRSEQ->lppwen |= MXC_F_PWRSEQ_LPPWEN_CPU1; // wakeup on RISC-V interrupt

RISC-V Wakeup Configuration:

To configure the RISC-V to wake up on an interrupt, the IRQ as well as the EVENT need to be enabled.

Example of RISC-V wakeup configuration: 
 __enable_irq();
 NVIC_EnableIRQ(CNN_IRQn);
 NVIC_EnableEVENT(CNN_IRQn);

Table 3 shows an example of Arm and RISC-V code. Arm goes to LPM (deepsleep) and RISC-V initializes and starts CNN, and goes to sleep until CNN inference is complete. Afterwards, a CNN interrupt wakes up RISC-V, which then wakes up ARM as well.

For complete code of this example, refer to [6].

Table 3: Example of Arm and RISC-V LPM and Wake Up

Arm RISC-V
// mnist-riscv
int main(void)
{
MXC_ICC_Enable(MXC_ICC0); // Enable cache
// Switch to 100 MHz clock
MXC_SYS_Clock_Select(MXC_SYS_CLOCK_IPO);
SystemCoreClockUpdate();
// Set RISC-V boot address
MXC_FCR->urvbootaddr = (uint32_t) &__FlashStart_;
// Enable Sempahore clock MXC_SYS_ClockEnable(MXC_SYS_PERIPH_CLOCK_SMPHR);

// Set wakeup ISR
NVIC_SetVector(RISCV_IRQn, WakeISR);

 // DO NOT DELETE THIS LINE:
MXC_Delay(SEC(2)); // Let debugger interrupt if needed

// Enable RISC-V clock
MXC_SYS_ClockEnable(MXC_SYS_PERIPH_CLOCK_CPU1);
// Go to LPM, wait for RISC-V to wake up
MXC_LP_ClearWakeStatus();
SCB->SCR |= SCB_SCR_SLEEPDEEP_Msk; // SLEEPDEEP=1
__WFI(); // Go to deepsleep, Let RISC-V run
return 0;
}

void WakeISR(void)
{
MXC_SEMA->irq0 = MXC_F_SEMA_IRQ0_EN & ~MXC_F_SEMA_IRQ0_CM4_IRQ;
}

int main(void)
{
Debug_Init(); // Set up RISCV JTAG
MXC_ICC_Enable(MXC_ICC1); // Enable cache

  // Enable peripheral, CNN interrupt, CNN clock
// CNN clock: 50 MHz div 1
cnn_enable(MXC_S_GCR_PCLKDIV_CNNCLKSEL_PCLK,  MXC_S_GCR_PCLKDIV_CNNCLKDIV_DIV1);

  cnn_init(); // Bring state machine into consistent state
cnn_load_weights(); // Load kernels
cnn_load_bias();
cnn_configure(); // Configure state machine
load_input(); // Load data input

cnn_start(); // Start CNN processing

  while (cnn_time == 0)
     asm volatile("wfi"); // Sleep, Wait for CNN complete

  // Now signal Arm to wake up
  MXC_SEMA->irq0 = MXC_F_SEMA_IRQ0_EN | MXC_F_SEMA_IRQ0_CM4_IRQ;
}

int cnn_enable(uint32_t clock_source, uint32_t clock_divider)
{
// Reset all domains, restore power to CNN
MXC_BBFC->reg3 = 0xf; // Reset
MXC_BBFC->reg1 = 0xf; // Mask memory
MXC_BBFC->reg0 = 0xf; // Power
MXC_BBFC->reg2 = 0x0; // Iso
MXC_BBFC->reg3 = 0x0; // Reset

  MXC_GCR->pclkdiv = (MXC_GCR->pclkdiv & ~(MXC_F_GCR_PCLKDIV_CNNCLKDIV | MXC_F_GCR_PCLKDIV_CNNCLKSEL))
| clock_divider | clock_source;
// Enable CNN clock MXC_SYS_ClockEnable(MXC_SYS_PERIPH_CLOCK_CNN);
// Enable Interrupt and wakeup on CNN complete
  __enable_irq();
  NVIC_EnableIRQ(CNN_IRQn);
  NVIC_EnableEVENT(CNN_IRQn);
return CNN_OK;
}

Power Consumption in Different Modes

In developing an application, the user can switch the MAX78000 to different operation modes and schedule the tasks accordingly to save power. An understanding of power consumption in each operation mode, as well as the time needed to boot up or to wake up, helps the user select the appropriate mode and plan the duty cycle of the tasks. The Power Monitor can be used in System Power mode [4] to measure power consumption and wakeup or bootup time in several cases. In this test, RISC-V and peripherals are off and Arm is tested in different operation modes. For the wakeup time test, RISC-V is used as the wakeup source of Arm. Figure 3 shows the result of this experiment. ACTIVE and SLEEP power were measured using ISO oscillator at 60MHz. Other low power modes employed ERTCO (32kHz) during the sleep period (API default).

Bootup time (defined as the time from power-up to the start of executing the main code) and wakeup time with different clock frequencies are also tested, as reported in Figure 4. Wakeup time is scaled with selected clock frequency. The bootup time is measured with power on default clock (ISO at 60MHz).

Power consumption in different operation modesFigure 3. Power consumption in different operation modes

Bootup and Wakeup time at different frequenciesFigure 4. Bootup and Wakeup time at different frequencies

CNN Power

The core of an AI application is the inference, performed by the CNN accelerator. Depending on the application, the inference can happen continuously on incoming data or periodically at certain time intervals. The CNN inference can start once the input data is ready (e.g. KWS case study), or in FIFO mode (e.g. FaceID example) where it starts as soon as enough data to begin the process is stored in the FIFO. The CNN power consumption is measured in three phases:

  • Loading weights (kernels): Happens once to load weights into CNN memory in active mode
  • Loading input data: Every time there is a new inference. In FIFO mode, it can overlap with inference
  • Inference: Operates on input data and generates the classification result

The Power Monitor uses GPIO signals to trigger each event and measure the power of each phase in CNN Power Mode [4]. To conduct measurements, two examples in the Maxim SDK were selected: KWS20 and FaceID. Each example was trained [7], synthesized [8], and flashed into the MAX78000 EV kit. The synthesis script can be instructed to generate the code for Arm only (default: RISC-V is off), or for RISC-V (with --deepsleep --riscv --riscv-debug flags) where Arm initializes RISC-V and goes to sleep. The synthesis script can also include GPIO signaling to instruct the Power Monitor to run in CNN Power Mode (--energy). The energy, time, inactive (I) and active (A) power measurements are displayed on the Power Monitor display. The results of testing KWS20 and FaceID in different operation modes and clock sources (IPO at 100MHz and ISO at 60MHz) are shown in Table 4 and Table 5. Note that FIFO is used in the case of FaceID, so inference time includes loading time as well. According to the results, the inference time and energy using ARM only as well as ARM and RISC-V are similar when CNN is used with the same clock. Loading weights and data take longer with slower clock sources, causing a significant increase in energy. The user should consider the energy needed to load kernels and inputs to decide how to schedule the application. In cases in which the activity cycle is small, reloading the weight might improve the overall power consumption. The inference time and energy are not expected to improve in SLEEP mode or LPM, as they are measured only during the activity window of CNN. However, they significantly affect the overall power consumption of applications, as shown later.

The case in which both ARM and RISC-V were asleep during idle time was selected to test the effect of oscillator frequency by applying different clock frequencies and sources. RISC-V drives loading weights and inputs, as well as manages CNN. By reducing the clock frequency, the inference time and energy go up exponentially. This suggests that executing the loading and inference at the highest speed and then going to a longer sleep period tends to improve the overall energy consumption. The result of this experiment is shown in Figure 5 and Figure 6.

Table 4: Profiling CNN Power for KWS20

KWS20:
Operation Mode
Clock Frequency (MHz) Kernel Loading Input Loading Inference
Arm RISC-V CNN E(mJ) t(ms) I(mW) A(mW) E(mJ) t(ms) I(mW) A(mW) E(mJ) t(ms) A(mW)
Arm Only

Always ACTIVE

100 N/A 50 0.146 11.6 7.2 19.8 0.013 1.1 7.2 19.27 0.14 2 37.45

SLEEP

100 N/A 50 0.148 11.6 7.02 19.8 0.013 1.1 7.02 19.33 0.136 2 35.82

LPM

60 N/A 30 0.154 19.3 5.33 13.29 0.014 1.8 5.33 13.06 0.139 4.1 18.32
Arm and RISC-V

Always ACTIVE

100 50 50 0.42854 34.72 5.47 17.81 0.04001 3.31 10.45 22.52 0.1401 1.99 21.86

Only Arm SLEEP

100 50 50 0.04036 3.31 5.5 17.71 0.14077 2.01 21.89

Both SLEEP

100 50 50 0.03829 3.15 5.47 17.63 0.14055 2.19 21.41

Both SLEEP

60 30 30 0.487 50.3 3.48 13.16 0.0481 5.53 3.63 12.31 0.14595 3.38 13.06

LPM

60 30 30 0.0456 4.7 3.48 12.92 0.1424 3.5 13.67

Table 5: Profiling CNN Power for FaceID

FaceID: Operation
Mode
Clock Frequency (MHz) Kernel Loading Input Loading FIFO Loading + Inference
Arm RISC-V CNN E(mJ) t(ms) I(mW) A(mW) E(mJ) t(ms) I(mW) A(mW) E(mJ) t(ms) A(mW)
Arm Only Always ACTIVE 100 N/A 50 0.158 12.2 7.03

20.01

0.132 12.1

7.03

17.91 0.435 14 38.19
SLEEP 100 N/A 50 0.156 12.2 7.22

20.01

0.13 12.1

7.22

17.93 0.43 14 38
LPM 60 N/A 30 0.188 20.3 4.6

13.5

0.162 20

4.6

12.61 0.472 24 24.1
Arm and RISCV Always ACTIVE 100 50 50

 0.26

 36

 3.52

10.64

0.30203 24.82

14.54

26.72 0.57958 24.37 38.39
Only Arm SLEEP 100 50 50 0.29813 24.32

5.46

17.47 0.57823 24.87 29.3
Both SLEEP 100 50 50 0.29899 24.82

5.48

17.53 0.57921 24.37 29.18
Both SLEEP 60 30 30 0.53 52 3.45 13.48 0.36954 41.47

3.57

12.49 0.65009 40.72 19.52
LPM 60 30 30 0.328 36.5

3.45

12.45 0.608 36.1 20.36
LPM, Fast FIFO 60 30 30 N/A N/A

N/A

N/A 0.524 25.2 24.29
LPM 60 60 60 N/A N/A N/A

N/A

0.254 18.2

6.02

19.19 0.533 18 35.58
LPM, Fast FIFO 60 60 60 N/A N/A N/A

N/A

N/A N/A

N/A

N/A 0.471 12.6 43.57

The effect of clock frequency on KWS20, Arm in SLEEP and RISC-V drives CNN and sleeps in idle timeFigure 5. The effect of clock frequency on KWS20, Arm in SLEEP and RISC-V drives CNN and sleeps in idle time

The effect of clock frequency on FaceID; Arm in SLEEP; RISC-V drives CNN and sleeps in idle timeFigure 6. The effect of clock frequency on FaceID; Arm in SLEEP; RISC-V drives CNN and sleeps in idle time

Retaining Weights in CNN Memory

Each of the four CNN Mask RAMs (the CNN weights storage) can be configured to retain the weights in case of UPM, BACKUP, or STANDBY. On the other hand, CNN configuration registers are lost in any of those power saving modes (except SLEEP and LPM) regardless of CNN Mask RAM retention setting and need to be reloaded.

For some low duty cycle applications, the user can choose to turn off Mask RAMs to further reduce power consumption during sleep. In this case, the effect of extra power used to repeatedly reload weights should be considered. Table 6 shows the power consumption increase from 0.17mW to 0.38mW during UPM when CNN weights are retained for the FaceID example. In this case, the CNN configuration reloading takes a fraction of a millisecond. However, if CNN Mask RAMs are off, the total reloading time of weights and configuration is greater than 12 milliseconds. The user can study such tradeoffs to optimize the schedule and mode of operations for their applications. The following examples show how to turn on/off weight retention and how to reinitialize CNN after wakeup.

Example: Turn on weight retention 
 MXC_BBFC->reg3 = 0xf; // Reset on
 MXC_BBFC->reg1 = 0x0; // turn off CNN RAM0-3 power
 MXC_BBFC->reg3 = 0x0; // Reset off
Example: Loading weights and configuring CNN with CNN Mask RAMs retention

 // wake up from UPM, BACKUP, or STANDBY – CNN memory was retained
 cnn_init(); // Bring state machine into consistent state
 cnn_load_bias();
 cnn_configure(); // Configure state machine
 …
 // proceed to loading data  
Example: Turn off weight retention prior to sleep (must turn back on after waking up)
 MXC_BBFC->reg3 = 0xf; // Reset on
 MXC_BBFC->reg1 = 0x0; // turn off CNN RAM0-3 power
 MXC_BBFC->reg3 = 0x0; // Reset off
  
 /* get ready to go to sleep */
 MXC_LP_EnterDeepSleepMode(); // Arm enters LPM, waits for wakup

 /* restore power to CNN RAM0-3 after wakeup, or prior to next cnn_load() */
 MXC_BBFC->reg3 = 0xf; // Reset
 MXC_BBFC->reg1 = 0xf; // turn on CNN RAM0-3 power
 MXC_BBFC->reg3 = 0x0; // Reset
Example: Loading weights and configuring CNN without CNN Mask RAMs retention

 // Power up or wake up from UPM, BACKUP or STANDBY, CNN memory not retained
 cnn_init(); // Bring state machine into consistent state
 cnn_load_weights(); // Load kernels
 cnn_load_bias();
 cnn_configure(); // Configure state machine
 …
 // proceed to loading data  

Table 6: FaceID Power Consumption in UPM mode With and Without Retaining Weights

UPM Loading Weights/CNN Config After Waking up from UPM
CNN Weight Retention P(mW) E(µJ) T(ms) P(mW)
Off 0.172 242.38 12.16 19.93
On 0.387 1.956 0.1089 17.95

Application Power

When developing an application, it is important to choose the operation mode of each core, clock frequency, and duty cycle. Typically, applications running on Arm can be executed at a higher frequency (100MHz, vs. 50MHz APB clock) and have more optimized code, which can result in lower overall energy used. If there are parallel tasks that can be assigned to both Arm and RISC-V cores, it is important to use the proper operation mode for each core. In SLEEP and LPM, cores can operate or sleep independently. However, in other power saving modes like UPM or STANDBY, both cores are powered down simultaneously. In this case, Arm and RISC-V might need to coordinate in the case that they both go to low power mode using Multiprocessor Communications as described in section 8.2 of [5].

For applications that require occasional short activity periods, UPM, STANDBY, BACKUP, or SHUTDOWN mode can be an option. The user can evaluate the energy cost of maintaining weights in CNN memory during the sleep periods and compare with the energy of reloading weights prior to each activity window. The Power Monitor circuit on the MAX78000 EV kit can be used to measure the overall system power within a time window [4]. This helps the user to profile the application power at run time as needed.

Case Study 1: Keyword Spotting (KWS20) Demo on Arm and CNN

This example application demonstrates the recognition of a set of 20 keywords using the MAX78000 EV kit [9]. The application reads I2S microphone samples at 16kHz and monitors the threshold on 128 sample windows. Once it goes above the silence threshold, the application captures one second worth of samples, runs CNN inference on the samples, and displays the classification result on the TFT display.

In this benchmarking example, RISC-V is not used and the Arm core handles the entire task. Several modes of operation were tested. The Power Monitor in System Power Mode is used to measure the total energy in ten second intervals, both when there is complete silence (i.e. no inference or TFT update) and when keywords are uttered at a rate of one word per second (ten inferences and TFT updates). The result is shown in Table 7. In ACTIVE mode, the Arm core always operates. The power in this mode is about 20mW, regardless of silence or detection as the inference energy is a fraction of total power consumption. By using LPM during the idle time of collecting I2S samples as well as the inference time, and by turning off the CNN clock after inferences (a major reduction in power), the power goes down to 8.3mW (6.58mW during silence).

Table 7: Power Optimization Steps Applied to KWS20-Demo Application

Energy avg during 10s (silence) Energy avg during 10s (10xwords)
SYS Clock CNN Clock Arm Mode E(mJ) t(s) P(mW) E(mJ) t(s) P(mW)

IPO
100MHz

 50Mhz

ACTIVE

200.9 10.002 20.09 206.2 10.045 20.53

SLEEP during inference

200.4 10.001 20.04 204.3 10.001 20.43
SLEEP between I2S interrupts & during inference 155 9.63 16.1 209.3 12.01 17.4
LPM between I2S interrupts & during inference 146.7 9.63 15.2 211.5 12.35 17.1
LPM between I2S interrupts and during inference, CNN clock off after completion 63.5 9.62 6.58 101.9 12.18 8.3
Same as above, TFT off 66.51 9.63 6.75 67.4 9.64 6.99

Duty cycle and average power of KWS20 example during silence or keyword detection period, TFT can be on or offFigure 7. Duty cycle and average power of KWS20 example during silence or keyword detection period, TFT can be on or off

The average power consumption and activity duty cycle of the last two power optimized operation modes in Table 7 are analyzed and demonstrated in Figure 7.

Case Study 2-A: FaceID Demo on Arm and CNN

This demo application demonstrates the identification of subjects using facial recognition on the MAX78000. The FaceID CNN model generates an embedding of the size 512 from live images captured from the EV kit camera. The distance of the embedding from the embeddings of known subjects is calculated, and if it is greater than a threshold, the best candidate is selected. The database of known embeddings is created from pictures of the subjects and is integrated with the firmware. The inference is executed on a 160x120 face box of each image. To enhance the performance of identification, the inference is executed three times on a frame as well as each time the face box is slightly moved on the picture.

In this case study, RISC-V is off and the entire task is driven by Arm and executed sequentially. The Arm goes to SLEEP mode during inference and UPM or STANDBY between frames for about half a second. Table 8 summarizes the execution time and power of main functions of the FaceID. Table 9 displays the time and power consumption during each operation mode.

Table 8: Power and Execution Time of Different Task in FaceID (Arm and CNN)

Processing One Frame Energy Avg
SYS Clock CNN Clock Arm duty cycle E(µJ ) t(ms) P(mW)
ARM-100MHz  50MHz Image capture 1189.8 171.2 6.95
CNN kernel load (for STANDBY or UPM) 935.3 37.5 24.94
3 x CNN run 1164.7 42 27.73
3 x CNN unload 1.6 0.15 10.7
3x Distance calculation 470.4 48 9.8
Display image on TFT 620.5 49.4 12.56
Total 4382.2 348.25

Table 9: Power and Duration of Each Operation Mode in FaceID (Arm and CNN) for One Frame

Mode Active UPM Sleep Total
Time (ms) 227 498 160 885
Power (mW) 18.14 0.3 6.92
Energy (µJ) 4118 150 1112 5380
Mode Active Standby Sleep Total
Time (ms) 227 497 160 884
Power (mW) 18.14 0.03 6.92
Energy (µJ) 4118 15 1112 5245

Duty cycle and average power of FaceID example running on ArmFigure 8: Duty cycle and average power of FaceID example running on Arm

Case Study 2-B: FaceID Demo on Arm, RISC-V, and CNN

In this scenario, Arm handles the TFT control and display (only accessible through Arm on MAX78000 EV kit) and RISC-V drives image capture from the camera, CNN loading and inference, and distance calculation of the embedding against the database. Shared memory and mailboxes are employed to coordinate the Arm and RISC-V communication.

Table 10 summarizes the execution time and power during major functions of the FaceID in this mode. After processing a frame, Arm and RISC-V go to STANDBY mode for about half a second like the previous case.

Table 10: Power and Execution Time of Different Task in FaceID (Arm, RISC-V, and CNN)

Processing One Frame Energy Avg
SYS Clock CNN Clock RISC-Arm E(µJ) t(ms) P(mW)

ARM-100MHz RISCV-60MHz

 50MHz

Image capture 1392.6 169 8.24
CNN kernel load 818.3 35.5 23.05
3 x CNN run 7910.1 327 24.19
3 x CNN unload 9.5 0.9 10.5
3x Distance calculation 699.6 66 10.6
Display image on TFT (on Arm in parallel with RISC-V processing) 60 12.48
Total 10830 658.9


Table 11: Power and Duration of Each Operation Mode in FaceID (Arm, RISC-V, and CNN) for One Frame

Mode Active Standby Sleep Total
Time (ms) 448 500 169 1117
Power (mW) 18.14 0.05 7.89
Energy (µJ) 8126.72 25 1333.41 9485.13

Duty cycle and average power of FaceID example running on Arm and RISC-VFigure 9. Duty cycle and average power of FaceID example running on Arm and RISC-V

Comparison of using Arm, or both Arm and RISC-V in implementation of FaceIDFigure 10. Comparison of using Arm, or both Arm and RISC-V in implementation of FaceID

By comparing the two implementations of FaceID in Figure 10, it is clear that the total energy to process a frame is significantly lower by using Arm, which is primarily due to its higher speed as opposed to RISC-V (100MHz, vs 50MHz) resulting in a shorter duration of the ACTIVE state.

Summary

This application note covers an overview of power optimization techniques supported by the MAX78000 and the results when used with Keyword Spotting and FaceID applications. The following summary highlights some general recommendations to improve the power consumption:

  • Faster is generally better. Generally, using higher clock rates makes the execution faster and the impact of constant static power is reduced, improving the overall power consumption.
  • Single core is preferred if there are not many parallel tasks to perform. Running tasks on RISC-V is slower due to its clock rate, and tends to result in an increase in energy usage compared to Arm.
  • Take advantage of UPM, BACKUP, and STANDBY when there is no activity. Consider LPM when Arm can sleep while some peripherals are running.
  • Load kernels (weights) once and retain in memory to avoid reloading energy in UPM, BACKUP, and STANDBY modes. However, if the duty cycle of activity is very low, consider disabling weight retention or using POWERDOWN mode and reloading the weights in every cycle.
  • Make sure to turn off the CNN clock after CNN processing is complete and turn it on before accessing CNN again in the next activity period.
  • Use LPM when CNN is running and Arm is waiting for inference to complete. To wake up, either use RISC-V or the wakeup timer. A CNN interrupt cannot wake up Arm directly in LPM.
  • Using fast FIFO or quad fast FIFO (supported only if RISC-V drives CNN) significantly improves input loading time and inference energy (--fast-fifo, --fast-fifo-quad) [8].

References

[1] MAX78000 Data Sheet
[2] MAX78000 Evaluation Kit Data Sheet
[3] MAX34417 Data Sheet
[4] MAX78000 Power Monitor and Energy Benchmarking Guide
[5] MAX78000 User Guide
[6] https://github.com/MaximIntegratedAI/MAX78000_SDK/tree/master/Examples/MAX78000/CNN/mnist-riscv
[7] https://github.com/MaximIntegratedAI/ai8x-training, please see README for instructions.
[8] https://github.com/MaximIntegratedAI/ai8x-synthesis, please see README for instructions.
[9] https://github.com/MaximIntegratedAI/MAX78000_SDK/tree/master/Examples/MAX78000/CNN/kws20_demo
[10] https://github.com/MaximIntegratedAI/MAX78000_SDK/tree/master/Examples/MAX78000/CNN/faceid_evkit

Trademarks

Arm and Cortex are registered trademarks of Arm Limited