
The C++ Frontend

Explore PyTorch's C++ frontend to build efficient machine learning models, manage distributed training, and improve reproducibility. Learn how to compile C++ projects, use the torch.distributed module for GPU and CPU training, and leverage Torch Hub for sharing pretrained models and accelerating project development.

Even though PyTorch's backend is mostly implemented in C++, its frontend API has always centered on Python. This is partly because Python is already very popular among data scientists and offers a wealth of open-source packages, letting us focus on solving problems rather than reinventing the wheel; it is also extremely easy to read and write. However, Python is not known for efficient use of computation and memory. Big companies often develop their own C++ tools for better performance, while smaller companies and individual developers can rarely divert their main focus to building such tooling. Fortunately, PyTorch now ships a C++ API, so anyone can build efficient projects with it.

Here's an example of how to use the C++ API provided by PyTorch. Let's load the traced model we exported previously:

C++
torch::Device device = torch::kCUDA;
std::shared_ptr<torch::jit::script::Module> module = torch::jit::load("model_jit.pth");
module->to(device);

Then, let's feed a dummy input image to the model:

C++
std::vector<torch::jit::IValue> inputs;
inputs.push_back(torch::ones({BATCH_SIZE, IMG_CHANNEL, IMG_HEIGHT, IMG_WIDTH}).to(device));
at::Tensor output = module->forward(inputs).toTensor();

The full code for the C++ example is as follows:

C++
#include <torch/script.h> // One-stop header.

#include <cassert>
#include <chrono>
#include <iostream>
#include <memory>
#include <vector>

int main(int argc, const char* argv[]) {
  torch::Device device = torch::kCUDA;
  // Deserialize the ScriptModule from a file using torch::jit::load().
  std::shared_ptr<torch::jit::script::Module> module = torch::jit::load("../model_jit.pth");
  assert(module != nullptr);
  module->to(device);
  std::cout << "model loading ok\n";
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(torch::ones({1000, 1, 28, 28}).to(device));
  std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
  for (int64_t itr = 1; itr <= 10; ++itr) {
    at::Tensor output = module->forward(inputs).toTensor();
  }
  std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
  std::cout << "Time elapsed = "
            << std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count()
            << "ms" << std::endl;
  return 0;
}

It also requires a CMakeLists.txt file to compile the .cpp file:

CMake
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(mnist_jit)
# Point Torch_DIR at your own libtorch installation.
set(Torch_DIR "/home/john/opt/libtorch/share/cmake/Torch")
find_package(Torch REQUIRED)
add_executable(mnist_jit mnist_jit.cpp)
target_link_libraries(mnist_jit "${TORCH_LIBRARIES}")
set_property(TARGET mnist_jit PROPERTY CXX_STANDARD 11)

The redesigned distributed library

Debugging multithreading programs on a CPU is painful. ...