The copy-paste back-and-forth with ChatGPT can feel primitive and leaves room for improvement. Engineers who use editor-integrated AI agents like Copilot, Cline, and Cursor may never want to return to such rudimentary copy-pasting.
This isn’t limited to coding; it applies to inquiries in general. Whether you’re asking about an unfamiliar term or collecting ten brainstorming ideas, being able to invoke the model directly from your editor or IDE is clearly better.
Making things as seamless as possible is crucial. Everyday convenience, and even the feeling of magic, comes down to efficiency, and seamlessness is its cornerstone. Minimizing hassle and keeping usage smooth is one of the essential pursuits of engineering.
As a Knowledge Architect, much of my work is divergent and creative, so being able to call on generative AI seamlessly ties directly into my productivity.
I focused on the concept of parallelism: sending the same inquiry to n models simultaneously.
Let’s use the OpenAI API as an example.
Here are some models:
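For illustration only (the lineup changes over time, and availability varies by account), seven such models might be:

gpt-4.1
gpt-4.1-mini
gpt-4.1-nano
gpt-4o
gpt-4o-mini
o3
o4-mini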
Each model has slightly different characteristics, as if each were a different person. Let’s simply treat them that way.
In this scenario, it’s like having seven different people.
As a side note, you can check which OpenAI API models are available to your account.
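If you want to check programmatically, the official openai Python package can list them. A minimal sketch:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Print the ID of every model your API key can access.
for model in client.models.list():
    print(model.id)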
Think of asking these seven people simultaneously.
Here’s some sample code from a small tool called pgpt.py (excerpted; the model list, prompt construction, argument parsing, and the save_response helper live elsewhere in the tool):
import concurrent.futures

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def request_to_model(model_name, prompt, timeout=130):
    # Ask one model and return its reply, or an error marker on failure.
    try:
        response = client.chat.completions.create(
            model=model_name,
            messages=[
                {'role': 'user', 'content': prompt},
            ],
            timeout=timeout
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"[ERROR in {model_name}]: {str(e)}"

def main():
    # MODELS, prompt, and args are defined elsewhere in pgpt.py.
    # Submit one request per model so they all run at the same time.
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future_to_model = {
            executor.submit(request_to_model, model, prompt): model
            for model in MODELS
        }
        # Handle each response the moment its model finishes.
        for future in concurrent.futures.as_completed(future_to_model):
            model = future_to_model[future]
            content = future.result()
            save_response(args.input, model, content)
Using Python’s concurrent.futures, the requests are fired off simultaneously, and each model’s (person’s) response is saved as soon as it arrives.
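The save_response helper isn’t shown above. A minimal sketch, assuming it writes one Markdown file per model next to the input file, using the p-<model>.md naming described below:

from pathlib import Path

def save_response(input_path, model, content):
    # Write the reply next to the input file as p-<model>.md, e.g. p-gpt-4.1.md.
    # (The "next to the input file" location is an assumption, not taken from pgpt.py.)
    out_path = Path(input_path).with_name(f"p-{model}.md")
    out_path.write_text(content, encoding='utf-8')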
Viewing the results is simple: just open all the files where the seven people’s answers are written.
In pgpt.py, results are saved with filenames like p-gpt-4.1.md, p-gpt-4o.md, and so on. Open all of these in VSCode. Any other editor will work too.
Modern editors can auto-reload a file when it changes on disk, so this setup is all you need: the results of a parallel, simultaneous inquiry appear on screen automatically. I can fire off a pgpt inquiry with a shortcut key in my editor and flip through the seven people’s answers instantly. It’s a far better experience than switching back and forth between ChatGPT browser tabs.
Here are my impressions of each individual.
By making requests to n models in parallel, you get the experience of swiftly gathering opinions from n people.
As a Knowledge Architect, I usually work in the realm of concept creation, but I do build small tools like this one. I’m not a particularly skilled engineer, so treat my implementation as a reference at most.
More importantly, consider the mindset of “treating models as people and asking n people simultaneously.” If you’re more adept than I am, you’ll likely implement something even more convenient. Feel the power of asking n people at once!